The Double Readiness Gap in Machine Learning for Building Energy Management: A Scoping Review of Deployment Maturity, Trustworthy AI, and EU AI Act Alignment

Malvoni, Maria

doi:10.3390/su18126107

Open Access Review


by

Maria Malvoni

Department of Energy Efficiency, Italian National Agency for New Technologies, Energy and Sustainable Economic Development (ENEA), Centro Ricerche Brindisi, Cittadella della Ricerca, SS7 km 706, 72100 Brindisi, Italy

Sustainability 2026, 18(12), 6107; https://doi.org/10.3390/su18126107 (registering DOI)

Submission received: 29 April 2026 / Revised: 5 June 2026 / Accepted: 11 June 2026 / Published: 14 June 2026

(This article belongs to the Special Issue AI-Driven Multi-Technology Renewable Energy Systems: Climate Resilience and Sustainable Management)


Abstract

Reducing building energy consumption is central to EU climate-neutrality targets and to sustainable development goals: buildings account for around 40% of EU final energy consumption, placing Building Energy Management Systems (BEMS) at the intersection of the European Green Deal and the EU Artificial Intelligence Act. A scoping review following PRISMA-ScR guidelines charted 61 Machine Learning (ML) for BEMS papers (2020–2026) across three sub-domains (load forecasting and energy monitoring, HVAC control, and demand response), using a nine-point Technology Readiness Level (TRL) rubric and three Trustworthy AI (TAI) dimensions (Privacy & Data Governance, Robustness, and Transparency). The review finds that 90.2% of papers remain at the development stage (TRL 4–6), with no multi-site production deployment documented. TAI coverage is heterogeneous at publication level: transparency is addressed in only 3 of 61 papers (4.9%), and privacy provisions (the best-covered ALTAI dimension) are concentrated in demand-response papers (9 of 17, 52.9%), largely via Federated Learning (6 of 9 privacy-tagged papers). A three-level EU AI Act risk classification identifies 23 borderline-candidacy papers (37.7%), predominantly Reinforcement Learning-based HVAC control systems, whose high-risk proximity cannot be resolved at abstract level; explicit compliance engagement is absent from all 61 mapped sources, including the 22 papers published after the Act entered into force in August 2024. The findings document a double readiness gap: a TRL ceiling co-located with limited documented engagement with TAI obligations and EU AI Act compliance at publication level. Closing this gap is necessary before AI-driven building energy management can be deployed at scale under EU governance requirements.

Keywords: building energy management systems; building automation; building control; energy management; EU AI Act; technology readiness level; HVAC; demand response; federated learning; trustworthy AI

1. Introduction

Europe’s building stock accounts for roughly 40% of final energy consumption in the European Union, a share large enough to place buildings at the centre of the European Green Deal and the revised Energy Performance of Buildings Directive (EPBD 2024) [1,2]. The Directive identifies Building Energy Management Systems (BEMS), software platforms that monitor, control, and optimise building energy flows in real time, as a key instrument for demand reduction and renewable integration. Over the past decade, machine learning (ML) has established itself as the leading computational framework for BEMS optimisation across three functionally distinct sub-domains: short-term load forecasting and energy monitoring; adaptive HVAC setpoint control and building optimisation; and grid demand response and flexibility [3].

Yet whether, and to what extent, the transition from research prototype to operational deployment has been achieved in the BEMS ML literature has not been systematically assessed using a standardised readiness framework [4].

This deployment gap is directly relevant to the EU Artificial Intelligence Act (Regulation 2024/1689) [5], in force since August 2024, which establishes risk-based compliance obligations for high-risk AI applications, including ML-enabled BEMS functions that may encompass systems acting as safety components in critical energy infrastructure and demand-response platforms processing live individual metering data. Systems validated only in simulation or single-building pilots have not yet been systematically assessed against these obligations.

Prior reviews have addressed each sub-domain individually without a shared cross-domain readiness framework: for load forecasting, prediction models, XAI applications, and data barriers [6,7,8,9,10]; for HVAC control, AI-assisted strategies, hybrid control, and reinforcement learning [11,12,13,14,15]; and for demand response, demand forecasting and energy flexibility [16,17].

Within this broader context, three dimensions remain insufficiently addressed in the literature reviewed here: (i) the quantification of deployment maturity using a standardised Technology Readiness Level rubric; (ii) the mapping of TAI compliance across ALTAI dimensions of privacy, robustness, and transparency; and (iii) the evaluation of alignment with the EU AI Act risk-classification framework.

This study addresses these gaps through a scoping review following PRISMA-ScR guidelines, covering the three BEMS sub-domains outlined above and structured around three research questions (RQs):

RQ1—Deployment Maturity: What is the distribution of Technology Readiness Levels across the three BEMS sub-domains, and what deployment-context patterns are associated with different maturity levels?
RQ2—Trustworthy AI: What is the distribution of TAI coverage across the three BEMS sub-domains, and how consistently are the ALTAI dimensions of privacy, robustness, and transparency represented at publication level?
RQ3—Regulatory Readiness: What is the distribution of EU AI Act risk proximity across the three BEMS sub-domains, and to what extent do the included papers document engagement with Annex III classification criteria at publication level?

Section 2 describes the review protocol. Section 3 reports sub-domain findings. Section 4 presents cross-domain patterns, and Section 5 introduces the Deployment Readiness Map. Section 6 addresses RQ1–RQ3 and study limitations, and Section 7 closes the paper.

2. Materials and Methods

2.1. Review Protocol and Eligibility

This scoping review is designed for comparative cross-domain charting rather than prevalence estimation over the full screened population.

Item reporting follows the PRISMA Extension for Scoping Reviews (PRISMA-ScR) [18]; the PRISMA 2020 flow diagram structure [19] is adopted solely for record-flow reporting.

The review protocol is structured around three BEMS sub-domains: (A) Load Forecasting and Energy Monitoring, (B) HVAC Control and Building Optimisation, and (C) Demand Response and Flexibility.

Eligibility criteria follow the Population–Concept–Context (PCC) framework recommended for scoping reviews [18]: Population = buildings; Concept = ML-based BEMS across the three sub-domains; Context = peer-reviewed literature published between 2020 and 2026 and indexed in Scopus or IEEE Xplore.

Sources are eligible if they: (i) propose, evaluate, or review an ML-based system in at least one of the three BEMS sub-domains; (ii) are published in English in a peer-reviewed journal or conference proceedings indexed in Scopus or IEEE Xplore; and (iii) appear between 2020 and 2026 inclusive.

Sources are excluded if they address wind, solar, or grid-only applications without an explicit building-level BEMS component.

2.2. Information Sources and Study Selection

A structured search was conducted across two primary databases: Scopus and IEEE Xplore. Six independent queries, one per database and sub-domain combination, were formulated using title-field constraints to maximise precision.

Table A1 (Appendix A) reports the query strings. The search was last executed in March 2026.

Records were deduplicated and screened using a deterministic pipeline combining exact DOI matching, fuzzy title matching (token-sort ratio, threshold 88; thefuzz v ≥ 0.20 [20]), and rule-based relevance scoring (score cutoff < 0.25).
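The deduplication step can be sketched as follows. This is a minimal illustration, not the pipeline used in the review: `token_sort_ratio` below is a standard-library stand-in for the thefuzz function of the same name and will not reproduce its scores exactly, the record fields (`doi`, `title`) and the sample records are hypothetical, and treating the threshold as inclusive (score ≥ 88) is an assumption. The rule-based relevance scoring is not sketched because its rules are not stated here.

```python
from difflib import SequenceMatcher


def token_sort_ratio(a: str, b: str) -> int:
    """Approximate token-sort similarity (0-100) using only the standard library."""
    norm_a = " ".join(sorted(a.lower().split()))
    norm_b = " ".join(sorted(b.lower().split()))
    return round(100 * SequenceMatcher(None, norm_a, norm_b).ratio())


def deduplicate(records, threshold=88):
    """Keep the first record of each duplicate group.

    A record is dropped if its DOI exactly matches an earlier one
    (case-insensitive) or if its title is a fuzzy match to a kept title.
    """
    kept, seen_dois = [], set()
    for rec in records:
        doi = (rec.get("doi") or "").strip().lower()
        if doi and doi in seen_dois:
            continue  # exact DOI duplicate
        if any(token_sort_ratio(rec["title"], k["title"]) >= threshold for k in kept):
            continue  # fuzzy title duplicate (inclusive threshold is an assumption)
        if doi:
            seen_dois.add(doi)
        kept.append(rec)
    return kept


# Hypothetical records, for illustration only.
records = [
    {"doi": "10.0000/example-a", "title": "Deep reinforcement learning for HVAC control"},
    {"doi": "10.0000/EXAMPLE-A", "title": "A differently worded title for the same DOI"},
    {"doi": "", "title": "HVAC control for deep reinforcement learning"},
    {"doi": "10.0000/example-b", "title": "Federated learning for demand response"},
]
unique = deduplicate(records)
```

Sorting tokens before comparison makes the match insensitive to word order, which is why the third record is treated as a duplicate of the first even though it has no DOI.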

A stratified sampling design was adopted, targeting approximately 10% of 614 screened records. Records were stratified by sub-domain (A: 285, B: 192, C: 137), each subject to a minimum floor of 12 papers, ranked by composite screening score, and allocated using a Hamilton largest-remainder quota to minimise rounding error and avoid under-representation of the smallest sub-domain (C, n_screened = 137).
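A Hamilton largest-remainder quota with a per-stratum floor can be sketched as below. The function name, its signature, and the way the floor is enforced (pin any stratum that falls below the floor, then re-apportion the rest) are assumptions made for illustration. Applied to the stratum sizes alone, the sketch returns a purely proportional split, which is not the reported final corpus (20, 24, 17); it therefore illustrates the apportionment rule only, not the full selection procedure.

```python
def hamilton_allocation(strata, total, floor=0):
    """Largest-remainder apportionment of `total` units across `strata`.

    Any stratum whose share falls below `floor` is pinned at the floor and
    the remaining units are re-apportioned among the other strata.
    """
    if floor * len(strata) > total:
        raise ValueError("floor cannot be met for every stratum")
    fixed = {}            # strata pinned at the floor
    free = dict(strata)   # strata still apportioned proportionally
    while True:
        seats = total - sum(fixed.values())
        pool = sum(free.values())
        # Integer arithmetic keeps quotas and remainders exact.
        base = {k: (v * seats) // pool for k, v in free.items()}
        remainder = {k: (v * seats) % pool for k, v in free.items()}
        leftover = seats - sum(base.values())
        for k in sorted(free, key=lambda s: remainder[s], reverse=True)[:leftover]:
            base[k] += 1
        below = [k for k in free if base[k] < floor]
        if not below:
            return {**fixed, **base}
        for k in below:
            fixed[k] = floor
            del free[k]


screened = {"A": 285, "B": 192, "C": 137}
allocation = hamilton_allocation(screened, total=61, floor=12)
# Proportional split of 61 over 614 records: {'A': 28, 'B': 19, 'C': 14}
```

Here the exact quotas are about 28.31, 19.07, and 13.61, so the single leftover unit goes to sub-domain C, which has the largest fractional remainder.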

This procedure yielded a final corpus of N = 61 papers: sub-domain A (load forecasting and energy monitoring, n = 20), sub-domain B (HVAC control and building optimisation, n = 24), and sub-domain C (demand response and flexibility, n = 17). The PRISMA flow diagram is presented in Figure 1.

2.3. Data Charting Framework

For each included source of evidence, the following items were charted: primary BEMS sub-domain, ML technique, TRL, TAI coverage, and key performance indicators (KPIs). The ML technique is coded according to the main algorithmic approach (e.g., LSTM, DQN, federated learning).

Papers using Model Predictive Control (MPC) or Mixed-Integer Linear Programming (MILP) without an embedded ML component are retained in the corpus as non-ML comparators; they are excluded from ML taxonomy counts and sub-domain ML breakdowns, but retained as a reference group in the cross-category TRL distribution analysis (Section 4.2).

TRL is assigned using the rubric detailed in Section 2.4; TAI coverage is coded as present or absent for each of the three ALTAI dimensions; and KPIs are charted as reported in the abstract. For sub-domain synthesis, additional items include deployment context (simulation/offline/pilot/production) and dataset type (public benchmark/simulation/smart meter/aggregate), which are used for cross-domain analyses.

The primary data charting protocol is based on title and abstract only, consistent with the need for cross-domain comparability and the fact that deployment maturity and TAI coverage are often signalled at publication level through the abstract. Findings are therefore interpreted as properties of the mapped abstracts rather than as claims about the underlying deployed systems, and reported percentages refer to this analytic sample rather than to prevalence across all screened records.

In all ambiguous cases, a conservative lower-bound rule is applied: a TRL level or TAI dimension is assigned as present only when the abstract provides an explicit positive signal. Absence of a signal is recorded as absence of evidence at publication level, not as evidence of absence in the underlying system.

This protocol follows an established precedent in PRISMA-ScR scoping reviews, where cross-domain comparability requirements justify abstract-level coding [18], and deliberately yields conservative lower-bound estimates of TAI coverage, reducing the risk of false positives in compliance gap identification. Following PRISMA-ScR Item 12, the TRL rubric is also the critical appraisal instrument.

A secondary validation was applied to a targeted subset of 12 deployment-adjacent papers to estimate the conservative bias introduced by abstract-only coding; this validation informs the robustness discussion in Section 6.4 and does not alter the primary charting results.

2.4. Technology Readiness Level Rubric

TRLs are assigned using a nine-point rubric (Table 1) adapted from EU/Horizon Europe guidelines [21], applying the lower-bound rule defined in Section 2.3.

Three domain-specific adaptations are introduced: (i) TRL 1–2 are collapsed into a single Research band, since abstract-only signals do not distinguish between basic-principles and concept-formulation stages; (ii) TRL 4 and TRL 5 are distinguished by evaluation environment (public benchmark for TRL 4; building physics simulation for TRL 5); and (iii) TRL 6 maps to offline evaluation on real building data, consistent with the EU “demonstrated in relevant environment” criterion for ML/software-intensive systems [4]. Rule-based classifier signals for each level are detailed in Table 1.

For cross-domain synthesis, paper-level TRL assignments are additionally aggregated into three deployment bands: Research (TRL 1–3), Development (TRL 4–6), and Demonstration (TRL 7–9).
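The band aggregation is a simple lookup; a minimal sketch follows, with the function name chosen here for illustration.

```python
def trl_band(trl: int) -> str:
    """Map a paper-level TRL (1-9) to its deployment band."""
    if not 1 <= trl <= 9:
        raise ValueError("TRL must be between 1 and 9")
    if trl <= 3:
        return "Research"
    if trl <= 6:
        return "Development"
    return "Demonstration"


bands = [trl_band(t) for t in (2, 4, 6, 7)]
# ['Research', 'Development', 'Development', 'Demonstration']
```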

2.5. Trustworthy AI (TAI) Assessment

Each paper is evaluated against three dimensions of the Assessment List for Trustworthy AI (ALTAI) [22]: (i) Privacy & Data Governance, (ii) Robustness, and (iii) Transparency.

For each dimension, coverage is coded at paper level as present or absent on the basis of title and abstract only, consistent with the conservative lower-bound rule (Section 2.3). The BEMS-ML-specific operationalisation and abstract-level classifying signals are detailed in Table 2.

For each paper, the three binary indicators are aggregated into a four-level ALTAI coverage class. The four classes are: No coverage (0 dimensions), Partial coverage (1 dimension), Multiple coverage (2 dimensions), and Full coverage (3 dimensions). In cross-domain visualisation (Section 5), this variable is the vertical axis of the Deployment Readiness Map.
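The aggregation from three binary indicators to a coverage class can be expressed as a count. The sketch below uses illustrative names; only the class labels and the counting rule come from the text.

```python
COVERAGE_CLASSES = (
    "No coverage",        # 0 dimensions
    "Partial coverage",   # 1 dimension
    "Multiple coverage",  # 2 dimensions
    "Full coverage",      # 3 dimensions
)


def altai_coverage_class(privacy: bool, robustness: bool, transparency: bool) -> str:
    """Return the ALTAI coverage class from the three binary indicators."""
    return COVERAGE_CLASSES[int(privacy) + int(robustness) + int(transparency)]


# A paper coded for robustness and transparency but not privacy:
example = altai_coverage_class(privacy=False, robustness=True, transparency=True)
# 'Multiple coverage'
```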

TAI coverage coding is applied to all 61 included papers regardless of ML/non-ML status, as ALTAI dimensions, particularly transparency and robustness, are relevant to any automated decision system deployed in a building context. Non-ML comparators are therefore included in sub-domain TAI counts (Section 4.3).

2.6. EU AI Act Risk Classification

This EU AI Act risk classification is used solely as an analytical tool within the review and does not constitute formal legal advice. Each paper is screened against Article 6 and Annex III of Regulation 2024/1689 [5].

A BEMS system is treated as potentially high-risk when its ML function may qualify as a safety component in critical energy infrastructure (e.g., electricity, gas, or heat). By contrast, supportive or advisory algorithms that do not perform a safety function, and whose failure would not directly endanger infrastructure operation, are treated as non-high-risk. Cases that appear to fall under Annex III(2) [5] are assigned to a conservative high-risk candidacy category.

Systems evaluated only in simulation (TRL < 6) are not confirmed as high-risk, because abstract-level evidence is insufficient for a formal rebuttal analysis under Article 6(3). The review therefore provides a conservative screening of regulatory proximity rather than a full legal compliance assessment.

A three-level analytical scheme is applied consistently across the 61 papers.

High-risk candidacy applies to systems that (i) control HVAC equipment with autonomous setpoint authority in a confirmed critical-infrastructure deployment context and are validated at TRL ≥ 6, or (ii) issue automated demand-response or load-curtailment signals with grid-stability operational effects in a confirmed critical-infrastructure demand-response context.

Borderline candidacy applies when the abstract reports autonomous HVAC setpoint authority or grid-stability demand-response signals (frequency regulation, binding curtailment) but does not confirm or exclude a critical-infrastructure deployment context. In these cases, the assignment records unresolved ambiguity rather than a positive Annex III identification.

Minimal risk covers all remaining papers for which no plausible Annex III trigger is identifiable at abstract level. Risk tier depends primarily on deployment context rather than building typology: the same technical configuration may fall into different categories depending on whether critical-infrastructure qualification is established.

Non-ML comparators are excluded from EU AI Act risk-tier classification (Section 4.4), since Regulation 2024/1689 Art. 3(1) [5] does not classify deterministic optimisation algorithms as AI systems unless embedded within a learning pipeline.
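The three-level scheme can be read as a decision rule over a few abstract-level flags. The sketch below is one possible encoding, not the classifier used in the review: the flag names are hypothetical, and two branches rest on assumptions. First, an autonomous HVAC controller in a confirmed critical-infrastructure context but validated below TRL 6 is assigned to borderline candidacy, a case the scheme does not spell out. Second, non-ML comparators are reported as minimal risk, following the treatment of the MPC-based TRL 7 paper in Section 5.

```python
from typing import Optional


def risk_tier(
    is_ml: bool,
    autonomous_hvac: bool,
    grid_dr_signal: bool,
    ci_context: Optional[bool],  # True = confirmed, False = excluded, None = not stated
    trl: int,
) -> str:
    """Assign an analytical EU AI Act risk tier from abstract-level signals."""
    if not is_ml:
        # Assumption: outside the Art. 3(1) definition, reported as minimal risk.
        return "minimal risk"
    if not (autonomous_hvac or grid_dr_signal):
        return "minimal risk"  # no plausible Annex III trigger
    if ci_context is None:
        return "borderline candidacy"  # context neither confirmed nor excluded
    if ci_context is False:
        return "minimal risk"
    # Critical-infrastructure context confirmed from here on.
    if grid_dr_signal:
        return "high-risk candidacy"
    if trl >= 6:
        return "high-risk candidacy"
    # Assumption: confirmed context but simulation-only validation (TRL < 6).
    return "borderline candidacy"


tier = risk_tier(is_ml=True, autonomous_hvac=True, grid_dr_signal=False,
                 ci_context=None, trl=5)
# 'borderline candidacy'
```

The ordering of the checks reflects the point made above that risk tier depends primarily on deployment context: the same controller moves between tiers as `ci_context` changes.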

3. Sub-Domain Findings

3.1. Sub-Domain A: Load Forecasting and Energy Monitoring

Sub-domain A covers n = 20 papers on short-term load forecasting, non-intrusive load monitoring (NILM), energy disaggregation, and anomaly detection on smart-meter data. Table 3 shows methodological convergence around sequence-learning approaches, with LSTM-family and Transformer architectures covering 15 of 20 entries.

The TRL distribution is concentrated in the development band (TRL 4: n = 10; TRL 6: n = 10), with no pilot or production entry.

TAI coverage is uneven at abstract level: Privacy & Data Governance appears in 4 of 20 papers, robustness in 7 of 20, and transparency in 1 of 20 [36].

Sub-domain patterns are synthesised in Section 4 and interpreted under RQ2 in Section 6.

3.2. Sub-Domain B: HVAC Control and Building Optimisation

Sub-domain B covers n = 24 papers (Table 4). RL-centric controller design structures most of the sub-domain, while non-RL approaches remain ancillary.

The TRL distribution spans TRL 2–7: 20 papers fall in the development band (TRL 4–5), 3 in the research band (TRL 1–3), and 1 at TRL 7.

TAI signals remain sparse at abstract level: robustness appears in 5 of 24 papers, Privacy & Data Governance in 0 of 24, and transparency in 2 of 24.

The cross-domain significance of this profile is assessed in Sections 4.1–4.3 and discussed under RQ1–RQ3 in Section 6.

3.3. Sub-Domain C: Demand Response and Flexibility

Sub-domain C includes n = 17 papers (Table 5) and represents a domain where coordination logic and data-governance concerns are intrinsically coupled. The dominant methodological pattern is architecture-led rather than algorithm-led: federated configurations are the organising design principle across otherwise heterogeneous modelling choices.

The TRL profile is split between research and development bands: 2 papers are coded at TRL 1–3 and 15 at TRL 4–6, with no pilot entry.

TAI coverage is asymmetric at abstract level: Privacy & Data Governance appears in 9 of 17 papers, whereas robustness and transparency are both 0 of 17.

The structural asymmetry between Privacy and the remaining dimensions is analysed in Section 4.3 and under RQ2–RQ3 in Section 6.

4. Cross-Domain Analysis

4.1. ML Taxonomy

The corpus distributes across ML categories as shown in Figure 2: supervised learning accounts for 23 papers (37.7%), reinforcement learning for 26 (42.6%), federated learning for 10 (16.4%), and model predictive control (non-ML) for 2 (3.3%).

Three taxonomic notes guide interpretation. First, FL is treated as a training architecture orthogonal to learning paradigm: conflating the two would obscure the distinction between optimisation logic and training topology. In Figure 2, FL is nonetheless shown as a discrete bar for descriptive comparability and should be read accordingly. Second, the single physics-informed hybrid paper spans both axes and is counted once on each independently. Third, MPC is treated as a non-ML baseline: it is excluded from ML taxonomy counts and sub-domain breakdowns, consistent with the definitional boundary of Regulation 2024/1689 Art. 3(1) [5], which does not classify deterministic optimisation algorithms as AI systems unless embedded within a learning pipeline.

Figure 2 breaks down ML technique categories across the three BEMS sub-domains, with bars showing the percentage of papers in each sub-domain assigned to each ML category and absolute counts (n) on the bars. Supervised learning encompasses LSTM-based, Transformer, and other supervised architectures. Consistent with the architectural distinction above, FL is displayed as a separate category and excluded from supervised counts, while MPC (non-ML) is included as a comparator baseline in Figure 2 and in the TRL distribution analysis (Section 4.2).

Load forecasting and energy monitoring (sub-domain A) is dominated by supervised learning, primarily LSTM-based and Transformer architectures (n = 15, 75%), with a notable FL component (n = 4, 20%) driven by privacy-preserving NILM systems, and one RL paper (n = 1, 5%). HVAC control (sub-domain B), the most control-critical segment of the corpus given the autonomous setpoint authority of its dominant RL configurations, is overwhelmingly RL-driven (n = 20, 83%), with one MPC paper retained as a non-ML baseline, three supervised papers (n = 3, 13%), and no FL adoption. Demand response and flexibility (sub-domain C) presents the most balanced profile: supervised (n = 5, 29%), RL (n = 5, 29%), and FL (n = 6, 35%) are nearly co-equal, with one additional MPC baseline paper (n = 1, 6%), reflecting the structural requirement to train on distributed metering data without centralising raw traces.
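The sub-domain counts above can be tallied to check that they reproduce the corpus-level shares reported at the start of this section. The dictionary layout and variable names are illustrative; the counts are those given in the text.

```python
from collections import Counter

# Papers per ML category in each sub-domain, as reported above.
counts = {
    "A": {"Supervised": 15, "FL": 4, "RL": 1},
    "B": {"RL": 20, "Supervised": 3, "MPC (non-ML)": 1},
    "C": {"Supervised": 5, "RL": 5, "FL": 6, "MPC (non-ML)": 1},
}

totals = Counter()
for by_category in counts.values():
    totals.update(by_category)

corpus_size = sum(totals.values())
shares = {cat: round(100 * n / corpus_size, 1) for cat, n in totals.items()}
# totals: Supervised 23, RL 26, FL 10, MPC (non-ML) 2 (61 papers in all)
# shares: Supervised 37.7, RL 42.6, FL 16.4, MPC (non-ML) 3.3
```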

4.2. TRL Landscape Across Sub-Domains

Table 6 disaggregates the cross-domain TRL counts by sub-domain. Across the three sub-domains, the TRL distribution is skewed toward the development band. Five papers (8.2%) fall in the research stage (TRL 1–3), 55 (90.2%) in the development stage (TRL 4–6), and only one (1.6%) reaches the demonstration stage (TRL 7–9); no source documents a multi-site production deployment at TRL 8–9.

The aggregate picture masks marked sub-domain-level heterogeneity. Load forecasting shows the narrowest spread: all 20 papers fall within TRL 4–6, with a bimodal distribution (TRL 4: n = 10; TRL 6: n = 10). HVAC control covers the widest range (TRL 2–7): of 24 papers, 20 are at TRL 4–5, 3 at TRL 1–3, and 1 at TRL 7. Demand response presents a split profile, with 2 papers at TRL 1–3 and 15 at TRL 4–6, and no study advancing to pilot deployment.

The temporal dimension of this distribution is examined in Figure 3. No upward shift toward the Demonstration band is visible across any year of the 2020–2026 window. The apparent decrease in 2026 reflects partial corpus observability: records indexed after March 2026 are not represented in the sample. The interpretation of these cross-domain and temporal patterns is developed in Section 6.1.

Figure 4 compares median TRL across ML and training categories. MPC is retained as a reference group in the TRL distribution analysis. A Kruskal–Wallis test indicates significant differences (H = 9.55, p = 0.023): RL yields a median TRL of 4.0 (n = 26), non-FL supervised learning 4.0 (n = 23), MPC 6.5 (n = 2), and FL 6.0 (n = 10). The parity between RL and supervised learning reflects a shared Development-band ceiling: RL-based HVAC papers are constrained by simulation environments (EnergyPlus, Sinergym), while non-FL supervised approaches rely on public benchmarks (UK-DALE, REFIT), both without live-data validation. Among ML categories, FL yields the highest median (6.0), above both RL and supervised learning (both 4.0); MPC reaches 6.5 as a non-ML reference but with only n = 2 papers this value is not interpretable as a distributional estimate. FL’s higher median relative to RL and supervised learning reflects a different constraint: because privacy-sensitive applications require training on real metering data, FL studies more often enter deployment contexts consistent with TRL 6. Overall, deployment context, more than algorithmic complexity, is associated with TRL advancement in the corpus.

4.3. TAI Coverage Summary

Figure 5 shows an asymmetric pattern across ALTAI dimensions and sub-domains. For comparison across dimensions and sub-domains, binary TAI indicators are aggregated within each sub-domain and recoded into three gap levels according to the number of papers addressing each ALTAI dimension: HIGH gap, MEDIUM gap and LOW gap.

The heatmap indicates that Privacy & Data Governance reaches the only LOW-gap result in demand response (9/17), robustness remains at MEDIUM gap in load forecasting (7/20) and HVAC control (5/24) while it is absent in demand response (0/17), and transparency is the weakest dimension overall with only 3 of 61 papers across the corpus.
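The recoding from paper counts to gap levels can be sketched as a thresholding of coverage shares. The cut-offs used for the heatmap are not stated in this section, so the two thresholds below (0.15 and 0.50) are hypothetical values chosen only so that the cells named in the text land in the levels the text assigns them; the function and variable names are likewise illustrative. The counts are those reported in Section 3.

```python
def gap_level(n_addressing: int, n_papers: int, medium_from: float, low_from: float) -> str:
    """Recode a coverage share into a gap level (higher coverage = lower gap)."""
    share = n_addressing / n_papers
    if share >= low_from:
        return "LOW"
    if share >= medium_from:
        return "MEDIUM"
    return "HIGH"


# (papers addressing the dimension, papers in the sub-domain)
coverage = {
    ("Privacy & Data Governance", "A"): (4, 20),
    ("Privacy & Data Governance", "B"): (0, 24),
    ("Privacy & Data Governance", "C"): (9, 17),
    ("Robustness", "A"): (7, 20),
    ("Robustness", "B"): (5, 24),
    ("Robustness", "C"): (0, 17),
    ("Transparency", "A"): (1, 20),
    ("Transparency", "B"): (2, 24),
    ("Transparency", "C"): (0, 17),
}

MEDIUM_FROM, LOW_FROM = 0.15, 0.50  # hypothetical cut-offs, not taken from the review
levels = {cell: gap_level(k, n, MEDIUM_FROM, LOW_FROM) for cell, (k, n) in coverage.items()}
transparency_total = sum(k for (dim, _), (k, _) in coverage.items() if dim == "Transparency")
# transparency_total == 3, i.e. 3 of 61 papers across the corpus
```

Under these assumed cut-offs, demand-response privacy (9/17) is the only LOW-gap cell and every transparency cell is HIGH, in line with the pattern described above.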

4.4. EU AI Act Risk Distribution

Figure 6 shows the three-level EU AI Act distribution across the corpus. Applying the analytical scheme introduced in Section 2 across all 61 papers, the high-risk candidacy category is empty (no abstract confirms a critical-infrastructure deployment context sufficient to trigger Annex III obligations at abstract level), borderline candidacy captures 23 papers (37.7%; 20 RL-based HVAC control papers with autonomous setpoint authority and 3 demand-response papers issuing automated curtailment signals), and the remaining 38 papers (62.3%) fall in the minimal-risk category.

The same 23 borderline-candidacy papers are concentrated in HVAC control and demand response, while all 20 load-forecasting papers fall in the minimal-risk category.

5. Deployment Readiness Map

Figure 7 maps each of the 61 papers as a single marker in a categorical multivariate scatterplot combining five encoding channels: TRL level (x-axis), ALTAI coverage class (y-axis), sub-domain (marker shape), ALTAI dimension(s) addressed (fill/hatch pattern), and EU AI Act risk tier (marker colour: green = minimal risk, orange = borderline candidacy, red = high-risk candidacy). Row totals appear in the right margin.

Of the 61 papers, 36 (59.0%) cluster at TRL 4–6 with No coverage (zero ALTAI dimensions). One Research-band paper [66] reports both robustness and transparency (Multiple coverage), a profile not observed in Development-band studies.

Borderline-candidacy papers are concentrated in the Development band: 19 of 23 fall at TRL 4–6, and none reaches the Demonstration band. No paper falls into the high-risk candidacy category at abstract level. The single TRL 7 paper [43] remains minimal risk because its MPC-based architecture is outside the EU AI Act definition of an AI system (Art. 3(1)) unless embedded in an ML pipeline.

Along the vertical axis, TAI coverage does not increase with maturity: TRL 6 papers do not show stronger concentration in higher ALTAI classes than TRL 4 papers. The hatch patterns confirm this reading: transparency markers (horizontal fill) appear in only three papers across the entire scatterplot, none of them in the Development band above the Partial coverage row. The colour encoding similarly places most borderline-candidacy papers in the No coverage and Partial coverage rows of the Development band.

No paper occupies the upper-right region (Demonstration with Multiple or Full coverage), the target profile for deployment-ready and governance-documented BEMS systems. Taken together, TAI class and risk-tier colour make the double readiness gap immediately visible: limited documented TAI provisions coexist with a substantial borderline-candidacy cohort nearing the Annex III compliance horizon without publication-level evidence of preparation.

This multi-channel encoding enables direct cross-reading of deployment maturity, TAI coverage, and regulatory exposure, and provides a practical framework for future literature updates.

6. Discussion

All findings in this section refer to the stratified analytic sample of 61 papers extracted from 614 screened records; they should therefore be read as properties of the analytic corpus rather than as prevalence estimates for the entire screened literature.

6.1. RQ1—Deployment Maturity: The TRL Ceiling Problem

The TRL distribution reported in Table 6 directly addresses RQ1: with 90.2% of papers at the development stage and only one documented pilot at TRL 7, the corpus remains below verified production deployment. This development-band ceiling appears across sub-domains with different algorithmic profiles and evaluation traditions.

The field has consolidated effective development-stage practices, but the institutional and technical conditions needed for verified production deployment remain largely unmet. The primary barriers differ by sub-domain. For HVAC control, simulation-to-real transfer barriers help explain why 20 of 24 RL-based systems remain confined to the development band. For load forecasting and demand response, by contrast, the ceiling reflects the absence of standardised operational commissioning protocols and the difficulty of obtaining long-term live operational data, rather than algorithmic immaturity. Across all three sub-domains, the scarcity of open-source implementations limits independent TRL verification and may hinder the community-level coordination needed to move systems from offline evaluation to sustained deployment. These barriers are compounded by the TAI and regulatory gaps documented in previous sections: systems that do not document robustness under distribution shift (RQ2) or engage with EU AI Act obligations (RQ3) may face additional obstacles to deployment beyond TRL advancement alone.

Three mechanisms remain plausible contributors to this lack of temporal progression rather than demonstrated causes.

1. Infrastructure lock-in: the absence of standardised multi-building testbeds with open live-inference APIs creates a structural ceiling at TRL 6 for all three sub-domains, because TRL 7 assignments require documented KPIs under live conditions.
2. Simulation-to-real transfer barriers in RL: safe exploration, distribution shift, absent formal safety guarantees, and opaque reward design help explain why simulation-validated RL papers rarely advance to pilot deployment. This pattern is consistent with the domain-transfer fragility documented in RL-based physical control systems more broadly [15,64], with no RL-based HVAC paper in the charted corpus advancing beyond TRL 5 after 2020.
3. Regulatory uncertainty as a deployment inhibitor: the EU AI Act timeline to August 2026–2027 [5], combined with the 23 borderline-candidacy papers in Section 4.4, suggests that limited explicit compliance engagement may reduce incentives to invest in operational testbed access.

These mechanisms should therefore be read as co-determining and plausible, not conclusively causal.

6.2. RQ2—Trustworthy AI Gap

The structural interpretation of the TRL ceiling is mirrored by the TAI coverage pattern, which is uneven across dimensions and sub-domains.

Privacy and data governance show a differentiated, but still incomplete, profile. In demand response, federated aggregation reduces direct data exposure; however, formal differential privacy guarantees appear in only 1 of the 6 FL papers, and no source reports a third-party privacy audit. Architectural intent and verifiable compliance therefore remain distinct in this evidence base. In load forecasting, coverage remains moderate (4/20), yet many NILM-oriented applications rely on fine-grained household metering traces, where data-governance implications can remain material even when privacy safeguards are not explicitly documented at abstract level. In HVAC control, privacy coverage is absent (0/24), broadly consistent with the use of aggregated zone signals rather than personal data.

Robustness is somewhat better represented overall, but it remains uneven across sub-domains. In practice, the mapped robustness signals are mainly technical reliability proxies under experimental conditions (for example benchmark generalisation and distribution-shift sensitivity), especially in load forecasting and HVAC abstracts. This evidence is informative for model behavior, but it does not yet amount to full operational assurance under sustained deployment conditions.

Transparency remains the most structurally absent dimension, with no sub-domain reaching MEDIUM-gap coverage and only 3 of 61 papers (4.9%) addressing it [36,43,66]. In BEMS applications, transparency includes both operator-facing interpretability and the auditability of automated decisions; however, where transparency appears in the mapped corpus, the evidence is often model-internal (for example attention-related cues) rather than operator-facing explanation, traceability support, or audit-ready reporting. This distinction is formalised in the XAI taxonomy of Arrieta et al. [84] between post-hoc model diagnostics and actionable decision explanations for end users. This gap is particularly notable in sub-domain B, where 22 of 24 papers report no operator-facing explanation and none documents a formal specification of the reward function’s objective weights. The distinction matters in BEMS settings, where human operators need interpretable rationale for control actions beyond internal model diagnostics.

This gap carries direct regulatory implications under the revised Energy Performance of Buildings Directive [2], which mandates that Building Automation and Control Systems (BACS) installed in non-residential buildings above a threshold capacity include technical documentation and audit trail capabilities consistent with operator-facing transparency, precisely the dimension most structurally absent from the mapped corpus.

Under the abstract-level charting protocol, the observed asymmetry should be interpreted as a reporting profile of the charted literature rather than as a definitive statement on full-system implementation depth. The current evidence therefore supports a clear gap diagnosis while leaving open how much additional TAI evidence may emerge under full-text assessment. The abstract-level charting protocol deliberately yields conservative lower-bound estimates of TAI coverage, reducing the risk of false positives in compliance gap identification (Section 2). A targeted micro-validation, detailed in Section 6.4, confirms that the gap diagnosis holds even after accounting for the estimated under-reporting bias: transparency remains the weakest dimension across sub-domains under both conservative and upper-bound estimates.

6.3. RQ3—Regulatory Readiness: EU AI Act Engagement Gap

The limited and asymmetric TAI coverage becomes more salient when considered in relation to the EU AI Act, especially for systems whose deployment context may fall near Annex III thresholds. Within the abstract-level charting protocol, no included source explicitly refers to the EU AI Act, its risk categories, or the ALTAI framework, including the six deployment-adjacent papers involving real buildings or live metering data [43,67,68,70,71,74]. This pattern holds across the 22 papers published after August 2024, when the Act entered into force.

Crossing the risk-category distribution against TRL band sharpens this picture: of the 23 borderline-candidacy papers, 19 fall within the Development band (TRL 4–6), 4 fall within the Research band (TRL 1–3), and none has crossed into the Demonstration band. Regulatory exposure is therefore not an artefact of early-stage research activity; rather, it tracks the development ceiling documented in Section 4, in line with the structural mechanisms identified under RQ1.

The evidence supports a regulatory engagement gap at publication level. The same inferential caveat identified under RQ1 applies here: the co-occurrence of regulatory exposure and deployment stalling is consistent with a causal link but does not establish it under the current abstract-level evidence base.

The compliance timeline of Regulation 2024/1689 [5] increases the practical urgency of this finding. Under Article 113 [5], prohibited AI practices became applicable in February 2025. For high-risk AI systems under Annex III, the category most relevant to the borderline-candidacy cohort identified here, obligations apply from August 2026 [5]. For the 19 borderline-candidacy papers currently at TRL 4–6, this is an immediate horizon, not a distant one. Systems moving toward pilot deployment in the next 12–18 months would need to meet conformity-assessment requirements before commissioning, including technical documentation (Article 11 [5]), logging and traceability (Article 12 [5]), and transparency measures (Article 13 [5]). The absence of any publication-level engagement with these obligations, including among the six papers involving real buildings or live metering data, suggests that this compliance horizon is not yet integrated into the research design cycle of BEMS-ML studies. This is a structural gap, distinct from but compounded by the TRL ceiling documented under RQ1. Regulatory readiness requires proactive design choices (such as logging architecture, reward-function documentation, and data-governance frameworks) that cannot be retrofitted at the deployment stage without significant re-engineering cost.
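To make these design choices concrete, the following Python sketch shows one way an RL-based HVAC controller could keep an append-only decision log that stores the documented reward-function weights alongside each control action. All names (`RewardSpec`, `DecisionRecord`, `append_record`, `verify_chain`, the field names, and the example values) are hypothetical and are not drawn from any mapped paper. The sketch illustrates the kind of traceability record discussed here; it is not a compliance implementation of Article 12.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class RewardSpec:
    """Hypothetical documentation of a reward function's objective weights."""
    energy_weight: float
    comfort_weight: float
    version: str


@dataclass
class DecisionRecord:
    """One logged control decision (illustrative fields only)."""
    zone_id: str
    observation: dict
    action: dict
    model_version: str
    reward_spec: RewardSpec
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def _digest(payload: dict) -> str:
    """SHA-256 over a canonical (key-sorted) JSON encoding of the payload."""
    return hashlib.sha256(
        json.dumps(payload, sort_keys=True).encode("utf-8")
    ).hexdigest()


def append_record(log: list, record: DecisionRecord) -> dict:
    """Append a record and chain it to the previous entry by hash.

    Hash chaining makes later edits to earlier entries detectable. It is an
    assumption of this sketch, not a requirement stated in the review.
    """
    payload = asdict(record)
    payload["prev_hash"] = log[-1]["entry_hash"] if log else ""
    entry = {**payload, "entry_hash": _digest(payload)}
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Recompute every hash and check that the chain is intact."""
    prev_hash = ""
    for entry in log:
        payload = {k: v for k, v in entry.items() if k != "entry_hash"}
        if payload["prev_hash"] != prev_hash:
            return False
        if _digest(payload) != entry["entry_hash"]:
            return False
        prev_hash = entry["entry_hash"]
    return True
```

Storing the reward specification in every record is deliberately redundant: each logged action can then be read together with the objective weights that were in force when it was taken, which is the reward-function documentation that none of the mapped HVAC papers reports.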

6.4. Limitations

These regulatory considerations must nonetheless be interpreted within the methodological boundaries of the review. Four constraints shape its scope of inference.

First, the analytical corpus comprises 61 papers drawn from Scopus and IEEE Xplore through a stratified quota design. Databases such as ACM Digital Library and Web of Science were not searched, and the findings should therefore be read as properties of this analytic sample rather than as prevalence estimates for the full screened population.

Second, TRL and TAI assignments were performed by a single rater using a deterministic rubric; inter-rater validation was not conducted, which leaves residual uncertainty, particularly at the TRL 5–6 boundary and in EU AI Act risk-tier classification. The rubric is nevertheless designed to minimise subjectivity by tying each assignment to observable binary abstract signals—for example, a named co-simulation environment indicates TRL 5, live sensor KPI values indicate TRL 6, and an operational pilot with a named building indicates TRL 7 (Table 1). This rule-based design reduces interpretive leeway in ambiguous cases, although it does not replace replicated inter-rater validation. Cohen’s kappa on a replicated subset therefore remains an appropriate extension for future work.
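As an illustration of the rule-based design and of the proposed extension, the following Python sketch maps the three binary abstract signals quoted above to a TRL and computes Cohen's kappa for two raters. The signal names, the strongest-signal-wins precedence, and the example labels are assumptions introduced here; the full rubric is the one in Table 1, and no second-rater data exist for this review.

```python
from collections import Counter
from typing import Optional, Sequence


def assign_trl(named_cosim_env: bool, live_sensor_kpis: bool,
               named_building_pilot: bool) -> Optional[int]:
    """Map binary abstract signals to a TRL, strongest signal first.

    Only the three example signals quoted in the text are covered; the
    precedence order is an assumption of this sketch.
    """
    if named_building_pilot:
        return 7
    if live_sensor_kpis:
        return 6
    if named_cosim_env:
        return 5
    return None  # remaining rubric levels (Table 1) are not reproduced here


def cohens_kappa(rater_a: Sequence[int], rater_b: Sequence[int]) -> float:
    """Cohen's kappa, (p_o - p_e) / (1 - p_e), for two raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set.")
    n = len(rater_a)
    # Observed agreement: share of items with identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)
```

For two hypothetical raters who agree on five of six papers, with TRL labels `[4, 5, 5, 6, 6, 4]` and `[4, 5, 6, 6, 6, 4]`, observed agreement is 5/6, chance agreement is 1/3, and kappa is approximately 0.75. The single disagreement sits at the TRL 5–6 boundary, which is where the text locates the residual uncertainty.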

Third, as established by the conservative lower-bound protocol (Section 2.3), all charting relied on title and abstract only, so reported coverage figures remain conservative lower-bound estimates. Following PRISMA-ScR Item 12 [18], this interpretation reduces false positives in compliance gap identification but systematically understates the true TAI coverage of the mapped systems.

To estimate the conservative bias, a targeted micro-validation recoded n = 12 papers (19.7% of the corpus) against the same TAI rubric applied to abstracts. In 9 of the 12 validated papers (75%), at least one TAI dimension absent from the abstract was present in the full text (robustness: 6/12; transparency: 4/12; privacy: 1/12). If the same under-reporting rates held across the corpus, aggregate coverage figures would rise to an estimated upper bound of 29.5% for robustness, 11.5% for transparency, and 23.0% for privacy, indicative of the magnitude of abstract-level underestimation rather than revised prevalence estimates. For TRL, two papers show deployment-context signals in the full text consistent with a TRL 4 to TRL 6 upgrade.
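The shares in this paragraph can be checked with a few lines of arithmetic. The Python sketch below recomputes the validation share and the 9-of-12 rate, and shows that the transparency and privacy upper bounds are numerically consistent with adding the micro-validation counts to the abstract-level counts. That additive reading is an assumption made here for illustration, as is the privacy baseline of 13 papers, obtained by summing the sub-domain counts reported in this review (9 of 17, 4 of 20, and 0 of 24). The abstract-level robustness count is not restated in this section, so robustness is omitted.

```python
CORPUS_SIZE = 61
VALIDATED = 12


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Abstract-level counts of papers addressing each dimension.
# Privacy is summed over the three sub-domains (assumed baseline).
abstract_counts = {"transparency": 3, "privacy": 9 + 4 + 0}

# Validated papers where the dimension appeared only in the full text.
fulltext_only = {"transparency": 4, "privacy": 1}

validation_share = pct(VALIDATED, CORPUS_SIZE)   # share of corpus recoded
any_dimension_recovered = pct(9, VALIDATED)      # papers with any new signal

# Additive reading (assumption): abstract count plus full-text-only count.
upper_bounds = {
    dim: pct(abstract_counts[dim] + fulltext_only[dim], CORPUS_SIZE)
    for dim in abstract_counts
}

print(validation_share, any_dimension_recovered, upper_bounds)
```

Under these assumptions the script reproduces 19.7%, 75.0%, 11.5% for transparency, and 23.0% for privacy.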

Fourth, the EU AI Act risk classification applied here is an analytical framework intended for descriptive research purposes; it does not constitute a legal compliance assessment and should be treated as indicative, subject to revision as legal interpretations and regulatory guidance evolve.

7. Conclusions

This scoping review mapped 61 peer-reviewed papers on ML for BEMS (2020–2026) against a three-axis analytical framework: Technology Readiness Level, ALTAI-derived Trustworthy AI dimensions, and EU AI Act risk proximity. The aggregate pattern is one of persistent imbalance. Strong methodological output has not translated into verified production deployment, documented trustworthy-AI provisions, or explicit regulatory engagement. Of the papers in the analytic corpus, 90.2% occupy the Development band (TRL 4–6), with no source crossing into multi-site production deployment; transparency (the ALTAI dimension most directly tied to operator accountability) is addressed in only 3 of 61 papers; and EU AI Act engagement is absent across all publication years, including the 22 papers that appeared after the Regulation entered into force in August 2024. Overall, these figures indicate a field-wide pattern rather than a sub-domain artefact: the double readiness gap characterises the publication-level evidence across algorithmic paradigms and evaluation traditions alike. From a sustainability standpoint, this imbalance matters because the energy and carbon savings promised by ML-based BEMS remain largely unverified under real operating conditions.

Taken together, these findings suggest that the current ML-BEMS literature is advancing along a technically capable but deployment-constrained trajectory. The core finding is therefore an accountability ceiling rather than a performance ceiling: the evidentiary record does not yet demonstrate that algorithmic sophistication has been matched by operational validation, governance documentation, or regulatory awareness.

The cross-domain comparison further shows that this gap is expressed differently across sub-domains. Load forecasting and energy monitoring appear methodologically mature but governance-light; HVAC control is the most operationally consequential area yet remains strongly constrained by simulation-to-real transfer; demand response shows the strongest privacy orientation, but still limited robustness and transparency reporting. These asymmetries suggest that future progress is unlikely to come from a single technical improvement alone, and will instead require sub-domain-specific advances in testbed access, deployment reporting, assurance practice, and compliance-aware system design.

A further contribution of this review is methodological. By combining TRL assignment, ALTAI-based coding, and EU AI Act proximity screening into a single Deployment Readiness Map, the paper provides a reusable framework for monitoring how the ML-BEMS field evolves beyond proof-of-feasibility claims toward more operationally and institutionally mature forms of evidence. The framework is intended as a comparative literature-mapping instrument rather than a substitute for full technical audit or legal assessment, but it can support future reviews and longitudinal updates of the field. In this sense, the Deployment Readiness Map is also a sustainability monitoring instrument: it tracks how AI technologies progress toward verifiable contributions to building energy efficiency.

These conclusions should nevertheless be interpreted within the limits of the study design. The findings refer to a stratified analytic sample of 61 papers drawn from 614 screened records, and the primary coding protocol relies on titles and abstracts only, with conservative lower-bound coding (Section 2.3) in ambiguous cases. Accordingly, the reported percentages should be read as properties of the mapped publication-level evidence rather than as prevalence estimates for the full screened literature or as definitive claims about the underlying deployed systems. This limitation is especially relevant for TAI coverage and regulatory-readiness interpretation, where some system-level provisions may not be visible in abstracts even when they exist in the full text.

On this basis, three priorities emerge for future work.

First, the field would benefit from more operationally explicit reporting, including clearer statements on deployment context, commissioning conditions, live performance horizons, and whether inference occurs in closed-loop building operation.

Second, TAI reporting should move beyond isolated references to privacy-preserving architectures or generic explainability claims, toward more verifiable documentation of robustness testing, operator-facing transparency, auditability, and data-governance mechanisms.

Third, as AI governance requirements begin to affect deployment environments more directly, future BEMS studies would benefit from reporting practices that make regulatory context legible without overstating legal status. Concretely, researchers should improve publication-level reporting on deployment context, TRL evidence, and application-relevant TAI dimensions; industry should focus on the transition from TRL 6 to TRL 7 through access to operational testbeds, live-condition KPI documentation, and early regulatory pre-screening; and regulators and policymakers should clarify when ML-enabled BEMS configurations move from research and development settings into use contexts where AI Act obligations become practically relevant, especially in borderline cases in HVAC control and demand response.

Overall, the review indicates that ML for BEMS is constrained less by modelling capability or methodological sophistication than by the weaker connection between technical development, operational validation, and governance-ready documentation. Whether the field closes this double readiness gap will determine whether it moves from laboratory-grade results toward deployment that is operationally validated and governance-documented in real building and energy-system contexts, and whether it can deliver its expected contribution to building decarbonisation and Sustainable Development Goal 7 (SDG 7).

Funding

Part of this research was funded by ENEA within the framework of the Electric System Research Programme (Three-Year Implementation Plan 2025–2027), Project 1.5 “High efficiency buildings for the energy transition”, Work Package 4: “Promoting energy efficiency in buildings by increasing self-sufficiency, flexibility and awareness of energy consumption”. CUP: I53C24003330001.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The author declares no conflicts of interest.

Appendix A

Table A1. Database Search Queries Used in Systematic Retrieval (per sub-domain).

Database	Sub-Domain	Query String (Abbreviated)	Years	Language
Scopus	Load Forecasting and Energy Monitoring	TITLE (“building load forecasting” OR “building energy forecasting” OR “residential load forecasting” OR “residential energy prediction” OR “household energy forecasting” OR “smart meter forecasting”…	2020–2026	English
Scopus	HVAC Control and Building Optimisation	TITLE (“HVAC” OR “EnergyPlus” OR “Sinergym” OR “building setpoint” OR “zone temperature” OR “building thermostat” OR “HVAC control” OR “building heating” OR “building cooling” OR “net-zero building”…	2020–2026	English
Scopus	Demand Response and Flexibility	TITLE (“demand response” OR “energy flexibility” OR “building flexibility” OR “EV charging” OR “vehicle-to-grid” OR “V2G” OR “peer-to-peer energy” OR “virtual power plant” OR “prosumer” OR “home ener”…	2020–2026	English
IEEE	Load Forecasting and Energy Monitoring	(“Document Title”:“building load forecasting” OR “Document Title”:“residential load forecasting” OR “Document Title”: “building energy forecasting” OR “Document Title”:“building energy prediction” OR “…”	2020–2026	English
IEEE	HVAC Control and Building Optimisation	((“Document Title”:“HVAC” OR “Document Title”:“building energy” OR “Document Title”: “EnergyPlus” OR “Document Title”: “Sinergym” OR “Document Title”: “thermal comfort” OR “Document Title”: “building cont”…	2020–2026	English
IEEE	Demand Response and Flexibility	(“Document Title”:“demand response” OR “Document Title”:“building flexibility” OR “Document Title”: “home energy management” OR “Document Title”:“EV charging” OR “Document Title”:“vehicle-to-grid” OR “…”	2020–2026	English

References

1. European Commission. The European Green Deal. Communication from the Commission to the European Parliament, the European Council, the Council, the European Economic and Social Committee and the Committee of the Regions, COM(2019) 640 Final; European Commission: Brussels, Belgium, 2019; Available online: https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:52019DC0640 (accessed on 28 May 2026).
2. European Parliament and Council. Directive 2024/1275/EU on the Energy Performance of Buildings (EPBD Recast). Off. J. Eur. Union 2024. 2024/1275. Available online: https://eur-lex.europa.eu/eli/dir/2024/1275/oj (accessed on 28 May 2026).
3. Alanne, K.; Sierla, S. An overview of machine learning applications for smart buildings. Sustain. Cities Soc. 2022, 76, 103445.
4. Lavin, A.; Gilligan-Lee, C.M.; Visser, A.; Glocker, B.; Khan, S.; Grüll, P.; Marson, G.; Tilkin, S.; Durán, J.M. Technology readiness levels for machine learning systems. Nat. Commun. 2022, 13, 6039.
5. European Parliament and Council. Regulation (EU) 2024/1689 laying down harmonised rules on artificial intelligence (Artificial Intelligence Act). Off. J. Eur. Union 2024. 2024/1689. Available online: https://eur-lex.europa.eu/eli/reg/2024/1689/oj/eng (accessed on 28 May 2026).
6. Venkatraj, V.; Dixit, M. Challenges in implementing data-driven approaches for building life cycle energy assessment: A review. Renew. Sustain. Energy Rev. 2022, 160, 112327.
7. Al-Shargabi, A.A.; Almhafdy, A.; Ibrahim, D.M.; Alghieth, M.; Chiclana, F. Buildings’ energy consumption prediction models based on buildings’ characteristics: Research trends, taxonomy, and performance measures. J. Build. Eng. 2022, 54, 104577.
8. Ji, J.; Yu, H.; Wang, X.; Xu, X. Machine learning application in building energy consumption prediction: A comprehensive review. J. Build. Eng. 2025, 104, 112295.
9. Darvishvand, L.; Kamkari, B.; Huang, M.J.; Hewitt, N.J. A systematic review of explainable artificial intelligence in urban building energy modeling: Methods, applications, and future directions. Sustain. Cities Soc. 2025, 128, 106492.
10. Yang, Y.; Duan, Q.; Samadi, F. A systematic review of building energy performance forecasting approaches. Renew. Sustain. Energy Rev. 2025, 223, 116061.
11. Halhoul Merabet, G.; Essaaidi, M.; Ben Haddou, M.; Qolomany, B.; Qadir, J.; Anan, M.; Al-Fuqaha, A.; Abid, M.R.; Benhaddou, D. Intelligent building control systems for thermal comfort and energy-efficiency: A systematic review of artificial intelligence-assisted techniques. Renew. Sustain. Energy Rev. 2021, 144, 110969.
12. Peng, Y.; Lei, Y.; Tekler, Z.D.; Antanuri, N. Hybrid system controls of natural ventilation and HVAC in mixed-mode buildings: A comprehensive review. Energy Build. 2022, 276, 112509.
13. Xin, X.; Zhang, Z.; Zhou, Y.; Liu, Y.; Wang, D.; Nan, S. A comprehensive review of predictive control strategies in heating, ventilation, and air-conditioning (HVAC): Model-free VS model. J. Build. Eng. 2024, 94, 110013.
14. Ala’raj, M.; Radi, M.; Abbod, M.F.; Majdalawieh, M.; Parodi, M. Data-driven based HVAC optimisation approaches: A Systematic Literature Review. J. Build. Eng. 2022, 46, 103678.
15. Al Sayed, K.; Boodi, A.; Sadeghian Broujeny, R.; Beddiar, K. Reinforcement learning for HVAC control in intelligent buildings: A technical and conceptual review. J. Build. Eng. 2024, 95, 110085.
16. Ramos, D.; Faria, P.; Vale, Z. Linking short-term electricity demand forecasting and explainable AI: A review for building energy applications. Appl. Energy 2026, 412, 127658.
17. Maheepala, M.; Li, H.; Robert, D.; Meegahapola, L. Towards energy flexible commercial buildings: Machine learning approaches, implementation aspects, and future research directions. Energy Build. 2025, 346, 116170.
18. Tricco, A.C.; Lillie, E.; Zarin, W.; O’Brien, K.K.; Colquhoun, H.; Levac, D.; Moher, D.; Peters, M.D.; Horsley, T.; Weeks, L.; et al. PRISMA Extension for Scoping Reviews (PRISMA-ScR): Checklist and Explanation. Ann. Intern. Med. 2018, 169, 467–473.
19. Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, n71.
20. thefuzz: Fuzzy String Matching in Python, Version 0.20+; SeatGeek, Inc.: New York, NY, USA, 2021. Available online: https://github.com/seatgeek/thefuzz (accessed on 31 March 2026).
21. European Commission. Technology Readiness Levels (TRL)—Horizon Europe Work Programme 2021–2022; Technical Report; Publications Office of the European Union: Luxembourg, 2020.
22. European Commission, High-Level Expert Group on Artificial Intelligence. Assessment List for Trustworthy Artificial Intelligence (ALTAI) for Self-Assessment; Technical Report; Publications Office of the EU: Luxembourg, 2020.
23. Kong, X.; Gui, Z.; Wu, M.; Miao, C.; Luo, Z. A Hybrid Deep Learning Framework for Non-Intrusive Load Monitoring. Electronics 2026, 15, 453.
24. Wu, L.; Fu, S.; Han, Y.; Luo, Y.; Xu, M.; Liu, L. pNILM: Whole-process privacy preservation for non-intrusive load monitoring based on deep neural networks. Expert Syst. Appl. 2025, 272, 126689.
25. Inala, K.; Rampelli, R. Load forecasting for home energy management using light gradient boosting machine learning algorithm. Int. J. Ambient. Energy 2025, 46, 2577864.
26. Agarwal, V.; Ardakanian, O.; Pal, S. Robust peer-to-peer federated learning for non-intrusive load monitoring in smart homes. Energy Build. 2025, 329, 115209.
27. Agarwal, V.; Ardakanian, O.; Pal, S. A Robust and Privacy-Aware Federated Learning Framework for Non-Intrusive Load Monitoring. IEEE Trans. Sustain. Comput. 2024, 9, 766–777.
28. Virtsionis-Gkalinikis, N.; Nalmpantis, C.; Vrakas, D. Torch-NILM: An Effective Deep Learning Toolkit for Non-Intrusive Load Monitoring in Pytorch. Energies 2022, 15, 2647.
29. Shabbir, N.; Vassiljeva, K.; Hokmabad, H.; Husev, O.; Petlenkov, E.; Belikov, J. Comparative Analysis of Machine Learning Techniques for Non-Intrusive Load Monitoring. Electronics 2024, 13, 1420.
30. Akbar, M.; Amayri, M.; Bouguila, N. A novel non-intrusive load monitoring technique using semi-supervised deep learning framework for smart grid. Build. Simul. 2024, 17, 441–457.
31. Angelis, G.; Timplalexis, C.; Salamanis, A.; Krinidis, S.; Ioannidis, D.; Kehagïas, D.; Tzovaras, D. Energformer: A New Transformer Model for Energy Disaggregation. IEEE Trans. Consum. Electron. 2023, 69, 308–320.
32. Ouzine, J.; Marzouq, M.; Bennani, S.; Lahrech, K.; El Fadili, H. New hybrid deep learning models for multi-target NILM disaggregation. Energy Effic. 2023, 16, 82.
33. Cheng, Z.; Yao, Z. A novel approach to predict buildings load based on deep learning and non-intrusive load monitoring technique, toward smart building. Energy 2024, 312, 133456.
34. Jiang, L.; Liu, M.; Jin, J.; Zheng, Y.; He, X. ConvTransNILM: A parallel transformer model with convolution for energy disaggregation. Energy Build. 2025, 347, 116361.
35. Lei, L.; Shao, S.; Liang, L. An evolutionary deep learning model based on EWKM, random forest algorithm, SSA and BiLSTM for building energy consumption prediction. Energy 2024, 288, 129795.
36. Varanasi, L.; Karri, S. STNILM: Switch Transformer based Non-Intrusive Load Monitoring for short and long duration appliances. Sustain. Energy Grids Netw. 2024, 37, 101246.
37. Tokam, L.W.; Apeke, S.K.; Ouro-Djobo, S.S. Hybrid HDBSCAN-FHMM Approach for Energy Disaggregation in Non-Intrusive Load Monitoring (NILM) Systems. IEEE Access 2025, 13, 89685–89703.
38. Ayub, M.; El-Alfy, E.S. Contextual Sequence-to-Point Deep Learning for Household Energy Disaggregation. IEEE Access 2023, 11, 75599–75616.
39. Werthen-Brabants, L.; Dhaene, T.; Deschrijver, D. Uncertainty quantification for appliance recognition in non-intrusive load monitoring using Bayesian deep learning. Energy Build. 2022, 270, 112282.
40. Zhou, X.; Feng, J.; Wang, J.; Pan, J. Privacy-preserving household load forecasting based on non-intrusive load monitoring: A federated deep learning approach. PeerJ Comput. Sci. 2022, 8, e1049.
41. Andrean, V.; Lian, K.L.; Iqbal, I.M. A Parallel Bidirectional Long Short-Term Memory Model for Energy Disaggregation. IEEE Can. J. Electr. Comput. Eng. 2022, 45, 150–158.
42. Kaselimi, M.; Doulamis, N.; Voulodimos, A.; Protopapadakis, E.; Doulamis, A. Context Aware Energy Disaggregation Using Adaptive Bidirectional LSTM Models. IEEE Trans. Smart Grid 2020, 11, 3054–3067.
43. Chen, B.; Cai, Z.; Bergés, M. Gnu-RL: A Practical and Scalable Reinforcement Learning Solution for Building HVAC Control Using a Differentiable MPC Policy. Front. Built Environ. 2020, 6, 562239.
44. Ding, X.; Cerpa, A.; Du, W. Multi-Zone HVAC Control with Model-Based Deep Reinforcement Learning. IEEE Trans. Autom. Sci. Eng. 2025, 22, 4408–4426.
45. Guan, S.; Chen, R.; Liu, J.; Zhou, K.; Dai, X.; Li, W.; Somasundaram, S.; Chong, A.; Lee, C.; Yuen, C. Adaptive multi-agent HVAC control for thermal comfort using multi-agent PPO with population-based training. Energy Build. 2026, 353, 116882.
46. Chen, H.; Sun, D.; Sun, Y.; Zhang, Y.; Yang, H. Multi-Agent Deep Reinforcement Learning-Based HVAC and Electrochromic Window Control Framework. Buildings 2025, 15, 3114.
47. Wang, X.; Mahdavi, N.; Sethuvenkatraman, S.; West, S. An environment-adaptive SAC-based HVAC control of single-zone residential and office buildings. Data-Centric Eng. 2025, 6, e3.
48. Shin, M.; Kim, S.; Kim, Y.; Song, A.; Kim, Y.; Kim, H. Development of an HVAC system control method using weather forecasting data with deep reinforcement learning algorithms. Build. Environ. 2024, 248, 111069.
49. Adibhesami, M.; Hassanzadeh, A. Optimizing HVAC energy efficiency in low-energy buildings: A comparative analysis of reinforcement learning control strategies under Tehran climate conditions. Data-Centric Eng. 2025, 6, e40.
50. Kumkam, S.; Trinuruk, P.; Chaiwiwatworakul, P.; Saito, K. A comparative analysis of reinforcement learning and model predictive control for HVAC system optimization. J. Build. Eng. 2025, 112, 113776.
51. Fang, X.; Gong, G.; Li, G.; Chun, L.; Peng, P.; Li, W.; Shi, X.; Chen, X. Deep reinforcement learning optimal control strategy for temperature setpoint real-time reset in multi-zone building HVAC system. Appl. Therm. Eng. 2022, 212, 118552.
52. Chen, L.; Meng, F.; Zhang, Y. MBRL-MC: An HVAC Control Approach via Combining Model-Based Deep Reinforcement Learning and Model Predictive Control. IEEE Internet Things J. 2022, 9, 19160–19173.
53. Wu, Z.; Mu, Y.; Jin, X.; Xu, Y.; Jia, H.; Zhao, J. AE-TD3 with adaptive expert guidance: Towards responsive deep reinforcement learning for building HVAC control systems. Energy Build. 2026, 351, 116744.
54. Zhang, L.; Guo, J.; Chen, C.; Lin, P.; Tiong, R. Robust deep reinforcement learning for improved energy-comfort performance in HVAC systems. Build. Environ. 2026, 287, 113895.
55. Deng, X.; Zhang, Y.; Qi, H. Towards optimal HVAC control in non-stationary building environments combining active change detection and deep reinforcement learning. Build. Environ. 2022, 211, 108680.
56. Ding, Z.K.; Fu, Q.; Chen, J.; Wu, H.; Lu, Y.; Hu, F.Y. Energy-efficient control of thermal comfort in multi-zone residential HVAC via reinforcement learning. Connect. Sci. 2022, 34, 2364–2394.
57. Esrafilian-Najafabadi, M.; Haghighat, F. Towards self-learning control of HVAC systems with the consideration of dynamic occupancy patterns: Application of model-free deep reinforcement learning. Build. Environ. 2022, 226, 109747.
58. Lim, S.H.; Kim, T.G.; Yeom, D.; Yoon, S.G. Robust deep reinforcement learning for personalized HVAC system. Energy Build. 2024, 319, 114551.
59. Xue, W.; Jia, N.; Zhao, M. Multi-agent deep reinforcement learning based HVAC control for multi-zone buildings considering zone-energy-allocation optimization. Energy Build. 2025, 329, 115241.
60. Fu, C.; Zhang, Y. Research and Application of Predictive Control Method Based on Deep Reinforcement Learning for HVAC Systems. IEEE Access 2021, 9, 130845–130852.
61. Dinh, H.T.; Kim, D. MILP-Based Imitation Learning for HVAC Control. IEEE Internet Things J. 2022, 9, 6107–6120.
62. Deng, X.; Zhang, Y.; Qi, H. Toward Smart Multizone HVAC Control by Combining Context-Aware System and Deep Reinforcement Learning. IEEE Internet Things J. 2022, 9, 21010–21024.
63. Fu, Q.; Chen, X.; Ma, S.; Fang, N.; Xing, B.; Chen, J. Optimal control method of HVAC based on multi-agent deep reinforcement learning. Energy Build. 2022, 270, 112284.
64. Khabbazi, A.; Pergantis, E.; Reyes Premer, L.; Papageorgiou, P.; Lee, A.; Braun, J.; Henze, G.; Kircher, K. Lessons learned from field demonstrations of model predictive control and reinforcement learning for residential and commercial HVAC: A review. Appl. Energy 2025, 399, 126459.
65. Hedayat, S.; Ziarati, T.; Manganelli, M. A Physics-Informed Reinforcement Learning Framework for HVAC Optimization: Thermodynamically-Constrained Deep Deterministic Policy Gradients with Simulation-Based Validation. Energies 2025, 18, 6310.
66. Yu, L.; Sun, Y.; Xu, Z.; Shen, C.; Yue, D.; Jiang, T.; Guan, X. Multi-Agent Deep Reinforcement Learning for HVAC Control in Commercial Buildings. IEEE Trans. Smart Grid 2021, 12, 407–419.
67. Han, Y.; Gao, W.; Wang, Z.; Zhao, Q. Optimizing grid-interactive buildings demand response: Sequence-based decision-making multi-agent policy decomposition deep reinforcement learning. Energy Build. 2025, 347, 116198.
68. Wang, R.; Qiu, H.; Gao, H.; Li, C.; Dong, Z.Y.; Liu, J. Adaptive Horizontal Federated Learning-Based Demand Response Baseline Load Estimation. IEEE Trans. Smart Grid 2024, 15, 1659–1669.
69. Kampezidou, S.I.; Romberg, J.; Vamvoudakis, K.G.; Mavris, D.N. Decentralized and Privacy-Preserving Learning of Approximate Stackelberg Solutions in Energy Trading Games with Demand Response Aggregators. IEEE Trans. Syst. Man Cybern. Syst. 2024, 54, 6304–6319.
70. Li, Y.; Chen, X.; Wang, Y. EdgeHEM: Sparse Federated Reinforcement Learning for Home Energy Management at the Edge. IEEE Trans. Smart Grid 2025, 16, 5602–5614.
71. Yu, H.; Zhang, J.; Ma, J.; Chen, C.; Geng, G.; Jiang, Q. Privacy-preserving demand response of aggregated residential load. Appl. Energy 2023, 339, 121018.
72. Chen, S.J.; Chiu, W.Y.; Liu, W.J. User Preference-Based Demand Response for Smart Home Energy Management Using Multiobjective Reinforcement Learning. IEEE Access 2021, 9, 161627–161637.
73. Amer, A.; Bashir Shaban, K.; Massoud, A. DRL-HEMS: Deep Reinforcement Learning Agent for Demand Response in Home Energy Management Systems Considering Customers and Operators Perspectives. IEEE Trans. Smart Grid 2023, 14, 239–250.
74. Qiu, D.; Xue, J.; Zhang, T.; Wang, J.; Sun, M. Federated reinforcement learning for smart building joint peer-to-peer energy and carbon allowance trading. Appl. Energy 2023, 333, 120526.
75. Islam, S.; Badsha, S.; Sengupta, S.; Khalil, I.; Atiquzzaman, M. An Intelligent Privacy Preservation Scheme for EV Charging Infrastructure. IEEE Trans. Ind. Inform. 2023, 19, 1238–1247.
76. Chen, Y.; Chen, C.; Zhang, X.; Cui, M.; Li, F.; Wang, X.; Yin, S. Privacy-Preserving Baseline Load Reconstruction for Residential Demand Response Considering Distributed Energy Resources. IEEE Trans. Ind. Inform. 2022, 18, 3541–3550.
77. Fraija, A.; Agbossou, K.; Henao, N.; Kelouwani, S.; Fournier, M.; Hosseini, S.S. A Discount-Based Time-of-Use Electricity Pricing Strategy for Demand Response with Minimum Information Using Reinforcement Learning. IEEE Access 2022, 10, 54018–54028.
78. Charbonnier, F.; Peng, B.; Vienne, J.; Stai, E.; Morstyn, T.; McCulloch, M. Centralised rehearsal of decentralised cooperation: Multi-agent reinforcement learning for the scalable coordination of residential energy flexibility. Appl. Energy 2025, 377, 124406.
79. Zhang, Y.; Lin, R.; Mei, Z.; Lyu, M.; Jiang, H.; Xue, Y.; Zhang, J.; Gao, D. Interior-point policy optimization based multi-agent deep reinforcement learning method for secure home energy management under various uncertainties. Appl. Energy 2024, 376, 124155.
80. Bahrami, S.; Chen, Y.C.; Wong, V.W.S. Deep Reinforcement Learning for Demand Response in Distribution Networks. IEEE Trans. Smart Grid 2021, 12, 1496–1506.
81. Xu, X.; Jia, Y.; Xu, Y.; Xu, Z.; Chai, S.; Lai, C. A Multi-Agent Reinforcement Learning-Based Data-Driven Method for Home Energy Management. IEEE Trans. Smart Grid 2020, 11, 3201–3211.
82. Singh, A.; Panigrahi, B.K. Prosumer Cost Efficiency and Ensuring Grid Stability Through a Hierarchical PPO Framework in Decentralized Community Energy Management. IEEE Access 2025.
83. Sajid, S.; Li, B.; Berehman, B.; Guo, Q.; Kang, Y.; Athar, M.; Muqtadir, A. Decentralized Multi-Agent Reinforcement Learning Control of Residential Battery Storage for Demand Response. Energies 2025, 18, 5712.
84. Arrieta, A.B.; Diaz-Rodriguez, N.; Del Ser, J.; Bennetot, A.; Tabik, S.; Barbado, A.; Garcia, S.; Gil-Lopez, S.; Molina, D.; Benjamins, R.; et al. Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion 2020, 58, 82–115.

Figure 1. PRISMA 2020-informed flow diagram for the scoping review.

Figure 2. ML technique category by BEMS sub-domain (N = 61, 2020–2026).

Figure 3. TRL distribution across the publication window (2020–2026), aggregated into three bands: Research (TRL 1–3), Development (TRL 4–6), and Demonstration (TRL 7–9) (N = 61).

Figure 4. Median TRL by ML technique and training category (N = 61, 2020–2026).

Figure 5. TAI coverage gap matrix (ALTAI-based) by sub-domain and dimension (N = 61, 2020–2026).

Figure 6. EU AI Act risk tier distribution by BEMS sub-domain (N = 61, 2020–2026).

Figure 7. Deployment Readiness Map (DRM) for the final corpus (N = 61, 2020–2026).

Table 1. TRL Assignment Rubric applied in this Review (EU/Horizon Europe guidelines [21]).

TRL	EU/Horizon Europe Definition	BEMS/ML Contextualisation	Abstract-Level Classifier Signals
TRL 1–2	Basic principles observed; technology concept formulated	Theoretical concept or algorithmic formulation; no implemented system	No dataset, no evaluation results; conceptual formulation without validation evidence; no performance metric cited.
TRL 3	Experimental proof of concept	Core algorithm demonstrated under controlled conditions	Synthetic or author-generated data used for validation; proof-of-concept results reported; no domain-representative benchmark dataset or building physics simulator referenced.
TRL 4	Technology validated in lab	Validated on a public benchmark dataset (offline evaluation, standardised train/test splits; no building-physics model)	Named public benchmark dataset; standard train/test splits reported; no building-physics simulator mentioned; lower-bound rule applied if signal remains ambiguous.
TRL 5	Technology validated in relevant environment	Validated within a domain-representative building physics simulation	Explicit mention of a building physics simulation environment (e.g., EnergyPlus) or an equivalent co-simulation framework as the evaluation environment.
TRL 6	Technology demonstrated in relevant environment	Prototype evaluated on real building data (offline/retrospective)	Real-building, smart-meter, or live-sensor dataset named or described with historical monitoring data; or high-fidelity co-simulation beyond standard public benchmarks; no pilot or operational deployment language; no live KPI values.
TRL 7	System prototype demonstration in operational environment	Pilot in an operational building with a live control loop and documented KPIs	Terms such as pilot, field test, operational deployment, real-world implementation; quantitative KPI values (e.g., energy savings %, MAPE) reported under real conditions.
TRL 8	System complete and qualified	Sustained production deployment within a single organisation	Continuous or ongoing operation stated; live inference confirmed; performance validated under operational (non-replay) conditions; single-organisation or single-site scope.
TRL 9	Actual system proven in operational environment	Multi-building or multi-site sustained deployment	≥2 distinct sites or buildings; monitoring duration ≥12 months or explicitly sustained; diverse conditions (building types, climates, grid contexts) described.
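
The nine TRL values in Table 1 are aggregated elsewhere in the review into three bands: Research (TRL 1–3), Development (TRL 4–6), and Demonstration (TRL 7–9). The following is a minimal Python sketch of that aggregation, not the author's tooling. The function name `trl_band` and the list `hvac_trls` are illustrative assumptions; the TRL values are transcribed from the HVAC control table (Table 4).

```python
from collections import Counter


def trl_band(trl: int) -> str:
    """Map a TRL (1-9) to the three reporting bands used in the review."""
    if not 1 <= trl <= 9:
        raise ValueError(f"TRL must be between 1 and 9, got {trl}")
    if trl <= 3:
        return "Research"
    if trl <= 6:
        return "Development"
    return "Demonstration"


# TRL values transcribed from Table 4 (refs [43]-[66]):
# one paper at TRL 7, ten at TRL 5, ten at TRL 4, three at TRL 2.
hvac_trls = [7] + [5] * 10 + [4] * 10 + [2] * 3

# Count how many HVAC papers fall into each band.
hvac_bands = Counter(trl_band(t) for t in hvac_trls)
print(dict(hvac_bands))
```

Under these assumptions, the counts should agree with the HVAC row of Table 6 (3 Research, 20 Development, 1 Demonstration).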

Table 2. Trustworthy AI Assessment Rubric applied in this Review (ALTAI framework [22]).

ALTAI Dimension	ALTAI Definition	BEMS/ML Contextualisation	Abstract-Level Classifier
Privacy & Data Governance	Data minimisation, purpose limitation, lawful processing basis, data subject rights, and consent mechanisms	Federated or on-device training; differential privacy; GDPR-compatible data pipeline; explicit data minimisation strategy reported	Terms federated learning, differential privacy, privacy-preserving, GDPR, data governance, data sharing, anonymization, split learning.
Robustness	Reliability under varying operating conditions; resilience to errors, faults, and adversarial inputs; failsafe and fallback plan; cybersecurity robustness	Performance evaluated under unseen building types, cross-dataset transfer, or out-of-distribution conditions; uncertainty quantification reported	Terms robustness, adversarial, drift, distribution shift, out-of-distribution, uncertainty quantification, conformal, fault tolerance, resilience.
Transparency	Traceability of data, processes, and decisions; explainability of system outputs to users; communication of AI system identity to affected parties	Post-hoc or ante-hoc explainability method applied; operator dashboard or decision audit trail described	Terms explainability, XAI, SHAP, LIME, interpretability, feature importance, attention mechanism, transparent model, human-in-the-loop.
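
The abstract-level classifier column of Table 2 amounts to a keyword screen. The sketch below illustrates one way such a screen could work; it is an assumption, since the review does not state how term matching was implemented. The names `TAI_TERMS` and `tag_tai` are hypothetical, and the matching rule (case-insensitive, whole-word) is chosen here only to avoid false hits such as "shape" for "SHAP".

```python
import re

# Terms copied from the "Abstract-Level Classifier" column of Table 2.
TAI_TERMS = {
    "Privacy & Data Governance": [
        "federated learning", "differential privacy", "privacy-preserving",
        "GDPR", "data governance", "data sharing", "anonymization",
        "split learning",
    ],
    "Robustness": [
        "robustness", "adversarial", "drift", "distribution shift",
        "out-of-distribution", "uncertainty quantification", "conformal",
        "fault tolerance", "resilience",
    ],
    "Transparency": [
        "explainability", "XAI", "SHAP", "LIME", "interpretability",
        "feature importance", "attention mechanism", "transparent model",
        "human-in-the-loop",
    ],
}


def tag_tai(abstract: str) -> set[str]:
    """Return the ALTAI dimensions whose terms appear in the abstract.

    Assumption: a dimension counts as addressed if any one of its terms
    occurs as a whole word or phrase, ignoring case.
    """
    found = set()
    for dimension, terms in TAI_TERMS.items():
        for term in terms:
            pattern = rf"\b{re.escape(term)}\b"
            if re.search(pattern, abstract, flags=re.IGNORECASE):
                found.add(dimension)
                break  # one hit is enough for this dimension
    return found


example = "A federated learning approach with SHAP explanations."
print(sorted(tag_tai(example)))
```

A keyword screen of this kind records only whether a term is mentioned, not whether the provision is implemented, which is consistent with the review's statement that coverage is assessed at abstract level.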

Table 3. ML Applications in Load Forecasting & Energy Monitoring: Techniques, Deployment Maturity, and TAI Coverage (2020–2026).

Ref.	Yr	ML Tech.	TRL	TAI Coverage	KPI (Value)
[23]	2026	Transformer	6	—	f1 (24.5%)
[24]	2025	FL	6	Privacy & Data Gov.	NR
[25]	2025	PPO	6	—	mape (0.18%)
[26]	2025	FL	6	Privacy & Data Gov.; Robustness	accuracy
[27]	2024	FL	6	Privacy & Data Gov.; Robustness	accuracy
[28]	2022	ML	6	—	NR
[29]	2024	LSTM	6	—	accuracy
[30]	2024	LSTM	6	—	Δf1 (+15%)
[31]	2023	Transformer	6	—	accuracy
[32]	2023	LSTM	6	—	f1 (78.90%)
[33]	2024	LSTM	4	Robustness	accuracy (97.2%)
[34]	2025	Transformer	4	—	indicator NR (50%)
[35]	2024	LSTM	4	Robustness	Δmape (−24.55%)
[36]	2024	Transformer	4	Transparency	accuracy
[37]	2025	ML	4	Robustness	mae (6.25%)
[38]	2023	LSTM	4	Robustness	NR
[39]	2022	ML	4	Robustness	NR
[40]	2022	FL	4	Privacy & Data Gov.	NR
[41]	2022	LSTM	4	—	NR
[42]	2020	LSTM	4	—	NR

NR = Not Reported; indicator NR = indicator label not reported in the abstract, with numerical value extracted from the abstract-level performance statement; ML = Machine Learning; FL = Federated Learning; LSTM = Long Short-Term Memory; PPO = Proximal Policy Optimisation; MAPE = Mean Absolute Percentage Error; MAE = Mean Absolute Error; Δ = improvement vs. baseline. TAI Coverage: Privacy & Data Gov. = Privacy & Data Governance (ALTAI); Robustness = ALTAI Robustness dimension; Transparency = ALTAI Transparency dimension; — = no ALTAI dimension explicitly addressed at abstract level.

Table 4. ML Applications in HVAC Control & Building Optimization: Techniques, Deployment Maturity, and TAI Coverage (2020–2026).

Ref.	Yr	ML Tech.	TRL	TAI Coverage	KPI (Value)
[43]	2020	MPC	7	Transparency	energy savings (6.6%)
[44]	2025	ML	5	—	energy savings (8.23%)
[45]	2026	PPO	5	—	indicator NR (28.2%)
[46]	2025	PPO	5	—	energy savings (19.8%)
[47]	2025	SAC	5	—	NR
[48]	2024	DQN	5	—	energy savings (58.79%)
[49]	2025	DQN	5	—	accuracy (10%)
[50]	2025	DQN	5	—	indicator NR (3.29%)
[51]	2022	DQN	5	—	NR
[52]	2022	MBRL	5	—	NR
[53]	2026	SAC	5	—	convergence (37.50%)
[54]	2026	PPO	4	Robustness	indicator NR (35.90%)
[55]	2022	DQN	4	Robustness	indicator NR (13%)
[56]	2022	PPO	4	—	indicator NR (20.5%)
[57]	2022	DQN	4	—	indicator NR (7.87%)
[58]	2024	SAC	4	Robustness	indicator NR (24%)
[59]	2025	DQN	4	—	indicator NR (6.7%)
[60]	2021	TD3	4	Robustness	indicator NR (16%)
[61]	2022	IL	4	—	NR
[62]	2022	SAC	4	—	indicator NR (15.9%)
[63]	2022	DQN	4	—	accuracy (11.1%)
[64]	2025	PPO	2	—	indicator NR (71%)
[65]	2025	PI-RL	2	—	indicator NR (34.7%)
[66]	2021	Transformer	2	Robustness; Transparency	indicator NR (40%)

NR = Not Reported; indicator NR = indicator label not reported in the abstract, with numerical value extracted from the abstract-level performance statement; ML = Machine Learning; MPC = Model Predictive Control (non-ML comparator); MBRL = Model-Based Reinforcement Learning; IL = Imitation Learning; DQN = Deep Q-Network; PPO = Proximal Policy Optimisation; SAC = Soft Actor-Critic; TD3 = Twin Delayed Deep Deterministic Policy Gradient; PI-RL = Physics-Informed Reinforcement Learning. TAI Coverage: Robustness = ALTAI Robustness dimension; Transparency = ALTAI Transparency dimension; — = no ALTAI dimension explicitly addressed at abstract level.

Table 5. ML Applications in Demand Response & Flexibility: Techniques, Deployment Maturity, and TAI Coverage (2020–2026).

Ref.	Yr	ML Tech.	TRL	TAI Coverage	KPI (Value)
[67]	2025	MPC	6	—	indicator NR (12%)
[68]	2024	FL	6	Privacy & Data Gov.	NR
[69]	2024	PPO	6	Privacy & Data Gov.	NR
[70]	2025	FL	6	Privacy & Data Gov.	NR
[71]	2023	ML	6	Privacy & Data Gov.	NR
[72]	2021	ML	6	—	cost reduction (8.44%)
[73]	2023	Transformer	6	—	NR
[74]	2023	FL	6	Privacy & Data Gov.	indicator NR (5.87%)
[75]	2023	FL	4	Privacy & Data Gov.	accuracy (95%)
[76]	2022	FL	4	Privacy & Data Gov.	indicator NR (62.5%)
[77]	2022	SAC	4	—	NR
[78]	2025	ML	4	Privacy & Data Gov.	indicator NR (47.2%)
[79]	2024	Transformer	4	—	NR
[80]	2021	FL	4	Privacy & Data Gov.	indicator NR (33%)
[81]	2020	DQN	4	—	NR
[82]	2025	PPO	2	—	cost reduction
[83]	2025	SAC	2	—	cost reduction (50%)

NR = Not Reported; indicator NR = indicator label not reported in the abstract, with numerical value extracted from the abstract-level performance statement; ML = Machine Learning; FL = Federated Learning; MPC = Model Predictive Control (non-ML); PPO = Proximal Policy Optimisation; SAC = Soft Actor-Critic; DQN = Deep Q-Network. TAI Coverage: Privacy & Data Gov. = Privacy & Data Governance (ALTAI); — = no ALTAI dimension explicitly addressed at abstract level.
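
The privacy figures quoted in the abstract for demand response (9 of 17 papers, 52.9%, with 6 of the 9 using Federated Learning) can be re-derived from Table 5. The following Python sketch does so. The tuple layout and the variable names are illustrative assumptions; the rows are transcribed from the table.

```python
# (reference number, ML technique, tagged Privacy & Data Gov.?) from Table 5.
table5 = [
    (67, "MPC", False), (68, "FL", True), (69, "PPO", True),
    (70, "FL", True), (71, "ML", True), (72, "ML", False),
    (73, "Transformer", False), (74, "FL", True), (75, "FL", True),
    (76, "FL", True), (77, "SAC", False), (78, "ML", True),
    (79, "Transformer", False), (80, "FL", True), (81, "DQN", False),
    (82, "PPO", False), (83, "SAC", False),
]

# Papers with a privacy tag, and the subset of those that use FL.
privacy = [row for row in table5 if row[2]]
fl_privacy = [row for row in privacy if row[1] == "FL"]

# Share of demand-response papers carrying a privacy tag, in percent.
privacy_share = round(100 * len(privacy) / len(table5), 1)

print(len(privacy), len(table5), privacy_share, len(fl_privacy))
```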

Table 6. TRL Distribution by BEMS Sub-domain: Stratified Paper Count (N = 61, 2020–2026).

Sub-Domain	Research (TRL 1–3)	Development (TRL 4–6)	Demonstration (TRL 7–9)	Total
Load Forecasting and Energy Monitoring	0	20	0	20
HVAC Control and Building Optimisation	3	20	1	24
Demand Response and Flexibility	2	15	0	17
Total	5	55	1	61
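
The headline share of development-stage papers follows directly from the column totals in Table 6. The following Python sketch reproduces the arithmetic. The dictionary `table6` and the other variable names are illustrative assumptions; the counts are transcribed from the table.

```python
# Paper counts per sub-domain: (Research, Development, Demonstration).
table6 = {
    "Load Forecasting and Energy Monitoring": (0, 20, 0),
    "HVAC Control and Building Optimisation": (3, 20, 1),
    "Demand Response and Flexibility": (2, 15, 0),
}
bands = ("Research", "Development", "Demonstration")

# Column totals across the three sub-domains.
totals = [sum(row[i] for row in table6.values()) for i in range(len(bands))]
n = sum(totals)

# Share of the corpus in each band, in percent, rounded to one decimal place.
shares = {band: round(100 * count / n, 1) for band, count in zip(bands, totals)}

print(n, totals, shares)
```

With 55 of 61 papers in the Development band, the share is 55/61 ≈ 90.2%, matching the figure reported in the abstract.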

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

