Applications and Trends of Machine Learning in Building Energy Optimization: A Bibliometric Analysis

Liu, Jingyi; Chen, Jianfei

doi:10.3390/buildings15070994

Open Access Review

by Jingyi Liu ¹,² and Jianfei Chen ¹,²,*

¹ School of Architecture and Design, Harbin Institute of Technology, Harbin 150001, China

² Key Laboratory of Cold Region Urban and Rural Human Settlement Environment Science and Technology, Ministry of Industry and Information Technology, Harbin 150001, China

* Author to whom correspondence should be addressed.

Buildings 2025, 15(7), 994; https://doi.org/10.3390/buildings15070994

Submission received: 20 January 2025 / Revised: 22 February 2025 / Accepted: 18 March 2025 / Published: 21 March 2025

(This article belongs to the Section Building Energy, Physics, Environment, and Systems)

Abstract

With the rapid advancement of machine learning (ML) technologies, their applications in enhancing building energy efficiency are increasingly prominent. Using tools such as VOSviewer and Bibliometrix, this study systematically reviews the related literature, focusing on the key applications and emerging trends of ML techniques, including deep learning, reinforcement learning, and unsupervised learning, in optimizing building energy performance and managing carbon emissions. First, this paper examines the role of ML in building performance prediction, intelligent energy management, and sustainable design, with particular emphasis on how smart building systems leverage real-time data analysis and prediction to dynamically optimize energy usage and significantly reduce carbon emissions. Second, this study summarizes the technological evolution and future trends of ML in the building sector and identifies critical challenges faced by the field. The findings provide a technology-driven perspective for advancing sustainability in the construction industry and offer valuable insights for future research directions.

Keywords:

machine learning (ML); energy optimization in buildings; building energy efficiency; deep learning; bibliometric analysis

1. Introduction

1.1. Background of Machine Learning Applications in Building Energy Efficiency

With the rapid advancement of machine learning (ML) technology, its potential applications in the building industry—particularly in energy efficiency and carbon reduction—have garnered significant attention. The modern building sector is undergoing profound transformations, leveraging intelligent technologies and data-driven solutions to enhance operational efficiency and sustainability [1,2]. As the building sector accounts for over 40% of global carbon emissions, the development of energy-saving systems in digitalized buildings has become a pivotal focus for sustainable development, attracting widespread global interest [3]. ML technology, with its ability to analyze and predict energy consumption across a building’s lifecycle, offers robust support for optimizing energy management and formulating carbon reduction strategies [4,5,6]. Its integration not only enhances operational efficiency but also creates new opportunities for transitioning to greener practices.

In intelligent building environments, ML facilitates real-time data analysis and dynamic decision making through techniques such as deep learning, reinforcement learning, and unsupervised learning. Deep learning refers to neural networks with many layers that can model complex patterns in large datasets, enabling systems to make accurate predictions and decisions without explicit programming [7]. Reinforcement learning, on the other hand, involves training models to make decisions through trial and error, where the system learns to optimize actions based on rewards or penalties, making it particularly effective for dynamic environments like energy management in buildings [8,9,10,11]. Unsupervised learning enables systems to identify hidden patterns in data without predefined labels, allowing for a more flexible analysis of large datasets to detect anomalies or group similar data points, which is useful for optimizing building operations [12,13]. By leveraging these methods, building systems can automatically adjust energy distribution based on real-time demand, thereby improving energy efficiency and reducing carbon emissions [14,15]. Furthermore, the digitalization of intelligent buildings is driving the architecture industry’s shift from traditional design and management models to smarter, data-driven, and automated paradigms, establishing a solid foundation for achieving sustainability goals [16,17].

1.2. Research Status and Problems

In recent years, significant progress has been made in academia and industry in optimizing building energy efficiency and reducing carbon emissions, particularly through the application of machine learning (ML) technologies. These applications include building performance prediction [18,19,20], energy management optimization [15,21,22], and sustainable design [23,24]. However, current research still faces several challenges and limitations:

Existing studies lack a comprehensive review of the specific applications of ML in building energy efficiency optimization and carbon reduction [25,26].
While technologies such as deep learning and reinforcement learning show great potential in intelligent building systems, emerging approaches like unsupervised learning and multimodal data fusion have received less attention [27,28].
Collaborative networks across regions and institutions have not yet achieved large-scale effectiveness, and research resources and outcomes remain unevenly distributed [22].
Practical evaluations of the benefits of technology implementation are insufficient, limiting large-scale applications [13,17].

Against this backdrop, the application of ML in the architecture industry must further explore its potential in energy efficiency optimization and carbon emission management. The related literature reviews primarily summarize the current research advancements in four key areas (Table 1). Existing reviews predominantly focus on traditional machine learning techniques, such as regression analysis and support vector machines, while the applications of newer technologies, including deep learning, reinforcement learning, and unsupervised learning, are often presented in a fragmented manner. Moreover, most reviews concentrate on isolated aspects of energy efficiency or carbon emission management, overlooking the synergistic potential of smart building systems and real-time data analytics in optimizing building energy performance.

Specifically, ML techniques such as deep learning [4,15], reinforcement learning [8,9], and unsupervised learning [12] demonstrate immense potential in architecture. Deep learning models effectively address complex nonlinear relationships, enabling accurate building performance predictions and forming a basis for energy management and optimization. Reinforcement learning dynamically adjusts building operational parameters through interactive learning with the environment, achieving optimal energy control. Unsupervised learning uncovers latent patterns and insights from unlabeled data, offering novel perspectives for building design and operation. For instance, by analyzing real-time energy consumption data, ML models can forecast future energy demand and dynamically adjust the parameters of systems such as HVAC and lighting. This facilitates precise energy management and optimization, ultimately reducing energy consumption and carbon emissions [7,20].

1.3. Research Objectives and Significance

To address the aforementioned challenges, this study performs a systematic review and employs visualization tools to examine research on the application of machine learning technologies in optimizing building energy efficiency and reducing carbon emissions. Scientometric analysis is conducted using VOSviewer (version 1.6.20) and Bibliometrix (Biblioshiny version 4.0). The primary objectives of this paper are as follows:

To explore the pivotal role of machine learning in building performance prediction, energy management optimization, and sustainable design.
To examine the applications and emerging trends of technologies, including deep learning, reinforcement learning, and unsupervised learning, within intelligent building systems.
To identify current research hotspots and technical challenges, providing actionable recommendations for future research directions and practical applications.

Through bibliometric analysis, this study aims to construct a comprehensive knowledge framework for researchers in related fields. It provides a systematic and comprehensive summary of the applications of advanced technologies such as deep learning, reinforcement learning, and unsupervised learning, and, through a review of their implementation in building energy optimization, highlights their potential and emerging trends in this domain, addressing gaps in the existing literature. From a systematic perspective, this study explores how the integration of smart building technologies with machine learning models can dynamically optimize energy usage and reduce carbon emissions, ultimately achieving synergistic benefits. It seeks to serve as both a reference and a guide for future research efforts, promoting the continued development and expanded implementation of machine learning technologies in energy-efficient building design and carbon reduction strategies.

1.4. Structure of This Paper

This paper is organized as follows: Section 2 describes the data collection and analysis methods and addresses the research questions. Section 3 presents descriptive statistics and visualization results regarding the literature on machine learning applications in building energy efficiency optimization. Section 4 provides a comprehensive discussion of the findings, contextualized within existing research, and proposes recommendations for improvement. Lastly, Section 5 summarizes the key conclusions and highlights potential future research directions.

2. Literature Collection and Analysis Methods

In recent years, researchers have increasingly focused on the potential of machine learning (ML) for advancing building energy efficiency and reducing carbon emissions. To gain deeper insights into the progression of research in this domain and uncover future opportunities, it is essential to examine the historical development of high-quality scientific publications using scientometric methods.

This approach facilitates the identification of disciplinary research patterns and highlights promising directions for further investigation. The SALSA framework (Search, Appraisal, Synthesis, and Analysis) [34,35], a structured methodology for conducting literature reviews and evaluating research, was employed in this study due to its systematic and comprehensive nature. SALSA is specifically designed to offer a structured process that ensures a thorough and objective evaluation of the existing literature, which is essential for synthesizing a wide array of studies and identifying key trends in a given field. By leveraging the SALSA framework, this work systematically organized and examined existing studies, offering a comprehensive perspective on the field’s developments and emerging trends [36]. Its inclusion enables a clear, step-by-step methodology that enhances the reliability and transparency of the review process.

This study follows a systematic process, as illustrated in Figure 1. The steps include defining research questions, selecting appropriate databases, determining search terms, choosing analytical tools, extracting relevant literature data, analyzing research findings, and, finally, providing practical recommendations.

2.1. Research Questions and Scope

A well-defined set of research questions is essential for structuring a bibliometric review that is both comprehensive and focused. This study, therefore, centers on the following critical questions, aiming to provide a systematic overview of machine learning applications in building energy conservation and carbon reduction:

What is the current state of research regarding the application of machine learning in building energy efficiency and carbon reduction?
How can machine learning technologies be effectively integrated into building systems, and how can specific technical frameworks optimize building energy efficiency and carbon emission management?
What are the challenges in integrating machine learning methods into building energy systems?
What are the emerging trends and potential future research directions in this field?
What are the key factors for optimizing building energy efficiency through machine learning models?
What specific obstacles exist in the widespread deployment of multi-objective optimization algorithms in real-world building energy systems? How can these obstacles be overcome?
How can demand-side management in buildings reduce energy consumption through the use of machine learning?

2.2. Data Collection and Selection Criteria

The Web of Science (WoS) includes high-quality academic journals across diverse fields, including architecture, engineering, and social sciences. Its robust citation indexing system supports the reliability and comprehensiveness of the peer-reviewed articles. Additionally, WoS enforces stringent quality control over its journals, and its citation tracking and citation-based metrics provide essential support in assessing research impact. For this study, the WoS database served as the source for obtaining the literature sample [37,38]. To prevent any bias stemming from updates to the database, the retrieval process was performed once on 18 December 2024. First, the topic (TS) field of the WoS Core Collection was queried using a search string that included the primary terms related to “machine learning” and “building energy conservation”, along with their associated keywords.

The relevance of the keywords was verified by examining the scope and topics addressed in the most highly cited journals within each discipline. For artificial intelligence and machine learning, the highly cited journals IEEE Transactions on Pattern Analysis and Machine Intelligence and Foundations and Trends in Machine Learning were analyzed to define relevant terms. Similarly, for building energy conservation, keywords were derived by reviewing journals such as Building and Environment, Energy and Buildings, and Journal of Building Performance. This process ensured that the selected keywords not only reflect current research trends but also capture the core themes and methodologies within these fields. Additionally, by prioritizing highly cited journals, we ensured that the keywords represent the most influential and widely recognized concepts in the literature. Integrating these analyses with the objectives of this review, we developed a precise and comprehensive search strategy, thus ensuring the rigor of the literature search and its alignment with this review’s focus. The finalized search string was as follows: TS = (“Machine learning” OR “Deep learning” OR “Reinforcement learning” OR “Unsupervised learning”) AND (“Energy optimization*” OR “Carbon reduction*” OR “Building performance prediction” OR “Energy management optimization” OR “Building energy efficiency” OR “Energy conservation” OR “Building carbon emissions” OR “Energy saving technologies” OR “Carbon footprint reduction” OR “Energy-efficient buildings” OR “Zero-energy buildings”).
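The structure of this search string (two OR-groups joined by AND under the TS field tag) can be reproduced programmatically, which makes it easier to audit or extend the term lists. The following is a minimal sketch; the helper names `or_group` and `build_query` are hypothetical, and the output uses straight quotation marks rather than the typographic ones shown above.

```python
# Term lists copied from the search string reported in the text.
ml_terms = [
    "Machine learning", "Deep learning",
    "Reinforcement learning", "Unsupervised learning",
]
building_terms = [
    "Energy optimization*", "Carbon reduction*",
    "Building performance prediction", "Energy management optimization",
    "Building energy efficiency", "Energy conservation",
    "Building carbon emissions", "Energy saving technologies",
    "Carbon footprint reduction", "Energy-efficient buildings",
    "Zero-energy buildings",
]


def or_group(terms):
    """Quote each term and join the group with OR inside parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(group_a, group_b):
    """Join two OR-groups with AND, prefixed by the TS field tag."""
    return f"TS = {or_group(group_a)} AND {or_group(group_b)}"


query = build_query(ml_terms, building_terms)
print(query)
```

Keeping the terms in lists means a keyword can be added or removed without hand-editing the parentheses and operators.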

The initial query returned a total of 538 records. This research included only primary articles, review studies, conference papers, and book chapters in the analysis. For quality assessment, each article’s title and abstract were independently reviewed in detail by the authors, and the records were saved in plain text format based on the following criteria:

The study focuses on buildings and their related systems.
The study explores the application of machine learning technologies, particularly deep learning, reinforcement learning, and unsupervised learning, in building energy efficiency optimization, energy management, or carbon emission control.
The work is published in a peer-reviewed scientific article or conference paper.

Articles that did not meet these criteria were excluded. Papers that broadly mentioned machine learning in the context of building energy conservation, energy management, or carbon reduction without further analyzing specific applications or methodologies were also excluded. After this screening process, 496 valid articles were retained for further analysis.

2.3. Bibliometric Analysis Tools and Methods

This study employed two software tools for bibliometric analysis. The first tool, Biblioshiny, is a web-based application created by Massimo Aria of the University of Naples Federico II and Corrado Cuccurullo of the University of Campania Luigi Vanvitelli. It enables users to analyze bibliometric data and visualize results through various formats, including graphs and maps [39]. The second tool, VOSviewer (version 1.6.20), was developed by the Centre for Science and Technology Studies (CWTS) at Leiden University in the Netherlands. VOSviewer is designed to generate visualizations of bibliometric networks, which encompass entities like researchers, journals, and institutions. Keywords, titles, and abstracts can also act as network nodes, connected by various relationships, including co-citation, co-authorship, and co-occurrence. Co-citation analysis examines the frequency with which two items are cited together, co-authorship analysis measures similarity by identifying shared publications, and co-occurrence analysis identifies connections based on simultaneous appearances in research data [40].
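To make the idea of co-occurrence concrete, the sketch below counts how often two keywords appear in the same record. It is an illustration under stated assumptions, not the procedure used by VOSviewer: the function name `cooccurrence_counts` and the three sample records are hypothetical, and VOSviewer additionally normalizes link weights, which is not reproduced here.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(keyword_lists):
    """Count unordered keyword pairs that appear in the same record."""
    counts = Counter()
    for keywords in keyword_lists:
        # Normalize case and drop duplicates so each pair counts once per record.
        unique = sorted({k.strip().lower() for k in keywords if k.strip()})
        for pair in combinations(unique, 2):
            counts[pair] += 1
    return counts


# Hypothetical author-keyword lists for three records.
records = [
    ["Machine Learning", "Energy Efficiency", "HVAC"],
    ["Deep Learning", "Energy Efficiency"],
    ["Machine Learning", "Energy Efficiency"],
]

links = cooccurrence_counts(records)
for (a, b), weight in links.most_common(3):
    print(f"{a} -- {b}: {weight}")
```

Each pair and its count corresponds to an edge and its raw weight in a co-occurrence network; co-authorship and co-citation counts can be built the same way by replacing keywords with authors or cited references.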

3. Results

This section summarizes the findings derived from visualizing data from a sample of 496 publications, selected based on the search strategy described in Section 2.2. Key outcomes include bibliometric maps illustrating co-authorship networks among authors and institutions, co-citation links between authors, journals, and references, and keyword co-occurrence trends. To analyze the temporal evolution of this research field, descriptive statistics are presented using both graphical and tabular formats.

3.1. Demographic Overview of the Study Area

3.1.1. Overview of the Sampled Publications

The results of the bibliometric analysis are presented for a dataset comprising 496 publications, which includes 421 journal articles (84.88%), 39 conference proceedings (7.86%), and 35 review papers (7.06%). Table 2 summarizes the fundamental characteristics of this dataset.

The application of machine learning in the field of building energy efficiency has emerged as a research trend in recent years. Figure 2 depicts the progression of studies in this domain over time. The volume of publications experienced remarkable growth from 2020 to 2024, with an average annual increase of 98.85%. A notable surge in scientific output occurred between 2021 and 2023, highlighting the rapid development and growing interest in this emerging area of research.
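The text does not state which formula produced the reported growth figure. One common definition is the compound annual growth rate, sketched below under that assumption. The yearly counts are hypothetical placeholders, not the values behind Figure 2.

```python
def compound_annual_growth(counts_by_year):
    """Return the compound annual growth rate (in percent) between the
    first and last year of a {year: publication_count} mapping."""
    years = sorted(counts_by_year)
    first = counts_by_year[years[0]]
    last = counts_by_year[years[-1]]
    periods = years[-1] - years[0]
    if periods <= 0 or first <= 0:
        raise ValueError("need at least two years and a positive first count")
    return ((last / first) ** (1 / periods) - 1) * 100


# Hypothetical counts that double every year.
hypothetical_counts = {2020: 10, 2021: 20, 2022: 40, 2023: 80, 2024: 160}
rate = compound_annual_growth(hypothetical_counts)
print(f"{rate:.2f}% per year")
```

A rate close to 100% therefore corresponds to output roughly doubling each year over the period considered.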

As shown in Figure 2, the pattern of average yearly citations differs from the trajectory of annual research output. Starting from 2020, the average citations per year show a notable decline. While there is a slight recovery or stabilization in 2022, the overall trend continues to decrease in subsequent years. This downward trajectory could potentially be attributed to a reduction in the visibility or impact of recent publications. Additionally, this decrease may also be influenced by factors such as the quality of the studies or a decrease in international collaborations, as previous studies have highlighted a correlation between the quantity and type of collaboration and citation impact [41,42].

3.1.2. Authors with the Highest Productivity

Figure 3 illustrates the correlation between the number of authors and the quantity of published documents, analyzed through the lens of Lotka’s Law. This principle in bibliometrics outlines how productivity is distributed among authors within a particular research field [43]. It states that the number of authors producing n papers is inversely proportional to n². In other words, only a small percentage of authors are highly productive, contributing multiple publications, while the majority of authors contribute fewer publications. This results in a skewed distribution, where a significant portion of documents is authored by a small group of prolific contributors [44].

As Figure 3 shows, the steep decline in the curve indicates that a large proportion of authors have published only one or two documents, while only a small fraction have contributed significantly more. Specifically, among the 1781 authors analyzed, only 0.7% (12 authors) have published five or more documents, highlighting the rarity of highly productive contributors in this field. In contrast, authors who have published only one document account for 86.1% (1534 authors) of the total, demonstrating their significant collective contribution to the research in this area. The alignment between the theoretical curve (dotted line) and the observed data (solid line) further illustrates that author productivity in this domain closely follows Lotka’s law, with the majority of authors contributing fewer than three documents. This reflects the skewed productivity distribution that is typical of many academic fields.
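The inverse-power form of Lotka’s law can be written out as a short calculation. The sketch below returns the theoretical share of authors with n documents; the function name `lotka_shares` is hypothetical, and the exponent is left as a parameter because the fitted value behind Figure 3 is not reported in the text (the classical form uses 2).

```python
def lotka_shares(max_papers, exponent=2.0):
    """Theoretical share of authors with n = 1..max_papers documents,
    proportional to 1 / n**exponent and normalized to sum to 1."""
    if max_papers < 1:
        raise ValueError("max_papers must be at least 1")
    weights = [1 / n ** exponent for n in range(1, max_papers + 1)]
    total = sum(weights)
    return [w / total for w in weights]


shares = lotka_shares(max_papers=10)
for n, share in enumerate(shares[:5], start=1):
    print(f"{n} document(s): {share:.1%} of authors")
```

With an exponent of 2, authors with two documents are expected to be one quarter as numerous as authors with one; a larger exponent produces a steeper drop.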

The authors’ impact was further evaluated through the calculation of metrics such as the H-index, G-index, M-index, and total citations. The H-index, proposed by physicist Jorge Hirsch, measures the balance between a scholar’s academic productivity and impact. The G-index, proposed by Leo Egghe, measures the concentration of citations, emphasizing the contribution of multiple highly cited papers. The M-index typically refers to the calculation of a scholar’s average annual impact based on their H-index, reflecting the scholar’s academic influence each year and indicating long-term academic performance.
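The three indices can be computed from a list of per-paper citation counts. The sketch below follows the usual definitions; the sample citation counts and years are hypothetical, and counting the first publication year as a full year in the M-index is an assumption, since conventions differ between tools.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the top g papers together have at least g**2
    citations (capped at the number of papers)."""
    ranked = sorted(citations, reverse=True)
    g, running = 0, 0
    for rank, cites in enumerate(ranked, start=1):
        running += cites
        if running >= rank * rank:
            g = rank
    return g


def m_index(citations, first_year, current_year):
    """H-index divided by the number of years since the first publication.
    Assumption: the first publication year counts as one full year."""
    years_active = current_year - first_year + 1
    if years_active <= 0:
        raise ValueError("current_year must not precede first_year")
    return h_index(citations) / years_active


# Hypothetical author with five papers, first published in 2022.
cites = [10, 8, 5, 4, 3]
print(h_index(cites), g_index(cites), round(m_index(cites, 2022, 2024), 2))
```

Because the M-index divides by career length, an author who entered the field recently can rank highly on it even with a modest total citation count.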

Table 3 and Figure 4 present the results, highlighting the 15 most influential authors. Dr. Huijun Wu from Guangzhou University stands out as the leading author, with the highest number of publications as well as the top H-index, G-index, and M-index. While his articles do not have the highest total citation count, this can be attributed to his entry into the field in 2022. Older publications generally accumulate more citations over time, and many of his recent works have not yet been widely accessed by readers.

3.1.3. The Most Influential Sources

Bradford’s Law of Scattering describes the distribution pattern of the academic literature across journals within a specific discipline, revealing the concentration and dispersion of publications in a given field. According to this law, a relatively small number of core journals publish the majority of the relevant literature, while a larger number of peripheral journals contribute significantly less. As highlighted in [45], this distribution pattern underscores the importance of identifying “core journals” that are highly focused on a particular topic. Bradford’s Law is particularly valuable for targeting journals in narrower research areas, as it assists researchers in identifying key publications within their field. Furthermore, the law categorizes journals into three zones: the first zone, considered the core, consists of a small number of journals that are highly dedicated to the specific topic [46]; the second zone includes journals with moderate citation counts; and the third zone comprises a larger number of less prominent journals in the field. The number of journals in the second and third zones is expected to be n and n² times greater, respectively, than the number in the first zone [47]. The academic literature is therefore not spread uniformly across journals but is concentrated in a few core journals and dispersed across a wide range of peripheral ones. See Equation (1).

$$T_1 : T_2 : T_3 = 1 : n : n^2 \tag{1}$$
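Equation (1) can be illustrated with a short sketch that ranks journals by article count and splits them into zones holding roughly equal numbers of articles. The function name `bradford_zones`, the cut rule, and the journal counts are hypothetical; bibliometric packages may place the zone boundaries differently.

```python
def bradford_zones(articles_per_journal, n_zones=3):
    """Rank journals by article count and assign each to a zone so that
    every zone covers roughly the same share of all articles."""
    ranked = sorted(articles_per_journal.items(),
                    key=lambda item: item[1], reverse=True)
    total = sum(count for _, count in ranked)
    if total <= 0:
        raise ValueError("total article count must be positive")
    zones = [[] for _ in range(n_zones)]
    cumulative = 0
    for journal, count in ranked:
        # The zone is set by the share of articles already covered by
        # higher-ranked journals.
        zone = min(cumulative * n_zones // total, n_zones - 1)
        zones[zone].append(journal)
        cumulative += count
    return zones


# Hypothetical field: 1 journal with 27 articles, 3 with 9, and 9 with 3.
counts = {"Core": 27}
counts.update({f"Mid{i}": 9 for i in range(1, 4)})
counts.update({f"Tail{i}": 3 for i in range(1, 10)})

zones = bradford_zones(counts)
print([len(zone) for zone in zones])
```

In this constructed example each zone holds 27 articles, and the journal counts per zone follow the 1 : n : n² pattern with n = 3.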

According to Bradford’s Law, journals in the core zone typically exert significant influence on academic research within a specific field. An analysis of the data presented in Figure 5 and Table 4 reveals that Energy and Buildings and Journal of Building Engineering occupy the core zone, demonstrating outstanding performance in terms of H-index, G-index, and M-index (25, 41, and 5 for Energy and Buildings; 15, 29, and 3 for Journal of Building Engineering). This indicates that these two journals not only excel in the number of highly cited articles but also maintain substantial attention from researchers and sustained impact within the field. Additionally, their total citation counts, at 2267 and 1049, respectively, along with publication outputs of 149 and 82 articles, underscore their central role in advancing research progress.

Moreover, journals in the second zone (e.g., Applied Energy, Building and Environment, and Sustainable Cities and Society) exhibit slightly lower HGM metrics compared to the core zone but still maintain relatively high H-index and G-index values. For instance, Applied Energy has an H-index of 10 and a G-index of 17, while Sustainable Cities and Society reports 9 and 17. These journals not only broaden the scope of research within the field but also play a crucial role as part of the secondary core in advancing specific areas of study. Their focus on specialized or emerging topics makes them important in addressing niche research areas, contributing to the development of new theories and practices. By publishing research that may not yet be widespread but is highly relevant, they help drive innovation and foster the growth of new subfields.

In contrast, journals in the third zone (e.g., Frontiers in Built Environment and Journal of Building Performance) exhibit overall lower HGM metrics, with M-index values mostly below 1. This can likely be attributed to the fact that these journals began publishing articles relatively recently (e.g., from 2022 onwards) or have a lower total publication volume. However, these journals serve a significant purpose in emerging fields or interdisciplinary research. Despite their lower metrics, they often publish pioneering research on cutting-edge topics that have yet to gain widespread recognition. As such, they provide valuable platforms for early-stage exploration and offer researchers an opportunity to engage with new, underexplored areas of study that could shape future trends.

In summary, the core zone journals distinguish themselves with exceptional performance in both citation metrics and sustained academic impact, as reflected in their high HGM indices. Meanwhile, journals in the second and third zones contribute to the depth and breadth of research from different perspectives, aligning with Bradford’s Law’s prediction regarding the distribution of the literature across zones.

3.1.4. Leading Publications in the Field

Table 5 provides an overview of the ten most-cited publications worldwide. These papers represent current research directions in building energy management and optimization, particularly emphasizing the use of machine learning and deep learning approaches within architecture. The majority of these studies were published between 2020 and 2023, with each paper typically involving four to five authors, underscoring the collaborative and interdisciplinary nature of research in this area. Furthermore, institutional collaborations are highly prevalent, with 8 out of the 10 papers resulting from such partnerships, reflecting the shared interest and cooperative efforts of academia and industry in addressing building energy efficiency challenges.

Regarding research methodologies, four papers adopt experimental approaches, while the other six are review articles. This suggests a gradual shift in this field from exploratory stages toward the integration and refinement of established technologies. For instance, Zhang et al. (2022) [20] and Fu et al. (2022) [9] provide comprehensive reviews on the applications of machine learning and reinforcement learning in building energy efficiency control, identifying research trends in areas such as air quality, thermal comfort, and energy optimization. These reviews establish a theoretical foundation for subsequent studies and highlight critical directions for future research.

Citation analysis reveals that the influence of these publications is steadily increasing. For example, the works of Gopinath et al. (2020) [51] and Seyedzadeh et al. (2020) [54] have garnered significant attention, with their proposed methodologies recognized for both their practical utility and forward-thinking approach. The former’s research on non-intrusive load monitoring techniques is widely cited, emphasizing its importance in intelligent energy management, while the latter’s machine learning model for predicting the energy performance of non-domestic buildings provides valuable support for deep energy retrofit decision making.

From a technological application perspective, these studies span various methodological approaches. For example, Brandi et al. (2020) [49] optimized indoor temperature control and energy consumption using deep reinforcement learning, demonstrating the potential of AI technologies in complex building environments. Similarly, Dong et al. (2021) [52] employed ensemble learning and energy consumption pattern classification to predict hourly energy consumption in office buildings, offering viable solutions for efficient short-term energy management. Additionally, Mounir et al. (2023) [55] introduced an innovative approach to short-term electric load forecasting for smart grid energy management systems by integrating empirical mode decomposition (EMD) and bidirectional long short-term memory (BI-LSTM) techniques.

Overall, these studies have not only advanced the field of building energy efficiency optimization through theoretical innovation but also demonstrated significant potential in practical applications. The mutual citation relationships among the literature and the continuity of research methodologies further indicate the formation of a closely knit academic community in this field. Through the collaboration of theory and technology, this community is driving the field toward deeper development, continuously promoting interdisciplinary integration and innovation in building and energy management technologies.

3.1.5. Three-Field Plot Overview

The three-field plot illustrates the most prolific countries/regions (AU_CO), major academic journal sources (SO), and key research themes (DE) in the field of building energy efficiency while also revealing the relationships among them [56]. The left column in Figure 6 represents countries or regions, with China, the United States, and Australia identified as the primary contributors to scientific research in this domain. The middle column highlights key journal sources such as Energy and Buildings, Journal of Building Engineering, and Buildings, which serve as core platforms for publishing studies related to building energy efficiency and machine learning. The right column presents key research themes, including “Machine Learning”, “Deep Learning”, “Energy Efficiency”, and “HVAC”, reflecting the distribution of research hotspots.

From Figure 6, it is evident that China is the most active country, closely associated with technologies such as “Machine Learning”, “Deep Learning”, and “Reinforcement Learning”, with most of its research findings published in Energy and Buildings and Journal of Building Engineering. The United States and Australia, on the other hand, have not only focused their research on topics such as “Thermal Comfort” and “Artificial Neural Network” but have also demonstrated significant potential in the field of intelligent building energy management.

Earlier research themes primarily centered on foundational topics such as “Energy Consumption” and “Energy Efficiency”. However, emerging themes in recent years, such as “Reinforcement Learning” and “HVAC”, highlight a growing emphasis on the application of intelligent and deep learning technologies in the built environment. This trend indicates a gradual shift in the field towards a more intelligent, data-driven approach to building energy efficiency.

3.2. Geographical Perspective of the Study Area

3.2.1. Scientific Output and Collaboration Across Countries

As illustrated in Figure 7, the collaboration map between countries clearly reveals the major international partnerships in the field of building energy efficiency. Among these, the frequency of collaboration is highest between China and the United States, China and Australia, as well as the United Kingdom and Australia. This highlights the central position of research teams from China, the United States, the United Kingdom, and Australia within high-frequency collaboration networks. The academic productivity based on author affiliations further underscores the depth and intensity of international cooperation. In the map, the thickness of the connecting lines represents the strength of collaboration, with thicker lines indicating closer partnerships.

Developing countries, with China at the forefront, play a prominent role in advancing research on building energy efficiency and smart buildings. While high-income countries have already achieved notable technological progress, emerging technologies exert a greater influence on urban functionality, productivity, and livability in developing regions. Additionally, the digital innovations of smart cities are seen as optimal solutions to alleviate the challenges posed by population growth in these nations, effectively addressing the rising demand for infrastructure and services [57].

Figure 8 further illustrates the global collaboration patterns in the research fields of building energy efficiency and smart technologies. China, the United States, the United Kingdom, and Australia dominate the collaboration network, as evidenced by larger nodes and denser connections, reflecting their strong academic output and cooperative capabilities. From 2022 to 2025, research hotspots have gradually shifted toward Asia and emerging countries, such as Saudi Arabia and Singapore, indicating the rapid rise of these regions in terms of research influence. The collaboration patterns exhibit a dual characteristic of regionalization and globalization, particularly in partnerships between East Asia and Western countries, as well as within the Commonwealth nations and European regions. Overall, the field is forming a highly collaborative international research system, reflecting the globalized nature of research on building energy efficiency technologies.

3.2.2. Countries’ Key Research Affiliations

To provide an overview of potential collaborative institutions for researchers in the fields of architecture and intelligent buildings, we conducted a systematic analysis of the publication outputs and collaboration networks of major organizations. Figure 9 illustrates the collaboration networks and academic trends of multiple research institutions between 2021 and 2023. The research activity and academic prominence of these institutions are visually represented by node size and color.

Tsinghua University (Tsinghua Univ) stands out as a central node in the collaboration network, leveraging its extensive academic connections and robust research capabilities. It maintains close partnerships with prominent domestic and international institutions, including the National University of Singapore (Natl Univ Singapore), Shenzhen University (Shenzhen Univ), and Chongqing University (Chongqing Univ). This underscores Tsinghua University’s significant international influence in the fields of architecture, intelligent buildings, and smart cities. Similarly, institutions such as Tongji University (Tongji Univ), Xi’an University of Architecture and Technology (Xi’an Univ Architecture and Techn), and Huazhong University of Science and Technology (Huazhong Univ Sci and Technol) also demonstrate remarkable research activity and extensive academic collaborations. The gradient of node colors further reflects the temporal distribution of research activity, with the National University of Singapore and Tsinghua University maintaining consistent activity throughout the study period.

International collaborations are also a defining feature of research in the field of intelligent buildings. Institutions such as the University of Sydney (Univ Sydney), the University of Nottingham (Univ Nottingham), and the University of Illinois (Univ Illinois) have established close interactions with Chinese research institutions, contributing to the global expansion and deepening of research in this area. An analysis of the data in Table 6 reveals that Suzhou University of Science and Technology (Suzhou Univ of Science and Technology) leads in publication output with 27 articles, followed by Chongqing University (21 articles) and Tongji University (19 articles). The growth trends in Figure 10 further validate these findings, as most institutions have shown linear or accelerated growth in publication output since 2021, reflecting a sustained increase in research interest in this field. Notably, Suzhou University of Science and Technology exhibits an outstanding growth trajectory, rapidly ascending since 2021 to reach a cumulative output of 27 articles by 2024, positioning itself as a leading institution in this domain. Chongqing University and Tongji University also demonstrate steady growth, particularly during 2022–2023, reflecting an intensified focus on research during this period. Although Tsinghua University ranks slightly lower in total output (19 articles), its consistent growth trend, combined with its extensive collaboration network, highlights its sustained research strength and influence.

Some international institutions, such as the National University of Singapore and the U.S. Department of Energy (DOE), display certain fluctuations in their publication trajectories. These variations may be attributed to the cyclical nature of international collaborations or shifts in research resource allocation. Nevertheless, these institutions maintain a high overall level of academic contribution, underscoring their pivotal roles in advancing global research in intelligent buildings.

Importantly, 2021 emerges as a critical turning point, with the publication outputs of many institutions significantly increasing from this year onward. This trend may correlate with a growing global demand for intelligent buildings, green architecture, and smart city technologies, coupled with enhanced policy support and increased research funding [58,59,60].

3.3. Intellectual Perspective of the Study Area

3.3.1. Author Co-Citation Analysis

The author co-citation network (ACCN) provides a comprehensive visualization of key contributors and their intellectual influence within the field of energy-efficient and intelligent building research (Figure 11). The network reveals several prominent authors whose work has significantly shaped this domain, along with the interconnections among influential publications [61].

At the core of the network, Amasyali K. (2018) [62] emerges as a pivotal author, frequently cited for their foundational work on sustainable energy and building performance. The central position of this node indicates its extensive influence on subsequent studies, particularly in integrating data-driven approaches to optimize building energy systems. Similarly, Pérez-Lombard L. (2008) [63] is another critical figure whose seminal review of building energy consumption has become a cornerstone for research in the field. This work likely provided a systematic framework and robust theoretical foundation, making it indispensable for studies addressing energy efficiency in the built environment [63]. In addition to these core authors, the network highlights methodological breakthroughs introduced by scholars such as Sutton R.S. (2018) [64] and Mnih V. (2015) [65]. Sutton’s contributions to reinforcement learning have catalyzed the adoption of machine learning techniques for smart building automation, enabling adaptive energy management and control systems [64]. Mnih’s work on deep reinforcement learning has similarly driven advancements in predictive modeling and optimization within the domain of intelligent building systems [65]. The presence of Breiman L. (2001) [66] as a frequently co-cited author further underscores the importance of machine learning methodologies, particularly ensemble techniques such as random forests, which have become integral to energy modeling and decision-making processes in architectural research [66].

The co-citation analysis also underscores the interdisciplinary nature of this research area. The modular structure of the network reveals distinct thematic clusters, with one focusing on energy-efficient building design (e.g., Amasyali and Pérez-Lombard), another on machine learning applications (e.g., Breiman and Mnih), and a third on methodological frameworks for sustainable architecture. These clusters represent the intersection of architectural science, computational techniques, and sustainability principles, emphasizing the increasing focus on data-driven strategies to tackle complex issues in the built environment.

From a bibliometric perspective, the analysis of influential publications such as Amasyali K. (2018) [62] and Pérez-Lombard L. (2008) [63] reveals their dual role as both theoretical cornerstones and practical references for the field. The frequent co-citation of these works demonstrates their utility in shaping research trajectories, bridging gaps between conceptual understanding and practical implementation. Similarly, the methodological contributions of Sutton and Mnih represent a paradigm shift, introducing advanced machine learning frameworks that are now critical to intelligent building design. The analysis suggests that future research in this field will likely build on these established works, focusing on the fusion of machine learning techniques with sustainable architectural practices to develop smarter, more energy-efficient buildings. This trend highlights the need for ongoing interdisciplinary collaboration to tackle the combined challenges of sustainability and technological innovation in the built environment.

3.3.2. Journal Co-Citation Analysis

The Journal Co-Citation Network (JCCN) analysis reveals the intellectual underpinnings and key journals within a research domain by illustrating the co-citation relationships between various journals [67]. As depicted in Figure 12, Applied Energy and Energy and Buildings occupy prominent central positions, underscoring their pivotal academic influence in the fields of building energy efficiency and intelligent buildings. Specifically, Applied Energy, covering multidisciplinary aspects of energy utilization and management, provides a broad theoretical and practical foundation for sustainable building design. Energy and Buildings, focusing on building energy efficiency and sustainable performance optimization, stands as an indispensable resource for the building sector. Moreover, Building and Environment is another highly cited core journal, with its research encompassing a comprehensive analysis of the impact of the built environment on energy efficiency, indoor comfort, and environmental consequences, highlighting the significance of the interaction between buildings and their surroundings. The network also reveals significant influences from journals in other disciplines. For example, Renewable and Sustainable Energy Reviews emphasizes renewable energy technologies and their application in buildings, providing essential support for interdisciplinary research in energy and construction. The Journal of Cleaner Production expands the field’s perspective by integrating aspects of sustainable production and environmental management into building design.

The analysis of the network’s cluster structure indicates the presence of several thematic modules. A module centered on Energy and Applied Thermal Engineering focuses on building energy system optimization and thermal management. Another module, centered on Building and Environment and Sustainable Cities and Society, concentrates on sustainable urban building design and societal impact. A third module, centered on the Journal of Building Engineering and Renewable Energy, is more inclined towards the integrated application of building intelligence and new energy technologies. These modular structures reflect the diversity of research hotspots and the connections between various themes within the field. This analysis provides insights for future research, revealing the increasing importance of interdisciplinary collaboration in research on building energy efficiency and intelligent buildings.

3.3.3. Document Co-Citation Analysis

Figure 13 illustrates a document co-citation network (DCCN), visualizing the interconnected relationships among cited publications [68]. The network’s structure, characterized by dense connections between nodes, indicates a high degree of citation overlap and thematic coherence within the research domain, highlighting the interconnectedness of studies. Furthermore, recent publications exhibit increased interconnectivity within the network, reflecting a growing convergence of research interest towards the application of machine learning and advanced building energy management technologies.

The node colors represent a temporal dimension, providing a nuanced perspective on the research field’s evolution. Light-colored nodes, located primarily at the network’s periphery, represent foundational studies from the earlier stages of research in the field. These publications laid the groundwork for subsequent studies, exploring core concepts and conducting preliminary investigations, thus establishing the field’s initial parameters. Medium-colored nodes, mostly clustered in the central part of the graph, signify that these authors form the bedrock of the research field and that these articles connect foundational studies with cutting-edge approaches. Dark-colored nodes, such as “Olu-Ajayi (2022) [48]”, “Mounir (2023) [55]”, and “Hosamo (2022) [53]”, highlight the current trajectory of research. These recent studies not only demonstrate a surge of interest in deep learning, load forecasting, and intelligent energy management but also indicate that the research focus is shifting from fundamental explorations to the practical application of new technologies. Their central positions and dense connections indicate their substantial influence and integration into the contemporary research landscape. An analysis of the author representation within the network underscores the significance of “Olu-Ajayi (2022) [48]”, “Hong (2020) [5]”, “Gopinath (2020) [51]”, and “Dac-Khuong Bui (2020) [69]” in the research domain, further confirming the field’s high degree of consistency. The network’s high density, indicated by numerous connections, suggests a well-established and highly integrated research field. This interconnectedness reflects a high degree of scholarly communication and the building upon previous research.

The network structure clearly reveals several research groups actively contributing to the field. For instance, one group centers on “Olu-Ajayi (2022) [48]”, “Gao (2024) [58]”, and “Seyedzadeh (2020) [54]”, with researchers primarily focused on applying new technologies. Another group leans more towards theoretical aspects, with research concentrated on “Gopinath (2020) [51]” and “Wenninger (2022)”. A significant portion of recent publications concentrates on incorporating machine learning (ML) methods, especially deep learning (DL), into building energy management systems. This trend reflects a shift towards more sophisticated data-driven methodologies aimed at optimizing building performance. The network also highlights the significance of advanced building control systems that integrate intelligent load forecasting and optimized strategies for managing HVAC (heating, ventilation, and air conditioning) systems.

Overall, the network centers on publications that focus on achieving sustainable building practices and energy efficiency goals, which is the core driver of research in this field. The network’s development signals an increasing need for interdisciplinary collaborations, particularly between computer science and building science. Future research should concentrate on validating and scaling the application of these advanced technologies in real-world building environments, addressing practical challenges and translating theory into practice. There is also a clear opportunity for establishing benchmark studies and standardized methodologies to objectively assess the performance of various technologies within the building sector. The document co-citation network analysis not only illustrates the present intellectual landscape of the research domain but also highlights the shift in research focus towards the integration of advanced technologies for improving energy efficiency in building management systems. This comprehensive analysis provides a foundation for future research, underscoring areas that merit further investigation and collaboration.

3.3.4. Co-Occurring Keyword Network

A co-occurrence analysis was conducted to examine the core themes, research hotspots, and knowledge structure of the research domain by constructing a keyword co-occurrence network graph (as shown in Figure 14) in conjunction with a keyword data table (as shown in Table 7). Co-occurrence analysis, a bibliometric technique, examines the frequency with which keywords appear together in the literature, thereby revealing the conceptual relationships within the field. In the co-occurrence network graph, node size corresponds to the frequency of keyword occurrence, line thickness reflects the strength of co-occurrence between keywords, and node color denotes the time of keyword emergence. As observed in the graph, keywords such as “performance”, “model”, “machine learning”, “buildings”, and “optimization” exhibit larger nodes and thicker connecting lines, indicating that these are pivotal concepts within the field that frequently co-occur in the same literature, thereby reflecting the significance of modeling, performance evaluation, and optimization methods in this research area. Concurrently, “deep learning” and “reinforcement learning” appear as nodes with a more yellow hue, suggesting that they are relatively recent themes representing emerging trends within the field. Moreover, the strong association between “buildings”, “thermal comfort”, and “optimization” highlights the central position of building environment and energy conservation research. The keyword data table further provides quantitative support, such as the highest occurrence frequencies of “performance” and “model” (91 and 88, respectively), affirming their core status. Furthermore, the multidimensional scaling coordinates (Dim1 and Dim2) reveal the evolution of research methodologies within this area, progressing from applied to more theoretical approaches.

In summary, the research domain is centered around “performance” and “model”, exhibiting a trend towards emerging topics like “deep learning” and “reinforcement learning”, while consistently maintaining a focus on building environment and energy conservation research. This analysis not only elucidates the core knowledge system of the research area but also provides robust support for identifying future research directions.
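
The counting behind such a network can be illustrated with a minimal sketch. It assumes each record is simply a list of author keywords; the records below are hypothetical, and the function name `keyword_network` is introduced here for illustration only (it is not part of VOSviewer or Bibliometrix). Node size corresponds to the keyword frequency and line thickness to the pair count.

```python
from collections import Counter
from itertools import combinations

def keyword_network(records):
    """Return (node frequencies, edge weights) for a keyword co-occurrence network."""
    freq = Counter()   # node size: number of records containing the keyword
    edges = Counter()  # line thickness: number of records containing both keywords
    for keywords in records:
        # Normalise case and drop duplicates so a keyword counts once per record.
        unique = sorted({k.strip().lower() for k in keywords})
        freq.update(unique)
        edges.update(combinations(unique, 2))
    return freq, edges

# Hypothetical keyword lists for three records (illustrative only).
records = [
    ["Performance", "Model", "Machine Learning"],
    ["performance", "optimization", "buildings"],
    ["model", "performance", "deep learning"],
]
freq, edges = keyword_network(records)
print(freq.most_common(2))
print(edges[("model", "performance")])
```

Pairs are stored in alphabetical order, so each unordered keyword pair has a single key regardless of the order in which the keywords were listed.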

3.4. Thematic Evolution Perspective of the Study Area

3.4.1. Thematic Map

A thematic map was generated using Biblioshiny to visualize the authors’ keywords. This map serves to distinguish the relevance and development of the topics discussed, highlighting both the most prominent issues within a given time period and the marginal topics that have nonetheless contributed to shaping the overall discourse [70]. The map is two-dimensional, with density as one axis and centrality as the other [71]. Density represents “the extent of theme development, as indicated by the internal associations between keywords” [71], while centrality measures “the importance of themes, based on external associations among keywords” [71]. The map is divided into four quadrants:

Basic themes are located in the lower-right quadrant, which contains underdeveloped but general topics [72];
Motor themes are situated in the upper-right quadrant, encompassing highly developed and central themes crucial to the field [72];
Niche themes occupy the upper-left quadrant, representing specialized yet peripheral topics with strong internal connections, even if their overall importance is not as high [72];
Emerging or declining themes are located in the lower-left quadrant, representing those with low density and centrality, which could potentially develop into more prominent topics in the future [72].

Figure 15 presents the thematic map of the analyzed database based on authors’ keywords, with the size of each circle reflecting the number of words within that cluster [70].
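
The four-quadrant reading described above can be sketched as a simple classification. This is a minimal illustration, not the Biblioshiny implementation: it assumes the quadrant boundaries are placed at the median centrality and median density of the themes, and the theme names and scores are hypothetical.

```python
from statistics import median

def classify_themes(themes):
    """Map each theme to a quadrant label using median splits (an assumption)."""
    c_cut = median(c for c, _ in themes.values())
    d_cut = median(d for _, d in themes.values())
    labels = {}
    for name, (centrality, density) in themes.items():
        central = centrality >= c_cut   # right half of the map
        dense = density >= d_cut        # upper half of the map
        if central and dense:
            labels[name] = "motor"
        elif central:
            labels[name] = "basic"
        elif dense:
            labels[name] = "niche"
        else:
            labels[name] = "emerging or declining"
    return labels

# Hypothetical (centrality, density) pairs, one per theme.
themes = {
    "theme_a": (0.9, 0.8),
    "theme_b": (0.8, 0.1),
    "theme_c": (0.1, 0.9),
    "theme_d": (0.2, 0.2),
}
labels = classify_themes(themes)
print(labels)
```

Other cut-offs (for example, the mean of the ranked scores) would shift themes that sit close to the axes, which is why borderline themes should be interpreted cautiously.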

Based on the analysis of both Table 8 and Figure 15, the thematic quadrant analysis reveals that ‘performance’, situated within the basic themes quadrant, exhibits notably high centrality (5.59, ranked 14th) and frequency (1165), despite a relatively lower density ranking (5th). This indicates that ‘performance’ serves as a pivotal core theme within the field and has already reached a considerable level of maturity. In the motor theme quadrant, ‘demand’ demonstrates higher centrality (1.492, ranked 13th) and a substantial frequency (305) but a lower density ranking (11th). This suggests a certain relevance, although its developmental potential is not yet fully realized. When viewed in conjunction with the ‘demand response’ theme, it is evident that the field is focusing on research related to demand-side response.

Within the niche theme quadrant, ‘storage’, ‘life-cycle assessment’, ‘challenges’, and ‘networks’ share the common characteristic of lower centrality scores (0, 0.075, 0.05, and 0.088, respectively, ranked 1.5th, 5th, 4th, and 7th), with density scores of 25, 25, 22, and 14.286, respectively (ranked 13th, 13th, 9th, and 1st). The ranks are ascending, so a higher rank denotes a higher score, and fractional ranks such as 1.5th indicate ties. These themes all present specialized features, but their overall relevance to the field is lower. ‘Storage’ and ‘life-cycle assessment’ are the most internally developed of the four, whereas ‘networks’, whose density rank of first corresponds to the lowest density reported, is the least developed and may instead represent a topic that is only beginning to emerge. Furthermore, ‘energy management’, ‘implementation’, and ‘network’ are all positioned in the emerging or declining theme quadrant, characterized by low centrality and frequency; their density rankings (7.5th, 2.5th, and 1st, respectively) are low to mid-range, so they may have further development potential in the future, warranting closer attention to their associations with other themes within the field.

Other themes, such as ‘CO₂ emissions’ (centrality 0.16, ranked 11th, density 18.571, ranked 4th, frequency 12), ‘demand response’ (centrality 0.125, ranked 8th, density 22.917, ranked 10th, frequency 10), ‘compressive strength’ (centrality 0.128, ranked 9th, density 16.667, ranked 2.5th, frequency 6), ‘electricity consumption’ (centrality 0.451, ranked 12th, density 19.984, ranked 6th, frequency 34), and ‘internet’ (centrality 0.15, ranked 10th, density 20, ranked 7.5th, frequency 5), exhibit mid-level centrality and density scores. This suggests that they maintain a degree of relevance within the field, with the potential for future development.
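
The fractional ranks quoted above (for example, 1.5th and 7.5th) are what one would expect when tied scores share the average of the positions they occupy. The sketch below shows that calculation under two assumptions that the source does not state explicitly: ranks are ascending (rank 1 is the lowest score), and ties receive the mean rank. The input values are hypothetical.

```python
def average_ranks(values):
    """Ascending ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

# Hypothetical centrality scores: two themes tie at 0 and share rank 1.5.
scores = [0, 0.05, 0, 0.075]
print(average_ranks(scores))
```

Under this convention a theme ranked first on density has the lowest density in the set, which matters when reading the rankings reported for each quadrant.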

The analysis of keyword time trends, as shown in Figure 16, reveals that the median year of occurrence for ‘electricity consumption sector’ is 2024, possibly representing the latest research hotspot in the field. ‘Performance’, ‘optimization’, and ‘model’ show a median occurrence year of 2023, indicating that these themes are developing rapidly. In contrast, ‘power’ and ‘model-predictive control’ have a median year of 2022, implying that they are earlier themes that have already reached a relatively mature stage.
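
The median-year statistic used in this trend analysis is straightforward to reproduce. The following is a minimal sketch assuming each record carries a publication year and a keyword list; the records are hypothetical and the helper name `keyword_median_year` is illustrative.

```python
from collections import defaultdict
from statistics import median

def keyword_median_year(records):
    """Return the median publication year of the records that use each keyword."""
    years = defaultdict(list)
    for year, keywords in records:
        for keyword in set(keywords):  # count a keyword once per record
            years[keyword].append(year)
    return {keyword: median(ys) for keyword, ys in years.items()}

# Hypothetical (year, keywords) records, for illustration only.
records = [
    (2021, ["power", "model"]),
    (2022, ["power"]),
    (2023, ["power", "model", "optimization"]),
    (2024, ["model", "optimization"]),
]
medians = keyword_median_year(records)
for keyword, year in sorted(medians.items(), key=lambda item: item[1]):
    print(keyword, year)
```

A later median year signals that a keyword's usage is concentrated in recent publications; with only a handful of occurrences the median is sensitive to a single paper, so low-frequency keywords deserve caution.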

In summary, the research field, with ‘performance’ at its core, is demonstrating a trend toward greater refinement and specialization. ‘Electricity consumption sector’ and ‘network’ represent emerging areas of research, while ‘demand’, ‘optimization’, ‘model’, and ‘framework’ exhibit high developmental potential. Themes such as ‘energy management’ and ‘implementation’ remain in the earlier stages of development, warranting further attention and research. Overall, the research field is increasingly focusing on efficiency and electricity consumption.

3.4.2. Clustering by Coupling

“Clustering by Documents Coupling”, a bibliometric technique, was employed to identify clusters of the related literature within the research domain. The fundamental principle of this approach is that, if two or more documents cite the same works in their reference lists, a ‘coupling’ relationship exists between them. The greater the coupling strength, that is, the more references two documents share, the higher the correlation in terms of research themes and content [73]. Through cluster analysis, tightly coupled documents can be grouped into a cluster, thereby representing sub-themes or research directions within the field [74]. In essence, this method leverages shared references (bibliographic coupling) rather than ‘co-citation’, in which two documents are linked because later works cite them together, to identify document clusters that exhibit similar research approaches and content, which facilitates the discovery of distinct knowledge communities and their interrelationships. Typically, different colored clusters in the resulting visualization signify different research emphases. For instance, red clusters are commonly indicative of research hotspots, characterized by the largest scale and the highest impact, whereas blue clusters, when tightly coupled with red clusters, denote a research intersection where these fields share common knowledge bases or methodological approaches. Furthermore, green clusters are often smaller in scale and potentially represent emerging or niche research areas that merit further exploration. This systematic approach to the analysis of inter-document relationships provides a deeper understanding of the knowledge structure and evolutionary trends of the research domain.
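
To make the coupling principle concrete, the sketch below counts the references shared by each pair of citing documents and, for contrast, derives co-citation counts from the same toy data. The document and reference identifiers are hypothetical, and both helper functions are illustrative rather than taken from any bibliometric package.

```python
from itertools import combinations

def coupling_strength(references):
    """Bibliographic coupling: number of references shared by two citing documents."""
    strengths = {}
    for a, b in combinations(sorted(references), 2):
        shared = references[a] & references[b]
        if shared:
            strengths[(a, b)] = len(shared)
    return strengths

def cocitation_counts(references):
    """Co-citation: number of citing documents that cite two works together."""
    counts = {}
    for cited in references.values():
        for pair in combinations(sorted(cited), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return counts

# Hypothetical citing documents mapped to the sets of works they cite.
references = {
    "doc1": {"r1", "r2", "r3"},
    "doc2": {"r2", "r3"},
    "doc3": {"r4"},
}
coupling = coupling_strength(references)
cocitation = cocitation_counts(references)
print(coupling)
print(cocitation[("r2", "r3")])
```

Coupling links the citing documents (doc1 and doc2 share two references), whereas co-citation links the cited works (r2 and r3 appear together in two reference lists). Coupling strength is fixed once a paper is published, while co-citation counts keep changing as new citing papers appear.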

Through the application of document coupling-based clustering, visualized in Figure 17 and complemented by the data in Table 9, a detailed analysis of the knowledge structure and dynamic trends within the research domain was conducted. Overall, the research field is centered on performance modeling and energy consumption optimization while also exhibiting a trend towards diversification.

The red core cluster, “performance—conf 52.3% model—conf 43.2% consumption—conf 60.4%”, positioned in the upper-right quadrant of the visualization, exhibits high frequency (187), impact (2.513), and confidence levels, clearly indicating its central importance and revealing the focus on using models to optimize energy consumption within the field. The red important cluster, “optimization—conf 29.1% buildings—conf 31.7% model—conf 16%”, also situated in the upper-right quadrant, albeit with a slightly lower frequency (83), demonstrates high impact (2.238), highlighting the critical role of optimization methods in building performance modeling, thus indicating a strong need for improving performance and efficiency through optimization. The prominence of these two red clusters underscores the core focus and primary research strengths within the domain.

The blue cluster, “performance—conf 34.9% model—conf 30.9% optimization—conf 38.2%”, positioned in the middle-upper region, possesses moderate frequency (132), centrality (0.213), and impact (2.081), indicating a research trend towards integrating performance, model, and optimization methods while also implying a research intersection and knowledge connection with the core red cluster.

Meanwhile, the green cluster, “behavior—conf 24% prediction—conf 8.5% algorithm—conf 20%”, located in the lower-left quadrant, and the green cluster, “model—conf 6.2% buildings—conf 9.8% comfort—conf 15%”, in the lower-right quadrant, represent emerging and peripheral research directions, respectively. The lower frequency, centrality, and impact levels of these clusters suggest that they are still in the early stages of development but may indicate future directions, with the former focusing on behavioral prediction and algorithms and the latter on comfort modeling in specific buildings.

In summary, the research field exhibits a focus on performance modeling and energy consumption, alongside diversification toward optimization methods, building performance, emerging algorithms, and comfort. This application of document coupling-based clustering not only clarifies the knowledge structure within the research field but also provides a solid basis for identifying potential avenues for future research and opportunities.

3.5. Application Perspective of the Study Area

3.5.1. Practical Applications of Machine Learning in Various Scenarios

Based on the results of bibliometric analysis, the core driving force in the research field has consistently focused on achieving sustainable building practices and energy efficiency goals, with an increasing emphasis on interdisciplinary collaboration between computer science and building science. This trend is particularly prominent in the application of machine learning, which has been widely adopted in the architectural domain. As technology progresses, the demand for algorithms and application scenarios continues to diversify. As previously mentioned, future research will focus on validating and expanding the application of these technologies in real-world building environments, especially addressing the practical challenges in energy management and architectural design. The diverse applications of machine learning, particularly in data processing, optimization, and control, have brought innovative solutions to the architectural field. Several new machine learning methods have gradually become popular choices in the field of architecture, demonstrating improved performance in areas such as data processing, design optimization, and system control. Figure 18 illustrates the key methods that have been widely adopted in recent years, along with their applications in this domain. Published studies are categorized by application scenarios, as summarized in Table 10.

In the realm of data processing and feature engineering, anomaly detection algorithms (e.g., Z-score, DBSCAN) excel in identifying outliers and enhancing data reliability. Data cleaning techniques (e.g., regression imputation, moving average) address issues related to missing values and noise, while feature selection and extraction algorithms (e.g., PCA, LDA) provide effective tools for model simplification and performance enhancement. For modeling and prediction, regression models (e.g., linear regression, random forest regression) are well suited for predicting building energy consumption and environmental parameters. Deep learning models (e.g., ANN, Transformer) excel in handling complex nonlinear relationships, while time-series models (e.g., LSTM) enable dynamic trend analysis.
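
Two of the data processing steps named above, Z-score anomaly detection and moving-average smoothing, can be sketched with the Python standard library alone. This is a minimal illustration rather than a method taken from the reviewed studies: the hourly load values, the threshold of 2.0, and the window length are all hypothetical choices.

```python
from statistics import mean, pstdev

def zscore_outliers(values, threshold=3.0):
    """Return indices of readings whose absolute Z-score exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []           # constant series: nothing can be flagged
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

def moving_average(values, window=3):
    """Trailing moving average; early points average over the samples available."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Hypothetical hourly load readings (kWh) with one spurious spike.
load = [10, 11, 10, 12, 11, 95, 10, 11]
flagged = zscore_outliers(load, threshold=2.0)
print(flagged)
print(moving_average([1, 2, 3, 4], window=2))
```

In a short series a single spike inflates the standard deviation, so the absolute Z-score of one outlier among n readings cannot exceed the square root of n - 1 (about 2.65 for eight readings); the conventional threshold of 3 would therefore miss the spike here, which is why the example uses 2.0.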

According to the network analysis in the bibliometric results, an increasing number of studies focus on enhancing building performance and energy efficiency through advanced algorithms such as deep learning and reinforcement learning. In the domain of optimization and control, techniques like reinforcement learning, genetic algorithms, and Bayesian optimization provide multi-objective solutions for building design and energy management. These technologies not only address the complexity of building systems but also tackle the multifaceted challenges in energy management. For instance, the application of deep reinforcement learning in building design can autonomously adjust energy efficiency goals, while Bayesian optimization-based algorithms offer flexibility and efficiency in real-time energy regulation.

Moreover, with the emergence of new technologies, innovative methods such as federated learning and edge computing are pushing the boundaries of traditional algorithms. These technologies have unique advantages in data privacy protection and real-time responsiveness, aligning with the current research trend of closely integrating technology with practice. As indicated by the bibliometric analysis results, the research field is increasingly shifting toward high-performance and intelligent building systems, further driving the deep integration and application of smart technologies within the building industry.

Overall, the application of machine learning in architecture is diversifying and continuing to develop. These technologies play a critical role in data processing and performance modeling and also show considerable potential in energy management and optimization. In light of the bibliometric results, future research should focus on validating the applicability of these technologies in real-world building environments and on exploring new algorithms, particularly their innovative uses in energy management and building design. Such research will promote the deep integration of machine learning within the architectural field, further enhancing building energy efficiency and sustainability.

3.5.2. Application Challenges and Technical Bottlenecks

The application of machine learning in the architectural domain spans diverse scenarios, each exhibiting unique advantages while also facing specific limitations.

In carbon emission calculation and optimization, the primary challenges are the reliance on high-quality historical data and insufficient model generalization. Significant regional variations in carbon emission standards and calculation methods limit the applicability of models, and difficulties in data acquisition and sharing further complicate model development. For energy-efficient design, the main obstacles are high model complexity and substantial initial deployment costs. Particularly in regional designs tailored to varying climatic conditions, models require customization, which increases data demands and training costs. In smart energy management strategies, machine learning faces challenges such as stringent requirements for real-time data availability, the complexity of system integration, and low user acceptance. For instance, embedding machine learning models into existing energy management systems often demands significant technical investment, while users’ habits in interacting with intelligent control systems may affect their effectiveness. In performance prediction and environmental quality monitoring, reliance on high-quality sensors poses challenges, as sensor failures or insufficient accuracy directly affect monitoring and optimization outcomes. Additionally, the deployment costs of sensor networks, especially in large or multifunctional buildings, impose significant upfront investment pressure. For operation and maintenance (O&M) and fault diagnosis, significant discrepancies in interfaces and data formats across devices hinder the development of unified standards, which is a major barrier to the large-scale application of machine learning. Model complexity also imposes higher technical requirements on O&M personnel, potentially limiting the adoption of these technologies in traditional management practices.

Machine learning faces multifaceted challenges in data, modeling, application, and implementation, all of which hinder its widespread adoption. To address these challenges, we propose corresponding strategies, as summarized in Table 11.

3.5.3. Opportunities and Transformations with Emerging Technologies

With the continuous advancement of artificial intelligence, emerging technologies such as federated learning and edge computing have demonstrated significant application potential in the architectural domain. These technologies not only address the limitations of traditional machine learning approaches but also offer innovative solutions for achieving smarter and more sustainable building practices. Their advantages in data privacy protection, collaborative model optimization, and real-time responsiveness are gradually transforming key practices within the field.

Federated learning, known for its data privacy-preserving capabilities, introduces a novel approach to collaborative energy optimization across projects and buildings. It enables multiple buildings to train energy-saving models locally without sharing sensitive raw data. By exchanging only model parameters, federated learning can integrate diverse energy consumption characteristics and environmental conditions, facilitating the training of more generalizable models. For instance, in carbon emission prediction, federated learning can combine what is learned from various buildings, without pooling their raw data, to enhance model accuracy and adaptability, providing robust support for developing personalized energy-saving strategies. During the design phase, federated learning fosters collaboration among different design teams, enabling the joint training of energy-efficient design models with a reduced risk of data leakage. Furthermore, models based on federated learning can be tailored to specific building types, climate conditions, and user requirements, delivering highly customized energy-saving solutions.
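The parameter-exchange idea can be illustrated with a minimal federated averaging (FedAvg-style) sketch in plain Python. Each hypothetical building fits a small linear model on its own data, and only the resulting parameters are averaged by a coordinator. The data, learning rate, round count, and function names are assumptions chosen for illustration and do not reproduce any method from the reviewed literature.

```python
def local_update(weights, data, lr=0.1, epochs=20):
    """Run local gradient descent on a linear model y = w * x + b."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(client_weights, client_sizes):
    """Average client parameters, weighted by local sample counts."""
    total = sum(client_sizes)
    w = sum(cw[0] * n for cw, n in zip(client_weights, client_sizes)) / total
    b = sum(cw[1] * n for cw, n in zip(client_weights, client_sizes)) / total
    return (w, b)


# Hypothetical local datasets: (scaled temperature difference, load in kW).
# The raw pairs stay with each building; only (w, b) is sent to the server.
building_a = [(0.5, 2.0), (1.0, 3.0), (1.5, 4.0)]
building_b = [(0.2, 1.4), (0.8, 2.6), (1.2, 3.4), (2.0, 5.0)]
clients = [building_a, building_b]

global_weights = (0.0, 0.0)
for _ in range(50):  # communication rounds
    updates = [local_update(global_weights, data) for data in clients]
    global_weights = federated_average(updates, [len(d) for d in clients])
```

Both toy datasets follow the same relation y = 2x + 1, so the shared model approaches w = 2 and b = 1. Real deployments face heterogeneous client data, secure aggregation requirements, and communication limits that this sketch ignores.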

Edge computing excels in improving the real-time performance and responsiveness of energy optimization. Decentralizing computational tasks to local edge devices within buildings enables the real-time monitoring and analysis of energy consumption. For example, edge computing can dynamically adjust the operational parameters of HVAC systems, lighting systems, and other equipment in response to environmental changes such as outdoor temperature or indoor illumination levels. This localized computation model not only reduces reliance on network connectivity but also significantly minimizes data transmission latency, enabling buildings to quickly respond to energy consumption fluctuations and make immediate adjustments to related systems, thereby effectively reducing energy usage. In terms of performance evaluation, edge computing allows architects to instantly obtain energy performance metrics for various design scenarios and optimize design iterations based on real-time feedback, accelerating the implementation of energy-efficient solutions. During the operational phase, edge computing enables the real-time monitoring of building systems, triggering maintenance alerts or adjusting operating parameters immediately upon detecting anomalies. This approach reduces energy losses and enhances the operational efficiency of equipment.
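A minimal sketch of such a local control step is shown below. The control rule, thresholds, and class names are hypothetical and serve only to show that the decision can be computed on the edge device from the latest sensor reading, without a round trip to a remote server.

```python
from dataclasses import dataclass


@dataclass
class SensorReading:
    outdoor_temp_c: float  # outdoor air temperature in °C
    indoor_lux: float      # measured indoor illuminance in lux


@dataclass
class ControlCommand:
    cooling_setpoint_c: float
    lighting_level: float  # 0.0 = off, 1.0 = full output


def clamp(value, low, high):
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


def edge_control_step(reading, target_lux=500.0):
    """Compute HVAC and lighting commands locally from one reading."""
    # Hypothetical rule: relax the cooling setpoint by 0.2 K per degree
    # of outdoor temperature above 30 °C, capped at 26 °C.
    excess = max(0.0, reading.outdoor_temp_c - 30.0)
    setpoint = clamp(24.0 + 0.2 * excess, 24.0, 26.0)
    # Supply only the share of the target illuminance that is missing.
    lighting = clamp((target_lux - reading.indoor_lux) / target_lux, 0.0, 1.0)
    return ControlCommand(setpoint, lighting)


hot_dim = edge_control_step(SensorReading(outdoor_temp_c=35.0, indoor_lux=250.0))
mild_bright = edge_control_step(SensorReading(outdoor_temp_c=22.0, indoor_lux=800.0))
```

In practice the fixed rule would be replaced by a trained model or an optimizer running on the device, but the structure stays the same: read locally, decide locally, act immediately.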

In summary, federated learning and edge computing offer robust technical support for energy optimization in buildings from the perspectives of collaborative data utilization and real-time computation. Federated learning enhances the efficiency and accuracy of model training through privacy-preserving collaboration, in which model parameters rather than raw data are shared, while edge computing improves the responsiveness and execution efficiency of energy-saving strategies through localized real-time processing. Together, these technologies are driving the architectural industry toward a greener and more intelligent future.

4. Discussion

Under the impetus of digital transformation, the construction industry is gradually advancing toward intelligence and data-driven development. This shift not only enhances the efficiency of building design and operations but also provides crucial technical support for achieving sustainable development goals [61,67]. This study systematically analyzes 496 documents using VOSviewer and Bibliometrix, revealing an exponential growth in machine learning research within the building energy efficiency sector, though academic influence exhibits regional disparities. Core countries (China, the United States, Australia) and research institutions such as Tsinghua University and Chongqing University dominate the high-yield author networks, while emerging technologies (such as unsupervised learning) occupy marginal positions in the keyword co-occurrence network, corroborating the issue of research fragmentation. This study innovatively highlights the multidimensional applications of machine learning in building energy efficiency optimization and its technological evolution. This research demonstrates how various machine learning techniques collaborate in building performance prediction, smart energy management, and sustainable design.

This study explores the critical role of machine learning in building performance prediction, energy management optimization, and sustainable design, where its application is increasingly widespread, reflecting a trend of technological diversification and deepening development. In building performance prediction, deep learning has become the dominant technology, with the keyword appearing 88 times. The long short-term memory (LSTM) network is especially influential, accounting for 40% of the top ten most-cited papers. However, despite deep learning’s exceptional performance in improving prediction accuracy, its interpretability remains inadequate, and the “black-box” nature of complex algorithms has become a critical challenge for practical application and promotion. In energy management, reinforcement learning dominates research on the dynamic control of heating, ventilation, and air conditioning (HVAC) systems, with 56% of the highly cited papers adopting reinforcement learning methods. These studies show that reinforcement learning has outstanding advantages in improving system energy efficiency and enabling intelligent regulation. However, existing research mainly focuses on theoretical model construction, and the actual benefits still lack large-scale empirical validation: papers involving empirical studies account for only 17%, indicating that the practical deployment and large-scale promotion of reinforcement learning in building environments remain insufficient.

In sustainable design, the combination of generative design and Building Information Modeling (BIM) demonstrates significant application potential, with an annual growth rate of 35% for generative design-related studies, providing innovative pathways for green building and energy-efficient design. Although the development momentum in this field is significant, lifecycle assessment research remains fragmented, with a cluster density of 25 in the keyword co-occurrence network. This suggests that a systematic research framework from the lifecycle perspective has not yet been established, which limits in-depth application throughout the building lifecycle.

This study summarizes the application trends of various machine learning techniques in intelligent building systems, revealing the multidimensional applications and technological evolution paths in the field. From the perspective of technological evolution, deep learning and reinforcement learning have become the core driving forces behind the development of intelligent building systems. Deep learning’s centrality in the network is 5.59, and reinforcement learning’s annual citation growth rate is 22%, confirming their central positions in building energy optimization. In contrast, research on unsupervised learning is still in the early exploration stage. Although it has unique advantages in processing large-scale unlabeled data, its keyword frequency is only 12, and it has yet to overcome the bottlenecks of practical application. At the same time, a trend toward integrating machine learning technologies is gradually emerging, with federated learning and edge computing showing innovative potential in intelligent building systems, particularly in privacy protection and real-time response. Federated learning, relying on a distributed training model, avoids the centralized storage of sensitive data, while edge computing improves the immediacy of data processing. However, these emerging technologies are still in the early stages of development, accounting for only 4.2% of the literature on building energy optimization. Their practical application is therefore still limited and requires further research and integration with engineering practice.
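For readers unfamiliar with how keyword frequency and co-occurrence figures of this kind are obtained, the following Python sketch counts both from a list of keyword sets. The records are invented for illustration, and the counting rule is a simplified assumption; it is not the procedure implemented in VOSviewer or Bibliometrix, and it does not reproduce the centrality values reported here.

```python
from collections import Counter
from itertools import combinations


def keyword_statistics(records):
    """Count keyword frequencies and pairwise co-occurrences per document."""
    freq = Counter()
    cooc = Counter()
    for keywords in records:
        # Normalize case and drop duplicates within one document.
        unique = sorted({k.lower() for k in keywords})
        freq.update(unique)
        # Each unordered keyword pair in a document counts once.
        cooc.update(combinations(unique, 2))
    return freq, cooc


# Hypothetical author-keyword lists for four documents.
records = [
    ["deep learning", "LSTM", "energy prediction"],
    ["reinforcement learning", "HVAC", "deep learning"],
    ["deep learning", "energy prediction"],
    ["unsupervised learning", "clustering"],
]

freq, cooc = keyword_statistics(records)
```

Here "deep learning" occurs in three documents and co-occurs twice with "energy prediction", whereas "unsupervised learning" appears once and shares no link with the other clusters, which is the kind of marginal position described above.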

Although significant progress has been made in the application of machine learning technologies in building energy optimization, several technical challenges remain. Data heterogeneity is considered the most prominent bottleneck: the inconsistent formats and varying quality of multisource data in complex building environments severely affect model training and cross-scenario applicability. Additionally, insufficient model interpretability has become a core barrier to further promoting deep learning and reinforcement learning, and the keyword “interpretability” has a centrality of only 0.04, indicating that scholarly attention to this issue remains limited. Furthermore, significant regional differences in standards impede the global promotion and application of the technology. International cooperation accounts for only 27.02% of the literature, and insufficient data sharing and standards collaboration between regions further restrict the application of the technology.

It is noteworthy that current research shows a significant technological imbalance: 55% of the literature focuses on the development of predictive models, while only 8% addresses the actual deployment and engineering application of the technology, indicating a significant disconnect between academic research and industrial practice. This imbalance limits the comprehensive promotion of machine learning in building energy optimization and underscores the need for deeper integration of theory and practice. Future research should focus on developing hybrid frameworks that integrate physical models with machine learning, such as Physics-Informed Neural Networks (PINNs), to enhance model interpretability and address the “black-box” issue of deep learning models. Additionally, efforts should be made to promote the practical application of unsupervised learning in the building sector by establishing building energy benchmark datasets covering multiple scenarios, which would help overcome data scarcity and labeling challenges and enhance model adaptability and robustness. Furthermore, greater collaboration across institutions and countries should be encouraged to establish a global research and application network, particularly among core countries such as China, the United States, and Australia. Such collaboration would drive the standardized deployment of machine learning models in the building industry and support certification systems compliant with international standards, promoting global sharing and widespread application of the technology.
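The hybrid idea behind physics-informed models can be sketched as a loss with two terms: a data misfit and a penalty for violating a physical law. The example below assumes a first-order resistance-capacitance (RC) thermal model of a single zone, C dT/dt = (T_out - T)/R + Q, and uses finite differences in place of the automatic differentiation that an actual PINN would use. The RC model, parameter values, and function names are illustrative assumptions and are not drawn from the reviewed studies.

```python
def simulate_rc(start_temp, outdoor, heat_gain, resistance, capacitance, dt):
    """Forward-Euler rollout of C * dT/dt = (T_out - T) / R + Q."""
    temps = [start_temp]
    for t_out, q in zip(outdoor[:-1], heat_gain[:-1]):
        rate = ((t_out - temps[-1]) / resistance + q) / capacitance
        temps.append(temps[-1] + dt * rate)
    return temps


def physics_informed_loss(predicted, measured, outdoor, heat_gain,
                          resistance, capacitance, dt, weight=1.0):
    """Mean squared data error plus a weighted mean squared physics residual."""
    n = len(predicted)
    data_loss = sum((p - m) ** 2 for p, m in zip(predicted, measured)) / n
    residuals = []
    for k in range(n - 1):
        dT_dt = (predicted[k + 1] - predicted[k]) / dt  # forward difference
        rhs = (outdoor[k] - predicted[k]) / resistance + heat_gain[k]
        residuals.append(capacitance * dT_dt - rhs)
    physics_loss = sum(r ** 2 for r in residuals) / len(residuals)
    return data_loss + weight * physics_loss


# Hypothetical zone: R in K/kW, C in kWh/K, heat gain in kW, time step in hours.
R, C, DT = 2.0, 4.0, 1.0
outdoor = [5.0, 5.0, 5.0, 5.0]
heat_gain = [3.0, 3.0, 3.0, 3.0]
measured = simulate_rc(20.0, outdoor, heat_gain, R, C, DT)

consistent_loss = physics_informed_loss(measured, measured, outdoor, heat_gain, R, C, DT)
flat_loss = physics_informed_loss([20.0] * 4, measured, outdoor, heat_gain, R, C, DT)
```

A prediction that follows the assumed heat balance incurs no physics penalty, whereas a flat temperature trace is penalized even where measurements are sparse. This constraint is what ties a data-driven model to an interpretable physical structure.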

5. Conclusions

As research on smart buildings and carbon reduction technologies gains momentum, the volume of publications in this field continues to grow, reflecting the increasing attention from the academic community. However, there is a lack of systematic summaries regarding the application of emerging techniques such as deep learning, reinforcement learning, and unsupervised learning in recent years. Scientific bibliometric and visualization analyses in this domain remain relatively scarce. This study innovatively reveals the multidimensional applications and technological evolution of machine learning in building energy efficiency optimization through a systematic bibliometric analysis. In particular, this research demonstrates how the combined use of deep learning, reinforcement learning, and unsupervised learning can work synergistically in building performance prediction, smart energy management, and sustainable design, significantly enhancing energy efficiency and reducing carbon emissions. Meanwhile, this paper delves into key challenges in machine learning applications, especially bottlenecks in data quality and model adaptability, highlighting how data heterogeneity and model transferability limit the widespread application of these technologies.

Although the academic community has made rich contributions in this area, the gap between practical applications and academic research remains prominent. The high cost and complexity of the technologies have caused industry applications to lag behind research progress. Furthermore, the lack of unified evaluation standards and large-scale open-source datasets has limited the comparability and replicability of research. Based on the findings of this study, we recommend future research focus on the following aspects:

Strengthening interdisciplinary collaboration: Future work should promote deep collaboration across architecture, engineering, computer science, and energy science to develop customized machine learning models that address the challenges of different building types and climates. While the existing literature addresses the integration of machine learning with building systems, research on interdisciplinary integration is still scarce.
Expanding data sharing and benchmarking: To overcome data quality and model generalization issues, it is crucial to develop large open-source datasets and encourage sharing within the academic community. Establishing standardized building energy optimization benchmark datasets will facilitate more consistent and comparable research outcomes.
Improving model transparency and interpretability: Research should advance hybrid models that combine physics-based models with data-driven machine learning approaches, ensuring that machine learning systems in building energy management are transparent and interpretable, thus enhancing stakeholders’ trust in the decision-making process.
Developing cost-effective deployment strategies: Future research should focus on reducing the deployment costs of building machine learning systems and exploring technologies such as cloud computing, edge computing, and federated learning to reduce the need for large-scale data collection and centralized processing. Edge computing, though still in its early stages in building energy efficiency optimization, holds significant potential for improving real-time data processing and energy efficiency control.
Integrating smart IoT and energy systems: Future research should focus on the deep integration of smart IoT devices with building energy management systems, exploring the potential for IoT and machine learning to work synergistically for real-time energy optimization and carbon emission control, especially in the context of rapidly developing smart city infrastructure.
Addressing building lifecycle and sustainability issues: Research should cover the entire building lifecycle, with an emphasis on how machine learning can promote sustainability across all stages. Specific questions include how machine learning can optimize material selection, reduce waste, and enhance a building’s ability to adapt to climate change.

By addressing these challenges and pursuing the suggested research directions, machine learning has the potential to not only optimize building energy performance but also facilitate the transition to low-carbon, smart buildings. As research in this field continues to develop, bridging the gap between theoretical advancements and practical, scalable solutions that can be implemented in real-world building projects is crucial.

This study provides a comprehensive overview of research in the field of building energy efficiency, offering valuable insights for both academia and industry. It also lays out clear directions for future research. Despite its efforts to achieve systematic and comprehensive coverage, this study has certain limitations. First, the analysis is based solely on publications indexed in the WoS database, potentially overlooking other significant research. Second, the dominance of quantitative bibliometric methods may constrain a more in-depth understanding of the literature’s content, potentially marginalizing studies on less central topics. Finally, the limitations of the retrieval strategy might result in the omission of relevant studies that do not contain the specific keywords or in the inclusion of literature with relatively dispersed themes.

Author Contributions

Methodology, J.C. and J.L.; software, J.L.; validation, J.L.; formal analysis, J.L.; data curation, J.L.; writing—original draft preparation, J.L.; writing—review and editing, J.C.; visualization, J.L.; supervision, J.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in this article, and further inquiries can be directed to the corresponding author.

Acknowledgments

Thanks to the Institute of Creative and Research, School of Architecture, Harbin Institute of Technology, for supporting this research.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Aversa, P.; Donatelli, A.; Piccoli, G.; Luprano, V.A.M. Improved Thermal Transmittance Measurement with HFM Technique on Building Envelopes in the Mediterranean Area. Sel. Sci. Pap.—J. Civ. Eng. 2016, 11, 39–52.
Shaamala, A.; Yigitcanlar, T.; Nili, A.; Nyandega, D. Algorithmic Green Infrastructure Optimisation: Review of Artificial Intelligence Driven Approaches for Tackling Climate Change. Sustain. Cities Soc. 2024, 101, 105182.
Zhou, Y.; Zheng, S. A Co-Simulated Material-Component-System-District Framework for Climate-Adaption and Sustainability Transition. Renew. Sustain. Energy Rev. 2024, 192, 114184.
Abdelrahman, M.; Zhan, S.; Miller, C.; Chong, A. Data Science for Building Energy Efficiency: A Comprehensive Text-Mining Driven Review of Scientific Literature. Energy Build. 2021, 242, 110885.
Hong, T.; Wang, Z.; Luo, X.; Zhang, W. State-of-the-Art on Research and Applications of Machine Learning in the Building Life Cycle. Energy Build. 2020, 212, 109831.
Li, Y.; Chen, H.; Yu, P.; Yang, L. The Application and Evaluation of the LMDI Method in Building Carbon Emissions Analysis: A Comprehensive Review. Buildings 2024, 14, 2820.
Li, Z.; Ma, J.; Tan, Y.; Guo, C.; Li, X. Combining Physical Approaches with Deep Learning Techniques for Urban Building Energy Modeling: A Comprehensive Review and Future Research Prospects. Build. Environ. 2023, 246, 110960.
Asghari, V.; Wang, Y.; Biglari, A.; Hsu, S.; Tang, P. Reinforcement Learning in Construction Engineering and Management: A Review. J. Constr. Eng. Manag. 2022, 148, 03122009.
Fu, Q.; Han, Z.; Chen, J.; Lu, Y.; Wu, H.; Wang, Y. Applications of Reinforcement Learning for Building Energy Efficiency Control: A Review. J. Build. Eng. 2022, 50, 104165.
Weinberg, D.; Wang, Q.; Timoudas, T.; Fischione, C. A Review of Reinforcement Learning for Controlling Building Energy Systems From a Computer Science Perspective. Sustain. Cities Soc. 2023, 89, 104351.
Yu, H.; Tam, V.; Xu, X. A Systematic Review of Reinforcement Learning Application in Building Energy-Related Occupant Behavior Simulation. Energy Build. 2024, 312, 114189.
Hasan, Z.; Roy, N. Trending Machine Learning Models in Cyber-Physical Building Environment: A Survey. Wiley Interdiscip. Rev.-Data Min. Knowl. Discov. 2021, 11, e1422.
Um-e-Habiba; Ahmed, I.; Asif, M.; Alhelou, H.; Khalid, M. A Review on Enhancing Energy Efficiency and Adaptability through System Integration for Smart Buildings. J. Build. Eng. 2024, 89, 109354.
Ghahramani, A.; Galicia, P.; Lehrer, D.; Varghese, Z.; Wang, Z.; Pandit, Y. Artificial Intelligence for Efficient Thermal Comfort Systems: Requirements, Current Applications and Future Directions. Front. Built Environ. 2020, 6, 49.
Balali, Y.; Chong, A.; Busch, A.; O’Keefe, S. Energy Modelling and Control of Building Heating and Cooling Systems with Data-Driven and Hybrid Models—A Review. Renew. Sustain. Energy Rev. 2023, 183, 113496.
Elwy, I.; Hagishima, A. The Artificial Intelligence Reformation of Sustainable Building Design Approach: A Systematic Review on Building Design Optimization Methods Using Surrogate Models. Energy Build. 2024, 323, 114769.
Zhou, Y.; Liu, J. Advances in Emerging Digital Technologies for Energy Efficiency and Energy Integration in Smart Cities. Energy Build. 2024, 315, 114289.
Abdel-Jaber, F.; Dirks, K. A Review of Cooling and Heating Loads Predictions of Residential Buildings Using Data-Driven Techniques. Buildings 2024, 14, 752.
Ayoub, M. A Review on Machine Learning Algorithms to Predict Daylighting inside Buildings. Sol. Energy 2020, 202, 249–275.
Zhang, W.; Wu, Y.; Calautit, J. A Review on Occupancy Prediction through Machine Learning for Enhancing Energy Efficiency, Air Quality and Thermal Comfort in the Built Environment. Renew. Sustain. Energy Rev. 2022, 167, 112704.
Michailidis, P.; Michailidis, I.; Vamvakas, D.; Kosmatopoulos, E. Model-Free HVAC Control in Buildings: A Review. Energies 2023, 16, 7124.
Vosoughkhosravi, S.; Jafari, A. Creating a Large-Scale National Residential Building Energy Dataset Using a Two-Stage Machine Learning Approach; Shane, J., Madson, K., Mo, Y., Poleacovschi, C., Sturgill, R., Eds.; ASCM: Reston, VA, USA, 2024; pp. 305–315.
Asif, M.; Naeem, G.; Khalid, M. Digitalization for Sustainable Buildings: Technologies, Applications, Potential, and Challenges. J. Clean. Prod. 2024, 450, 141814.
Suphavarophas, P.; Wongmahasiri, R.; Keonil, N.; Bunyarittikit, S. A Systematic Review of Applications of Generative Design Methods for Energy Efficiency in Buildings. Buildings 2024, 14, 1311.
Adhikari, R.; Gautam, Y.; Jebelli, H.; Sitzabee, W. Deep Learning and Reinforcement Learning for Modeling Occupants’ Information in an Occupant-Centric Building Control: A Systematic Literature Review. In Construction Research Congress 2024; Shane, J., Madson, K., Mo, Y., Poleacovschi, C., Sturgill, R., Eds.; ASCM: Reston, VA, USA, 2024; pp. 186–195.
Manfren, M.; Gonzalez-Carreon, K.; James, P. Interpretable Data-Driven Methods for Building Energy Modelling-A Review of Critical Connections and Gaps. Energies 2024, 17, 881.
Olu-Ajayi, R.; Alaka, H.; Sunmola, F.; Ajayi, S.; Mporas, I. Statistical and Artificial Intelligence-Based Tools for Building Energy Prediction: A Systematic Literature Review. IEEE Trans. Eng. Manag. 2024, 71, 14733–14753.
Sun, K.; Zhao, Q.; Zou, J. A Review of Building Occupancy Measurement Systems. Energy Build. 2020, 216, 109965.
Wang, M.; Jia, Z.; Tao, L.; Xiang, C. Review of Dynamic Façade Typologies, Physical Performance and Control Methods: Towards Smarter and Cleaner Zero-Energy Buildings. J. Build. Eng. 2024, 98, 111310.
Mondal, N.; Anand, P.; Khan, A.; Deb, C.; Cheong, D.; Sekhar, C.; Niyogi, D.; Santamouris, M. Systematic Review of the Efficacy of Data-Driven Urban Building Energy Models during Extreme Heat in Cities: Current Trends and Future Outlook. Build. Simul. 2024, 17, 695–722.
Fu, H.; Baltazar, J.; Claridge, D. Review of Developments in Whole-Building Statistical Energy Consumption Models for Commercial Buildings. Renew. Sustain. Energy Rev. 2021, 147, 111248.
Zhu, X.; Zhang, X.; Gong, P.; Li, Y. A review of distributed energy system optimization for building decarbonization. J. Build. Eng. 2023, 73, 106735.
Bellagarda, A.; Cesari, S.; Aliberti, A.; Ugliotti, F.; Bottaccioli, L.; Macii, E.; Patti, E. Effectiveness of Neural Networks and Transfer Learning for Indoor Air-Temperature Forecasting. Autom. Constr. 2022, 140, 104314.
Grant, M.J.; Booth, A. A Typology of Reviews: An Analysis of 14 Review Types and Associated Methodologies. Health Inf. Libr. J. 2009, 26, 91–108.
Lempel, R.; Moran, S. SALSA: The Stochastic Approach for Link-Structure Analysis. ACM Trans. Inf. Syst. 2001, 19, 131–160.
Systematic Approaches to a Successful Literature Review; SAGE Publications Ltd.: Thousand Oaks, CA, USA; Available online: https://uk.sagepub.com/en-gb/eur/systematic-approaches-to-a-successful-literature-review/book270933 (accessed on 13 January 2025).
Harzing, A.-W.; Alakangas, S. Google Scholar, Scopus and the Web of Science: A Longitudinal and Cross-Disciplinary Comparison. Scientometrics 2016, 106, 787–804.
Liu, W. Caveats for the Use of Web of Science Core Collection in Old Literature Retrieval and Historical Bibliometric Analysis. Technol. Forecast. Soc. Chang. 2021, 172, 121023.
Moral-Muñoz, J.A.; Herrera-Viedma, E.; Santisteban-Espejo, A.; Cobo, M.J. Software Tools for Conducting Bibliometric Analysis in Science: An up-to-Date Review. Prof. Inf. 2020, 29, e290103.
Leydesdorff, L.; Carley, S.; Rafols, I. Global Maps of Science Based on the New Web-of-Science Categories. Scientometrics 2013, 94, 589–593.
Velez-Estevez, A.; García-Sánchez, P.; Moral-Munoz, J.A.; Cobo, M.J. Why Do Papers from International Collaborations Get More Citations? A Bibliometric Analysis of Library and Information Science Papers. Scientometrics 2022, 127, 7517–7555.
Ibáñez, A.; Bielza, C.; Larrañaga, P. Relationship among Research Collaboration, Number of Documents and Number of Citations: A Case Study in Spanish Computer Science Production in 2000–2009. Scientometrics 2013, 95, 689–716.
Nicholls, P.T. Bibliometric Modeling Processes and the Empirical Validity of Lotka’s Law. J. Am. Soc. Inf. Sci. 1989, 40, 379–385.
Newby, G.B.; Greenberg, J.; Jones, P. Open Source Software Development and Lotka’s Law: Bibliometric Patterns in Programming. J. Am. Soc. Inf. Sci. Technol. 2003, 54, 169–178.
Patra, S.K.; Bhattacharya, P.; Verma, N. Bibliometric Study of Literature on Bibliometrics. DESIDOC J. Libr. Inf. Technol. 2006, 26, 27–32.
Alabi, G. Bradford’s Law and Its Application. Int. Libr. Rev. 1979, 11, 151–158.
Nash-Stewart, C.E.; Kruesi, L.M.; Del Mar, C.B. Does Bradford’s Law of Scattering Predict the Size of the Literature in Cochrane Reviews? J. Med. Libr. Assoc. 2012, 100, 135–138.
Olu-Ajayi, R.; Alaka, H.; Sulaimon, I.; Sunmola, F.; Ajayi, S. Building Energy Consumption Prediction for Residential Buildings Using Deep Learning and Other Machine Learning Techniques. J. Build. Eng. 2022, 45, 103406.
Brandi, S.; Piscitelli, M.; Martellacci, M.; Capozzoli, A. Deep Reinforcement Learning to Optimise Indoor Temperature Control and Heating Energy Consumption in Buildings. Energy Build. 2020, 224, 110225.
Xie, J.; Li, H.; Li, C.; Zhang, J.; Luo, M. Review on Occupant-Centric Thermal Comfort Sensing, Predicting, and Controlling. Energy Build. 2020, 226, 110392.
Gopinath, R.; Kumar, M.; Prakash Chandra Joshua, C.; Srinivas, K. Energy Management Using Non-Intrusive Load Monitoring Techniques—State-of-the-Art and Future Research Directions. Sustain. Cities Soc. 2020, 62, 102411.
Dong, Z.; Liu, J.; Liu, B.; Li, K.; Li, X. Hourly Energy Consumption Prediction of an Office Building Based on Ensemble Learning and Energy Consumption Pattern Classification. Energy Build. 2021, 241, 110929.
Hosamo, H.; Svennevig, P.; Svidt, K.; Han, D.; Nielsen, H. A Digital Twin Predictive Maintenance Framework of Air Handling Units Based on Automatic Fault Detection and Diagnostics. Energy Build. 2022, 261, 111988.
Seyedzadeh, S.; Rahimian, F.; Oliver, S.; Rodriguez, S.; Glesk, I. Machine Learning Modelling for Predicting Non-Domestic Buildings Energy Performance: A Model to Support Deep Energy Retrofit Decision-Making. Appl. Energy 2020, 279, 115908.
Mounir, N.; Ouadi, H.; Jrhilifa, I. Short-Term Electric Load Forecasting Using an EMD-BI-LSTM Approach for Smart Grid Energy Management System. Energy Build. 2023, 288, 113022.
Börner, K.; Chen, C.; Boyack, K.W. Visualizing Knowledge Domains. Annu. Rev. Inf. Sci. Technol. 2003, 37, 179–255. [Google Scholar] [CrossRef]
Han, Z.; Peng, K.; Mi, J.; Li, B. The Smart City: A New Solution to Urban Shrinkage? Evidence from China. J. Asian Public Policy 2024, 17, 160–179. [Google Scholar] [CrossRef]
Yu, C.; Yu, J.; Gao, D. Smart Cities and Greener Futures: Evidence from a Quasi-Natural Experiment in China’s Smart City Construction. Sustainability 2024, 16, 929. [Google Scholar] [CrossRef]
Bijlani, V. Smart Buildings for Sustainable Smart Cities. In Proceedings of the 2023 1st International Conference on Advanced Innovations in Smart Cities (ICAISC), Jeddah, Saudi Arabia, 23–25 January 2023; pp. 1–6. [Google Scholar]
Gedikli, A.; Taş, C.Y.; Taş, N.B. Redefining Smart Cities, Urban Energy, and Green Technologies for Sustainable Development. In Handbook of Research on Sustainable Development Goals, Climate Change, and Digitalization; IGI Global Scientific Publishing: Hershey, PA, USA, 2022; pp. 216–232. ISBN 978-1-7998-8482-8. [Google Scholar]
Qiu, J.-P.; Dong, K.; Yu, H.-Q. Comparative Study on Structure and Correlation among Author Co-Occurrence Networks in Bibliometrics. Scientometrics 2014, 101, 1345–1360. [Google Scholar] [CrossRef]
Amasyali, K.; El-Gohary, N.M. A Review of Data-Driven Building Energy Consumption Prediction Studies. Renew. Sustain. Energy Rev. 2018, 81, 1192–1205. [Google Scholar] [CrossRef]
Pérez-Lombard, L.; Ortiz, J.; Pout, C. A Review on Buildings Energy Consumption Information. Energy Build. 2008, 40, 394–398. [Google Scholar] [CrossRef]
Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction, 2nd ed.; The MIT Press: Cambridge, MA, USA, 2018; pp. xxii, 526. ISBN 978-0-262-03924-6. [Google Scholar]
Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-Level Control through Deep Reinforcement Learning. Nature 2015, 518, 529–533. [Google Scholar] [CrossRef]
Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
Small, H. Co-Citation in the Scientific Literature: A New Measure of the Relationship between Two Documents. J. Am. Soc. Inf. Sci. 1973, 24, 265–269. [Google Scholar] [CrossRef]
Griffith, B.C.; Small, H.G.; Stonehill, J.A.; Dey, S. The Structure of Scientific Literatures II: Toward a Macro- and Microstructure for Science. Sci. Stud. 1974, 4, 339–365. [Google Scholar] [CrossRef]
Bui, D.; Nguyen, T.; Ngo, T.; Nguyen-Xuan, H. An Artificial Neural Network (ANN) Expert System Enhanced with the Electromagnetism-Based Firefly Algorithm (EFA) for Predicting the Energy Consumption in Buildings. Energy 2020, 190, 116370. [Google Scholar] [CrossRef]
Aria, M.; Cuccurullo, C.; D’Aniello, L.; Misuraca, M.; Spano, M. Thematic Analysis as a New Culturomic Tool: The Social Media Coverage on COVID-19 Pandemic in Italy. Sustainability 2022, 14, 3643. [Google Scholar] [CrossRef]
Wilczewski, M.; Alon, I. Language and Communication in International Students’ Adaptation: A Bibliometric and Content Analysis Review. High. Educ. 2023, 85, 1235–1256. [Google Scholar] [CrossRef]
Herrera, J.; De Las Heras-Rosas, C. The Organizational Commitment in the Company and Its Relationship With the Psychological Contract. Front. Psychol. 2021, 11, 3978. [Google Scholar] [CrossRef]
Salton, G.; McGill, M.J. Introduction to Modern Information Retrieval; McGraw-Hill, Inc.: New York, NY, USA, 1986; ISBN 978-0-07-054484-0. [Google Scholar]
Smeaton, A.F. Using NLP or NLP Resources for Information Retrieval Tasks. In Natural Language Information Retrieval; Strzalkowski, T., Ed.; Springer Netherlands: Dordrecht, The Netherlands, 1999; pp. 99–111. ISBN 978-94-017-2388-6. [Google Scholar]
Han, Y.; Li, J.; Lou, X.; Fan, C.; Geng, Z. Energy Saving of Buildings for Reducing Carbon Dioxide Emissions Using Novel Dendrite Net Integrated Adaptive Mean Square Gradient. Appl. Energy 2022, 309, 118409. [Google Scholar] [CrossRef]
Li, G.; Wu, Y.; Liu, J.; Fang, X.; Wang, Z. Performance Evaluation of Short-Term Cross-Building Energy Predictions Using Deep Transfer Learning Strategies. Energy Build. 2022, 275, 112461. [Google Scholar] [CrossRef]
Pan, Y.; Zhang, L. Data-Driven Estimation of Building Energy Consumption with Multi-Source Heterogeneous Data. Appl. Energy 2020, 268, 114965. [Google Scholar] [CrossRef]
Verma, A.; Prakash, S.; Kumar, A. ANN-Based Energy Consumption Prediction Model up to 2050 for a Residential Building: Towards Sustainable Decision Making. Environ. Prog. Sustain. Energy 2021, 40, e13544. [Google Scholar] [CrossRef]
Piras, G.; Muzi, F.; Ziran, Z. Open Tool for Automated Development of Renewable Energy Communities: Artificial Intelligence and Machine Learning Techniques for Methodological Approach. Energies 2024, 17, 5726. [Google Scholar] [CrossRef]
Venkatraj, V.; Dixit, M.; Yan, W.; Caffey, S.; Sideris, P.; Aryal, A. Toward the Application of a Machine Learning Framework for Building Life Cycle Energy Assessment. Energy Build. 2023, 297, 113444. [Google Scholar] [CrossRef]
Zhang, X.; Chen, H.; Sun, J.; Zhang, X. Predictive Models of Embodied Carbon Emissions in Building Design Phases: Machine Learning Approaches Based on Residential Buildings in China. Build. Environ. 2024, 258, 111595. [Google Scholar] [CrossRef]
Zheng, L.; Mueller, M.; Luo, C.; Yan, X. Predicting Whole-Life Carbon Emissions for Buildings Using Different Machine Learning Algorithms: A Case Study on Typical Residential Properties in Cornwall, UK. Appl. Energy 2024, 357, 122472. [Google Scholar] [CrossRef]
Li, Z.; Zhao, Y.; Xia, H.; Xie, S. A Multi-Objective Optimization Framework for Building Performance under Climate Change. J. Build. Eng. 2023, 80, 107978. [Google Scholar] [CrossRef]
Zhang, C.; Tian, X.; Zhao, Y.; Lu, J. Automated Machine Learning-Based Building Energy Load Prediction Method. J. Build. Eng. 2023, 80, 108071. [Google Scholar] [CrossRef]
Zhang, H.; Feng, H.; Hewage, K.; Arashpour, M. Artificial Neural Network for Predicting Building Energy Performance: A Surrogate Energy Retrofits Decision Support Framework. Buildings 2022, 12, 829. [Google Scholar] [CrossRef]
Abdou, N.; El Mghouchi, Y.; Jraida, K.; Hamdaoui, S.; Hajou, A.; Mouqallid, M. Prediction and Optimization of Heating and Cooling Loads for Low Energy Buildings in Morocco: An Application of Hybrid Machine Learning Methods. J. Build. Eng. 2022, 61, 105332. [Google Scholar] [CrossRef]
Biloria, N.; Makki, M.; Abdollahzadeh, N. Multi-Performative Façade Systems: The Case of Real-Time Adaptive BIPV Shading Systems to Enhance Energy Generation Potential and Visual Comfort. Front. Built Environ. 2023, 9, 1119696. [Google Scholar] [CrossRef]
Pan, H.; Wu, C. Bayesian Optimization + XGBoost Based Life Cycle Carbon Emission Prediction for Residential Buildings—An Example from Chengdu, China. Build. Simul. 2023, 16, 1451–1466. [Google Scholar] [CrossRef]
Salami, B.; Abba, S.; Adewumi, A.; Dodo, U.; Otukogbe, G.; Oyedele, L. Building Energy Loads Prediction Using Bayesian-Based Metaheuristic Optimized-Explainable Tree-Based Model. Case Stud. Constr. Mater. 2023, 19, e02676. [Google Scholar] [CrossRef]
Wang, H.; Wen, W.; Zhang, Z.; Gao, N. Construction of Building Energy Consumption Prediction Model Based on Multi-Optimization Model. Buildings 2023, 13, 1677. [Google Scholar] [CrossRef]
Bhamare, D.; Saikia, P.; Rathod, M.; Rakshit, D.; Banerjee, J. A Machine Learning and Deep Learning Based Approach to Predict the Thermal Performance of Phase Change Material Integrated Building Envelope. Build. Environ. 2021, 199, 107927. [Google Scholar] [CrossRef]
Ly, H.-B.; Nguyen, M.H.; Pham, B.T. Metaheuristic Optimization of Levenberg–Marquardt-Based Artificial Neural Network Using Particle Swarm Optimization for Prediction of Foamed Concrete Compressive Strength. Neural Comput. Appl. 2021, 33, 17331–17351. [Google Scholar] [CrossRef]
Rezaie, M.; Kariminia, S.; Band, S.; Ameri, R.; Farokhi, M.; Pai, H.; Gocer, O.; Rismanchi, B.; Shooshtarian, S. Energy Consumption of High-Rise Double Skin Façade Buildings, a Machine Learning Analysis. J. Build. Eng. 2024, 89, 109230. [Google Scholar] [CrossRef]
Dai, X.; Cheng, S.; Chong, A. Deciphering Optimal Mixed-Mode Ventilation in the Tropics Using Reinforcement Learning with Explainable Artificial Intelligence. Energy Build. 2023, 278, 112629. [Google Scholar] [CrossRef]
Norouzi, P.; Maalej, S.; Mora, R. Applicability of Deep Learning Algorithms for Predicting Indoor Temperatures: Towards the Development of Digital Twin HVAC Systems. Buildings 2023, 13, 1542. [Google Scholar] [CrossRef]
Hosamo, H.; Nielsen, H.; Kraniotis, D.; Svennevig, P.; Svidt, K. Improving Building Occupant Comfort through a Digital Twin Approach: A Bayesian Network Model and Predictive Maintenance Method. Energy Build. 2023, 288, 112992. [Google Scholar]
Deng, J.; Eklund, M.; Sierla, S.; Savolainen, J.; Niemistö, H.; Karhela, T.; Vyatkin, V. Deep Reinforcement Learning for Fuel Cost Optimization in District Heating. Sustain. Cities Soc. 2023, 99, 104955. [Google Scholar] [CrossRef]
Fang, Z.; Crimier, N.; Scanu, L.; Midelet, A.; Alyafi, A.; Delinchant, B. Multi-Zone Indoor Temperature Prediction with LSTM-Based Sequence to Sequence Model. Energy Build. 2021, 245, 111053. [Google Scholar] [CrossRef]
Chen, R.; Tsay, Y. Carbon Emission and Thermal Comfort Prediction Model for an Office Building Considering the Contribution Rate of Design Parameters. Energy Rep. 2022, 8, 8093–8107. [Google Scholar] [CrossRef]
Jia, T.; He, W.; Ma, W. Optimizing Urban Energy Management: A Strategic Examination of Smart Grids and Policy Regulations. Sustain. Cities Soc. 2024, 106, 105379. [Google Scholar] [CrossRef]
Hong, G.; Choi, G.; Eum, J.; Lee, H.; Kim, D. The Hourly Energy Consumption Prediction by KNN for Buildings in Community Buildings. Buildings 2022, 12, 1636. [Google Scholar] [CrossRef]
Blad, C.; Bogh, S.; Kallesoe, C. Data-Driven Offline Reinforcement Learning for HVAC-Systems. Energy 2022, 261, 125290. [Google Scholar] [CrossRef]
Lin, X.; Guo, Q.; Yuan, D.; Gao, M. Bayesian Optimization Framework for HVAC System Control. Buildings 2023, 13, 314. [Google Scholar] [CrossRef]
Li, W.; Zhao, Y.; Zhang, J.; Jiang, C.; Chen, S.; Lin, L.; Wang, Y. Indoor Temperature Preference Setting Control Method for Thermal Comfort and Energy Saving Based on Reinforcement Learning. J. Build. Eng. 2023, 73, 106805. [Google Scholar] [CrossRef]
Chiosa, R.; Piscitelli, M.; Fan, C.; Capozzoli, A. Towards a Self-Tuned Data Analytics-Based Process for an Automatic Context-Aware Detection and Diagnosis of Anomalies in Building Energy Consumption Timeseries. Energy Build. 2022, 270, 112302. [Google Scholar] [CrossRef]
Zhao, Y.; Yang, Z.; Zhu, J.; Hou, Z.; Zhang, S.; Hu, Y.; Shu, Y. Research on the Dynamic Characterization and Detection of Refrigerant Leakage in Multi-Connected Air-Conditioning System. Energy Build. 2024, 309, 114076. [Google Scholar] [CrossRef]
Liu, Z.; Zhang, X.; Sun, Y.; Zhou, Y. Advanced Controls on Energy Reliability, Flexibility and Occupant-Centric Control for Smart and Energy-Efficient Buildings. Energy Build. 2023, 297, 113436. [Google Scholar] [CrossRef]
Chen, S.; Ding, P.; Zhou, G.; Zhou, X.; Li, J.; Wang, L.; Wu, H.; Fan, C.; Li, J. A Novel Machine Learning-Based Model Predictive Control Framework for Improving the Energy Efficiency of Air-Conditioning Systems. Energy Build. 2023, 294, 113258. [Google Scholar] [CrossRef]
Erisen, S. A Systematic Approach to Optimizing Energy-Efficient Automated Systems with Learning Models for Thermal Comfort Control in Indoor Spaces. Buildings 2023, 13, 1824. [Google Scholar] [CrossRef]
Ding, Z.; Fu, Q.; Chen, J.; Lu, Y.; Wu, H.; Fang, N.; Xing, B. MAQMC: Multi-Agent Deep Q-Network for Multi-Zone Residential HVAC Control. Comput. Model. Eng. Sci. 2023, 136, 2759–2785. [Google Scholar] [CrossRef]
Omidvar, A.; Kim, J. A Novel Theoretical Model for Predicting the Individuals’ Thermal Sensations Based on Air Temperature and Biomarkers Measured by Wearable Devices. Build. Environ. 2023, 232, 110050. [Google Scholar] [CrossRef]
Deng, Q.; Chen, Z.; Zhu, W.; Li, Z.; Yuan, Y.; Wang, Y. Adaptive Fusion Graph Convolutional Network Based Interpretable Fault Diagnosis Method for HVAC Systems Enhanced by Unlabeled Data. Energy Build. 2024, 324, 114901. [Google Scholar] [CrossRef]
Ren, H.; Xu, C.; Lyu, Y.; Ma, Z.; Sun, Y. A Thermodynamic-Law-Integrated Deep Learning Method for High-Dimensional Sensor Fault Detection in Diverse Complex HVAC Systems. Appl. Energy 2023, 351, 121830. [Google Scholar] [CrossRef]
Faridah, F.; Utami, S.; Wijaya, D.; Yanti, R.; Putra, W.; Adrian, B. An Indoor Airflow Distribution Predictor Using Machine Learning for a Real-Time Healthy Building Monitoring System in the Tropics. Build. Serv. Eng. Res. Technol. 2024, 45, 293–315. [Google Scholar] [CrossRef]
Liao, J.; Yang, D.; Arshad, N.; Venkatachalam, K.; Ahmadian, A. MEMS: An Automated Multi-Energy Management System for Smart Residences Using the DD-LSTM Approach. Sustain. Cities Soc. 2023, 98, 104850. [Google Scholar] [CrossRef]
Quevedo, T.; Geraldi, M.; Melo, A. Applying Machine Learning to Develop Energy Benchmarking for University Buildings in Brazil. J. Build. Eng. 2023, 63, 105468. [Google Scholar] [CrossRef]
Li, J.; Zhang, C.; Zhao, Y.; Qiu, W.; Chen, Q.; Zhang, X. Federated Learning-Based Short-Term Building Energy Consumption Prediction Method for Solving the Data Silos Problem. Build. Simul. 2022, 15, 1145–1159. [Google Scholar] [CrossRef]
Dridi, J.; Amayri, M.; Bouguila, N. Unsupervised Domain Adaptation with and without Access to Source Data for Estimating Occupancy and Recognizing Activities in Smart Buildings. Build. Environ. 2023, 243, 110651. [Google Scholar] [CrossRef]
Papachatzis, K. Machine Learning-Based Price Prediction for Thermal Insulation Materials: A Holistic Approach Integrating Thermophysical, Technical, and Environmental Attributes in the Greek Construction Market. Energy Build. 2024, 324, 114899. [Google Scholar]
Cui, H.; Zhang, L.; Yang, H.; Shi, Y. Optimizing Thermal Comfort and Energy Efficiency in Hospitals with PCM-Enhanced Wall Systems. Energy Build. 2024, 323, 114740. [Google Scholar] [CrossRef]
Jun, H.; Fei, H. Research on Multi-Objective Optimization of Building Energy Efficiency Based on Energy Consumption and Thermal Comfort. Build. Serv. Eng. Res. Technol. 2024, 45, 391–411. [Google Scholar] [CrossRef]
Shen, Y.; Pan, Y. BIM-Supported Automatic Energy Performance Analysis for Green Building Design Using Explainable Machine Learning and Multi-Objective Optimization. Appl. Energy 2023, 333, 120575. [Google Scholar] [CrossRef]
Liu, Z.; Zhong, J.; Liu, Y.; Liang, Y.; Li, Z. Dynamic Simulation of Street-Level Carbon Emissions in Megacities: A Case Study of Wuhan City, China (2015–2030). Sustain. Cities Soc. 2024, 115, 105853. [Google Scholar] [CrossRef]
Mo, Y.; Zhao, D. Effective Factors for Residential Building Energy Modeling Using Feature Engineering. J. Build. Eng. 2021, 44, 102891. [Google Scholar] [CrossRef]
Zheng, W.; Wang, D.; Wang, Z. Economic Model Predictive Control for Building HVAC System: A Comparative Analysis of Model-Based and Data-Driven Approaches Using the BOPTEST Framework. Appl. Energy 2024, 374, 123969. [Google Scholar] [CrossRef]
Excell, L.E.; Andrews, A.; Jain, R.K. E-Audit: A “No-Touch” Energy Audit That Integrates Machine Learning and Simulation. Energy Build. 2024, 317, 114360. [Google Scholar] [CrossRef]
Dridi, J.; Amayri, M.; Bouguila, N. Unsupervised Clustering-Based Domain Adaptation for Estimating Occupancy and Recognizing Activities in Smart Buildings. J. Build. Eng. 2024, 85, 108741. [Google Scholar] [CrossRef]
Pinto, G.; Messina, R.; Li, H.; Hong, T.; Piscitelli, M.; Capozzoli, A. Sharing Is Caring: An Extensive Analysis of Parameter-Based Transfer Learning for the Prediction of Building Thermal Dynamics. Energy Build. 2022, 276, 112530. [Google Scholar] [CrossRef]

Figure 1. Methodological scheme for this research.

Figure 2. Yearly scientific production and average citations.

Figure 3. Lotka’s law for authors’ productivity.

Figure 4. Authors’ production over time performed with Biblioshiny version 4.0.

Figure 5. Source clustering according to Bradford’s law performed with Biblioshiny version 4.0.

Figure 6. Three-field plot showing sources, keywords, and countries from left to right.

Figure 7. Global scientific output in the subject area performed with Biblioshiny version 4.0.

Figure 8. National collaboration network.

Figure 9. Institutional collaboration network.

Figure 10. Affiliations’ production over time.

Figure 11. Author co-citation network.

Figure 12. Journal co-citation network.

Figure 13. Collaborative contributions to publications.

Figure 14. Keyword co-occurrence network.

Figure 15. Thematic map analysis performed with Biblioshiny version 4.0.

Figure 16. Trend topics.

Figure 17. Clustering by coupling performed with Biblioshiny version 4.0.

Figure 18. Emerging machine learning technologies and applications.

Table 1. Overview of recent review articles on related topics.

Topic	Comments	Key Content Summary
Machine Learning in Building Energy Optimization	Focus on dynamic control (e.g., HVAC, energy systems) and generative design, emphasizing the role of algorithms in real-time decision making (e.g., RL).	- Reinforcement learning (RL) in HVAC control [18,21,25] and dynamic optimization of energy system integration [4,13]. - Generative design methods [19] and hybrid models [15] to improve efficiency. - RL’s potential for energy conservation in construction management [29].
Machine Learning for Comfort and Control in the Built Environment	Coverage of user behavior simulation and environmental parameter prediction, embodying “human-centric” intelligent regulation.	- Deep learning and RL combined for user behavior modeling [7] and thermal comfort control [16,27]. - Occupancy prediction [8,26,30] and real-time indoor temperature prediction [31] for optimizing environmental regulation.
Machine Learning and Building Design and Modeling	From physical structure optimization to system-level building modeling, combining mature AI with hybrid modeling methods.	- Dynamic facades [32] and AI-driven optimization of generative design [19]. - AI modeling of building envelopes [9] and distributed energy systems [4]. - Hybrid physics-data models [33] for daylighting prediction [12].
Data-Driven Methods in the Built Environment	Emphasis on data acquisition (e.g., large-scale datasets) and model interpretability, addressing data scarcity and trust issues.	- Large-scale dataset construction [10] and statistical machine learning hybrid methods [22,24]. - Data-driven models under extreme weather conditions [11] and energy consumption analysis driven by text mining [5]. - Model interpretability [17].

Table 2. Characteristics of the selected publications.

Description	Results
Timespan	2020–2024
Sources (journals, books, etc.)	73
Documents	496
Annual growth rate % ¹	98.85
Document average age	1.13
Average citations per document ²	11.26
References	20,923
Authors	1781
Authors of single-authored documents	8
Single-authored docs	8
Co-authors per document ³	4.4
International co-authorships % ⁴	27.02

¹ The average yearly growth in the number of publications. ² The total number of citations divided by the total number of publications. ³ The total number of author appearances (an author appearing in two papers is counted twice) divided by the total number of publications. ⁴ The percentage of publications with authors affiliated with institutions from multiple countries relative to the total number of publications.
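
The footnote definitions above can be made concrete with a short sketch. The Python example below is illustrative only: the `Doc` record, the `summarize` helper, and the toy data are hypothetical and are not taken from the study. The growth rate is assumed to be a compound annual rate between the first and last publication year, which may differ from how Biblioshiny computes the reported value.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical minimal record for one publication."""
    year: int
    citations: int
    authors: list
    countries: set


def summarize(docs):
    """Compute Table 2 style indicators for a list of Doc records."""
    n = len(docs)
    years = sorted({d.year for d in docs})
    first, last = years[0], years[-1]
    n_first = sum(d.year == first for d in docs)
    n_last = sum(d.year == last for d in docs)
    span = last - first
    # Assumption: compound annual growth between the first and last year.
    growth = ((n_last / n_first) ** (1 / span) - 1) * 100 if span else 0.0
    return {
        "annual_growth_pct": growth,
        # Total citations divided by total publications.
        "avg_citations_per_doc": sum(d.citations for d in docs) / n,
        # Author appearances: an author on two papers is counted twice.
        "co_authors_per_doc": sum(len(d.authors) for d in docs) / n,
        # Share of documents with affiliations in more than one country.
        "international_pct": 100 * sum(len(d.countries) > 1 for d in docs) / n,
    }


# Toy data (invented for illustration).
toy = [
    Doc(2020, 10, ["A", "B"], {"CN"}),
    Doc(2022, 4, ["A"], {"CN", "UK"}),
    Doc(2022, 2, ["C", "D", "E"], {"US"}),
    Doc(2022, 0, ["B", "C"], {"CN"}),
    Doc(2022, 0, ["F", "G"], {"IT", "CN"}),
]

stats = summarize(toy)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

In this toy set, author "A" appears in two papers and therefore contributes two author appearances to the co-authors-per-document figure, matching footnote ³.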

Table 3. Author impact.

Author	H_Index	G_Index	M_Index	Total Citations	Number of Publications	Publication Year Start
Wu HJ	6	11	2	143	11	2022
Calautit JK	5	7	1	226	7	2020
Chen JP	5	9	1.667	151	9	2022
Fu QM	5	9	1.667	151	9	2022
Liu JY	5	7	1.25	153	7	2021
Lu Y	5	9	1.667	112	9	2022
Wei SY	5	6	1	134	6	2020
Amayri M	4	5	1.333	46	5	2022
Capozzoli A	4	4	0.8	184	4	2020
Fan C	4	4	1	90	4	2021
Homod RZ	4	4	1.333	50	4	2022
Piscitelli MS	4	4	0.8	184	4	2020
Tien PW	4	4	1	71	4	2021
Wang YZ	4	5	1.333	98	5	2022
Wu YP	4	4	1	153	4	2021
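
As a rough illustration of the indices in Table 3, the sketch below implements the usual textbook definitions of the h-index and g-index and checks the tabulated M_Index values. It rests on assumptions: the per-paper citation list is hypothetical (Table 3 reports only totals), the g-index is capped at the number of publications, and the m-index is taken as the h-index divided by the number of years from the first publication year through 2024, which is the relation the tabulated values appear to follow.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)


def g_index(citations):
    """Largest g such that the top g papers total at least g**2 citations.

    Assumption: g is capped at the number of publications.
    """
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, c in enumerate(ranked, start=1):
        total += c
        if total >= rank * rank:
            g = rank
    return g


def m_index(h, first_year, last_year=2024):
    """h-index divided by years active (assumed to end in 2024)."""
    return h / (last_year - first_year + 1)


# Hypothetical per-paper citation counts for one author.
example = [30, 12, 9, 6, 5, 2, 0]
h_example = h_index(example)
g_example = g_index(example)
print("h =", h_example, "g =", g_example)

# (author, H_Index, first publication year, tabulated M_Index) from Table 3.
rows = [
    ("Wu HJ", 6, 2022, 2.0),
    ("Calautit JK", 5, 2020, 1.0),
    ("Chen JP", 5, 2022, 1.667),
    ("Liu JY", 5, 2021, 1.25),
    ("Capozzoli A", 4, 2020, 0.8),
]
for author, h_val, start, m_table in rows:
    print(f"{author}: computed m = {m_index(h_val, start):.3f}, table = {m_table}")
```

Under these assumptions the computed m-index agrees with the tabulated value for each of the sampled rows, which suggests that authors who entered the field in 2022 rank highly on this index partly because of their short publication window.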

Table 4. Key influential sources in the field.

Source	H_Index (Local)	G_Index (Local)	M_Index (Local)	Total Citations	Number of Publications (2020–2024)
Energy and Buildings	25	41	5	2267	149
Journal of Building Engineering	15	29	3	1049	82
Applied Energy	10	17	2	381	17
Building and Environment	9	16	2.25	275	22
Sustainable Cities and Society	9	17	1.8	315	26
Buildings	8	12	2	196	42
Building Simulation	6	13	1.5	187	15
Energy	5	8	1.667	80	10
Energies	4	9	1	89	11
Renewable and Sustainable Energy Reviews	4	4	1	170	4
Building Services Engineering Research and Technology	3	4	0.75	24	5
Frontiers in Built Environment	3	4	1	36	4
Journal of Building Performance Simulation	3	3	1	28	3

Table 5. The most cited publications worldwide in the field.

No.	Paper	Total Citations	Total Citations per Year	Normalized Total Citations
1	Olu-Ajayi R, 2022, J Build Eng [48]	227	75.67	12.29
2	Brandi S, 2020, Energ Buildings [49]	128	25.6	1.91
3	Xie JQ, 2020, Energ Buildings [50]	122	24.4	1.82
4	Gopinath R, 2020, Sustain Cities Soc [51]	115	23	1.72
5	Dong ZX, 2021, Energ Buildings [52]	111	27.75	5.35
6	Hosamo HH, 2022, Energ Buildings—A [53]	101	33.67	5.47
7	Seyedzadeh S, 2020, Appl Energ [54]	99	19.8	1.48
8	Zhang WX, 2022, Renew Sust Energ Rev [20]	92	30.67	4.98
9	Mounir N, 2023, Energ Buildings [55]	77	38.5	8.4
10	Fu QM, 2022, J Build Eng [9]	71	23.67	3.84
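
The two derived columns in Table 5 can be cross-checked with a few lines of Python. This is a sketch under stated assumptions, not the authors' procedure: citations per year is taken as total citations divided by the number of years from publication through 2024, and normalized total citations is assumed to be total citations divided by the mean citations of all documents in the collection published in the same year. The helper name `citations_per_year` is hypothetical; the rows are copied from Table 5.

```python
from collections import defaultdict

LAST_YEAR = 2024  # assumed end of the counting window

# (first author, year, total citations, citations per year, normalized TC)
table5 = [
    ("Olu-Ajayi R", 2022, 227, 75.67, 12.29),
    ("Brandi S", 2020, 128, 25.6, 1.91),
    ("Xie JQ", 2020, 122, 24.4, 1.82),
    ("Gopinath R", 2020, 115, 23.0, 1.72),
    ("Dong ZX", 2021, 111, 27.75, 5.35),
    ("Hosamo HH", 2022, 101, 33.67, 5.47),
    ("Seyedzadeh S", 2020, 99, 19.8, 1.48),
    ("Zhang WX", 2022, 92, 30.67, 4.98),
    ("Mounir N", 2023, 77, 38.5, 8.4),
    ("Fu QM", 2022, 71, 23.67, 3.84),
]


def citations_per_year(tc, year, last_year=LAST_YEAR):
    """Total citations divided by years since publication, inclusive."""
    return tc / (last_year - year + 1)


# If normalized TC = TC / yearly mean, then TC / normalized TC should give
# roughly the same yearly mean for every paper published in a given year.
implied_mean = defaultdict(list)
for name, year, tc, per_year, ntc in table5:
    computed = citations_per_year(tc, year)
    implied_mean[year].append(tc / ntc)
    print(f"{name} ({year}): computed {computed:.2f}/yr, table {per_year}")

for year in sorted(implied_mean):
    values = implied_mean[year]
    print(year, "implied mean citations:", [round(v, 2) for v in values])
```

Because the implied yearly means are much lower for recent years than for 2020, a recent paper with fewer total citations can still carry a far higher normalized score than an older, more cited paper, as the rows for 2022 and 2023 show.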

Table 6. Most relevant affiliations.

Affiliation	Articles
Suzhou University of Science and Technology	27
Chongqing University	21
Tongji University	19
Tsinghua University	19
Shenzhen University	16
Xi’an University of Architecture and Technology	16
United States Department of Energy (DOE)	15
Zhejiang University	15
King Fahd University of Petroleum and Minerals	14
National University Of Singapore	13

Table 7. Most frequent words.

Words	Occurrences	Dim1	Dim2	Cluster
performance	91	0.17	0.41	1
model	88	0.23	0.33	1
optimization	56	−0.54	−0.44	1
simulation	56	−0.08	−0.27	1
consumption	55	0.41	0.31	1
prediction	50	0.44	−0.23	1
buildings	43	−0.36	0.16	1
design	42	0.18	0.77	1
systems	42	−0.14	−0.28	1
energy consumption	35	0.02	0.03	1

Table 8. Cluster analysis of keywords.

Cluster	Callon Centrality	Callon Density	Rank Centrality	Rank Density	Cluster Frequency
storage	0	25	1.5	13	4
implementation	0.083	16.667	6	2.5	6
CO₂ emissions	0.16	18.571	11	4	12
demand	1.492	23.73	13	11	305
network	0.088	14.286	7	1	7
performance	5.59	19.632	14	5	1165
demand response	0.125	22.917	8	10	10
compressive strength	0.128	16.667	9	2.5	6
energy management	0.04	20	3	7.5	5
challenges	0.05	22	4	9	10
electricity consumption	0.451	19.984	12	6	34
life-cycle assessment	0.075	25	5	13	10
hot	0	25	1.5	13	4
internet	0.15	20	10	7.5	5

Table 9. Keyword clusters and topic influence analysis.

Label	Group	Frequency	Centrality	Impact
optimization—conf 29.1% buildings—conf 31.7% model—conf 16%	1	83	0.26	2.238
performance—conf 34.9% model—conf 30.9% optimization—conf 38.2%	2	132	0.213	2.081
performance—conf 52.3% model—conf 43.2% consumption—conf 60.4%	3	187	0.227	2.513
behavior—conf 24% prediction—conf 8.5% algorithm—conf 20%	4	27	0.152	1.682
model—conf 6.2% buildings—conf 9.8% comfort—conf 15%	5	21	0.183	1.656

Table 10. Practical applications of machine learning in various scenarios.

Application Scenarios	Role of Machine Learning	Corresponding Techniques	Key Advantages	Typical Cases	References
Carbon Emission Calculation and Optimization	Data integration	Anomaly detection algorithms, data cleaning algorithms	Enhances data quality and reliability, laying a foundation for carbon emission modeling	Multisource heterogeneous data integration and analysis	[75,76,77]
	Carbon emission modeling and prediction	Deep learning (e.g., ANN), time series forecasting (LSTM, Transformer), reinforcement learning	Accurately captures the complex relationship between building performance and carbon emissions, improving prediction accuracy	Carbon emission trend prediction	[48,69,78,79]
	Optimization of life cycle carbon emission assessment	Reinforcement learning, genetic algorithms	Provides dynamic optimization strategies to balance energy consumption and carbon emissions	Intelligent energy management systems	[80,81,82]
	Data privacy and model sharing	Federated learning	Addresses data privacy issues and enhances regional optimization capabilities	Collaborative optimization within industries	[72,73,74]
Energy-Saving Design Methods and Practices	Data-driven analysis	Regression models, clustering algorithms	Extracts key parameters to optimize design elements	Region-specific climate design	[83,84,85,86]
	Intelligent design assistance	Genetic algorithms, Bayesian optimization, BIM with reinforcement learning	Rapidly explores multi-objective design solutions, balancing energy consumption and comfort	Optimization of building orientation and materials	[87,88,89,90]
	Material and structure optimization	Database mining, simulation techniques	Recommends efficient materials, optimizing natural ventilation and shading designs	Natural ventilation path optimization	[91,92,93,94]
	Environmental adaptability design	Digital twin technology, future climate analysis models	Simulates building operational states and evaluates energy-saving effects in real time	Digital twin building design	[53,95,96,97]
Intelligent Energy Management Strategies	Energy consumption forecasting	Time series analysis (ARIMA, LSTM)	Accurately predicts building energy consumption to support system scheduling	Adjustment during peak energy demand periods	[98,99,100,101]
	System operation strategy optimization	Reinforcement learning	Dynamically adjusts HVAC systems to balance energy consumption and comfort	Intelligent air-conditioning systems	[102,103,104]
	Anomaly warning and equipment diagnostics	Deep learning, anomaly detection algorithms	Enhances management reliability and extends equipment lifespan	Early warning for air-conditioning equipment	[105,106,107]
Performance Prediction and Environmental Quality Monitoring	Energy efficiency prediction and optimization	Regression analysis, deep learning, ensemble learning	Identifies potential issues in advance and optimizes energy scheduling	Energy consumption optimization management	[52,108]
	Real-time indoor environmental quality monitoring and control	Deep learning, sensor networks	Dynamically adjusts air-conditioning and lighting systems to ensure optimal environmental conditions	Indoor thermal comfort control	[109,110,111]
Operations and Fault Diagnosis	Fault prediction and maintenance	Anomaly detection, deep learning	Predicts equipment failures and reduces downtime	Optimized maintenance of lighting systems	[112,113]
	Real-time monitoring and remote management	IoT combined with deep learning	Enhances operational efficiency and management flexibility	Intelligent building equipment management	[114,115]

Table 11. Application challenges and technical bottlenecks.

Category	Specific Challenges	Cause Analysis	Strategies
Data Level	High heterogeneity of data sources, complex integration	Data come in various forms (e.g., energy consumption monitoring, weather data, BIM), with differing formats, sampling frequencies, and accuracies, lacking standardization [80]	Standardize data collection and cleaning processes and develop unified data processing tools
	Severe issues with missing, inconsistent, and noisy data	Variations in collection device performance, environmental interference, or human errors lead to low data quality, affecting model training [116]	Employ anomaly detection and data cleaning techniques, such as missing value imputation and noise filtering, to enhance data reliability
	Data scarcity and imbalance	In specific scenarios (e.g., fault diagnosis), normal data dominate, while minority class samples are insufficient, impairing the model’s ability to recognize minority categories	Use data augmentation techniques (e.g., synthetic data generation), clustering analysis, and transfer learning to mitigate imbalance issues
	Data privacy and security constraints	Data involving user behavior or corporate information cannot be directly shared, increasing collaboration difficulties [117,118,119]	Introduce federated learning to enable localized training and collaborative optimization while protecting privacy
Model Level	“Black-box” nature of models reduces trust	Complex models like deep learning lack transparency, making their decision-making logic difficult to interpret [18]	Adopt explainable AI (XAI) techniques, integrating feature importance analysis and visualization tools to enhance transparency
	Insufficient generalization ability	Diverse building scenarios and limited training data lead to poor model performance in new environments [76]	Enhance data diversity (e.g., cross-regional data fusion) and optimize model architectures to improve adaptability
	Lack of real-time updating capability	Building system operations are dynamic, and traditional models cannot quickly adapt to new data	Develop online learning methods to support real-time model updates and continuous optimization
Application Complexity	Multi-objective optimization increases design complexity	Scenarios (e.g., energy-saving design, energy management) require balancing multiple objectives, such as energy consumption, comfort, and cost, adding to the complexity of model design and optimization [120,121,122]	Employ reinforcement learning and multi-objective optimization algorithms, combined with intelligent search strategies, to quickly explore optimal design and operational parameters
	Cross-regional standard differences	Significant differences in carbon emission calculations and building design standards across regions make direct model application challenging [123,124]	Implement modular model design and parameter tuning to adapt to regional standards
Implementation and Deployment	High technical threshold	Machine learning requires interdisciplinary knowledge in mathematics, computer science, and architecture, but relevant professionals often lack such backgrounds	Conduct interdisciplinary training and develop user-friendly tools and platforms
	High development and deployment costs	Projects require highly customized development, with tools and platforms lacking standardization, leading to high resource consumption [125,126]	Develop standardized and modular machine learning frameworks and platforms to reduce redundant development costs
	Lack of robust data-sharing mechanisms	Data silos between building projects hinder collaboration and limit the use of cross-project data resources [127,128]	Establish trusted data-sharing mechanisms, leveraging blockchain technology to ensure secure data exchange
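
As an illustration of the data-cleaning remedy listed in the table above (missing value imputation and noise filtering), the following minimal sketch fills gaps in a sensor series by linear interpolation and then suppresses isolated spikes with a median filter. The function names, the window size, and the sample readings are assumptions made for illustration only; they do not come from the reviewed studies, and a practical pipeline would need to be tuned to the building data at hand.

```python
from statistics import median
from typing import List, Optional


def impute_linear(series: List[Optional[float]]) -> List[float]:
    """Fill None gaps by linear interpolation between the nearest observed
    neighbours; gaps at either edge take the nearest observed value."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = series[left] + w * (series[right] - series[left])
    return filled


def median_filter(series: List[float], window: int = 3) -> List[float]:
    """Replace each point by the median of a centred window
    (the window shrinks at the edges of the series)."""
    half = window // 2
    return [
        median(series[max(0, i - half): i + half + 1])
        for i in range(len(series))
    ]


# Hypothetical hourly readings (kWh) with two gaps and one spike (95.0).
readings = [10.0, None, 12.0, 95.0, 13.0, None]
cleaned = median_filter(impute_linear(readings))
```

Imputation is applied before filtering so that the median window always operates on complete data; the spike at 95.0 survives imputation but is removed by the filter.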

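The multi-objective trade-off named in the table above (energy consumption, comfort, and cost) can be made concrete with a Pareto filter, which keeps only those candidate designs that no other candidate beats on every objective. This is a simplified sketch and not the method of any cited study: the three objectives are assumed to be expressed so that lower is better (comfort is represented as a discomfort score), and the candidate values are invented for illustration.

```python
from typing import List, Tuple

# (energy use, discomfort score, cost); all three are minimised.
Design = Tuple[float, float, float]


def dominates(a: Design, b: Design) -> bool:
    """True if a is no worse than b on every objective and strictly
    better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(
        x < y for x, y in zip(a, b)
    )


def pareto_front(designs: List[Design]) -> List[Design]:
    """Return the designs that are not dominated by any other design."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]


# Hypothetical candidate designs.
candidates = [
    (120.0, 0.30, 50.0),
    (100.0, 0.40, 55.0),
    (130.0, 0.35, 60.0),  # worse than the first candidate on all objectives
    (100.0, 0.40, 70.0),  # same as the second candidate but more expensive
]
front = pareto_front(candidates)
```

The exhaustive pairwise comparison is quadratic in the number of candidates, which is acceptable for small design sets; search-based methods such as those mentioned in the table would be needed when the design space is too large to enumerate.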
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

