Big data technologies span several disciplines, including statistics, data mining, machine learning, neural networks, optimization methods, and visualization approaches [48]. The field contains many specific technologies, some of which overlap [14]. The collected papers were classified according to the research methodology they used.
Table 6 shows that they were classified into seven research methodologies in BER-B: four analysis techniques (data mining, statistics, simulation, and deep learning), survey and investigation, review, and combination.
3.3.1. Research Methodology Based on Data Mining
Many studies in the building energy field have used collected data to save energy by improving building systems. Catrini et al. (2020) took data from a large do-it-yourself (DIY) store in northern Italy as a case study to investigate the profitability of a third-generation system for commercial buildings. Compared with the existing energy conversion system, energy consumption and CO₂ emissions were reduced [49]. Chiosa et al. (2021) proposed an innovative anomaly detection and diagnosis methodology that automatically detects abnormal energy consumption at the whole-building level and then diagnoses abnormal patterns at the lower level. The paper performed a post-mining analysis based on association rule mining and identified the lower level that contributes most to the anomalies detected for the entire building [
50]. Conner et al. (2005) studied data collected from several applications deployed throughout the building that monitor room occupancy and environmental statistics and provide access to room reservation status. Using the proposed protocol, it was confirmed that a small number of wall-powered nodes can significantly increase the lifespan of the battery-powered network [
51].
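As a rough illustration of the association rule mining mentioned for Chiosa et al. (2021), the sketch below computes the two standard rule metrics, support and confidence, for a rule of the form "lower-level anomaly -> whole-building anomaly". The daily records, the item labels, and the function name rule_metrics are hypothetical and are not taken from the cited paper.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    transactions: list of sets of item labels (here, one set per day).
    antecedent, consequent: sets of item labels.
    """
    n = len(transactions)
    # Days on which both sides of the rule hold.
    both = sum(1 for t in transactions if antecedent <= t and consequent <= t)
    # Days on which the antecedent holds.
    ante = sum(1 for t in transactions if antecedent <= t)
    support = both / n if n else 0.0
    confidence = both / ante if ante else 0.0
    return support, confidence


# Hypothetical daily anomaly flags (illustrative only, not measured data).
days = [
    {"chiller_anomaly", "building_anomaly"},
    {"chiller_anomaly", "building_anomaly"},
    {"lighting_anomaly"},
    {"chiller_anomaly"},
    set(),
]

support, confidence = rule_metrics(
    days, {"chiller_anomaly"}, {"building_anomaly"}
)
```

A rule with high confidence and sufficient support would point to the lower-level item that most often accompanies a whole-building anomaly.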
Kang and Choi (2018) proposed a building information modeling (BIM)-based data mining method to support data integration and function expansion in consideration of functional variability and scalability. Scenarios related to building energy management were developed, and work efficiency was verified through analysis of the generated results [
52]. Li et al. (2020b) developed a new operational approach to improve the energy efficiency of HVAC systems in office spaces. This approach can contribute to a large amount of energy savings when operating an HVAC system [
53]. Liu et al. (2018) proposed an adjustable duty cycle-based fast dissemination method for minimum-transmission broadcast in the smart wireless software-defined network. By using the remaining energy of the node, the duty cycle of the node is increased to reduce the transmission time and delay while maintaining the network life [
45]. Ma et al. (2021) reduced the transmission time while maintaining the network life by increasing the duty cycle of the node using the remaining energy of the node. Compared to conventional methods, this method was able to provide a more comprehensive analysis of energy-saving potential in a shorter time [
54].
Reshma et al. (2019) proposed an energy-aware routing protocol for wireless sensor networks (WSNs) to address the problem of limited node energy. The proposed protocol was found to be energy-efficient in routing large volumes of data and in extending network connectivity [
55]. Shahrokni et al. (2014) quantified resource efficiency and greenhouse gas reduction by studying empirical meter data obtained from district heating use in Stockholm. The study provided a benchmark of excellence in the energy performance monitoring stage [
56]. Silva et al. (2020) proposed a big data analysis-embedded smart city architecture, which uses a smart gateway system to integrate the web and smart control system. The system was evaluated based on a certified data set from a variety of reliable data sources [
23]. Sohn and Dunn (2019) presented an exploratory analysis of an existing large-scale building energy data set, focusing on two data sets: the commercial building energy consumption survey in the United States and the building performance database [
57]. Suciu et al. (2016) studied how to process big data collected from distributed cloud systems based on general systems and remote telemetry units for remote monitoring of renewable energy targets [
58]. In these studies, the efficiency of the improved system was verified. Van Dronkelaar et al. (2016) reviewed the discrepancy between predicted and measured energy use based on 62 buildings and presented an outlook to global studies [
47].
Meanwhile, Guo et al. (2020) compared data on energy use intensity, energy structure, and carbon emissions in different countries [
59]. The paper discussed supporting policies, potential future work, and the limitations of building energy data in order to address new trends in related research. Tang et al. (2021) conducted a case study based on field measurements at a Hong Kong campus building to investigate the impact of technology-guided occupant behavior on the energy use of a central air-conditioning system [60]. The paper developed a new concept, technology-guided occupant behavior, to coordinate occupants with the technology and control strategies of the building’s energy system. Unlike most of the studies above, these two studies point to new directions in the BER-B field and can therefore serve as a reference for deriving further or new research in the field.
3.3.2. Research Methodology Based on Statistics
Statistics, the most basic technique for data analysis, is used in almost all fields of study [14]. Kamel et al. (2020) studied feature selection for hot water, heating, cooling, and ventilation loads in residential buildings in the mixed-humid climate zone. The filter method, wrapper recursive feature elimination, wrapper backward elimination, linear regression, Lasso regression, and Extreme Gradient Boosting regression were adopted for heating and cooling days [
64]. Filippi and Sirombo (2019) showed the results of two-year residential monitoring of a case project of 152 apartments for large social housing built near Milan, Italy. Data on thermal energy consumption for heating and cooling, domestic hot and cold water use, and control units were evaluated [
63]. Liu et al. (2015) studied the correlation between energy consumption and architectural surfaces from the perspective of passive design. Taking Beijing as a case project, a typical high-rise office building floor plan was reviewed in terms of annual dynamic load and energy consumption [
65].
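To make the filter method named for Kamel et al. (2020) more concrete, the following minimal sketch ranks candidate features by the absolute Pearson correlation with a target load and keeps the top k. The scoring criterion, the feature names, the data values, and the helper names pearson and filter_select are assumptions chosen for illustration; the cited study may have used a different filter criterion.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no linear information
    return sxy / math.sqrt(sxx * syy)


def filter_select(features, target, k):
    """Rank features by |r| with the target and return the top-k names."""
    scores = {name: abs(pearson(vals, target)) for name, vals in features.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)[:k]


# Hypothetical daily observations (illustrative only, not measured data).
features = {
    "outdoor_temp": [0, 5, 10, 15, 20],
    "wind_speed": [3, 1, 4, 1, 5],
    "humidity": [60, 55, 70, 50, 65],
}
heating_load = [50, 41, 30, 19, 10]

selected = filter_select(features, heating_load, k=2)
```

Unlike wrapper methods such as recursive feature elimination, a filter of this kind scores each feature without training the downstream prediction model.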
Manfren et al. (2020) examined how to link building energy performance in the design and operation phases through a regression-based approach, emphasizing the hierarchical nature of building energy modeling data [
66]. Pan and Zhang (2020) presented a case study validated on a multidimensional data set of building energy performance provided by the city government of Seattle. The study estimated the weather-normalized site energy use intensity of buildings and characterized possible nonlinear relationships with 12 other possible sources [
67]. Piscitelli et al. (2021) developed and tested a tool to detect and diagnose abnormal daily electrical energy patterns associated with substations on university campuses. The study developed a predictive model of normal (fault-free) operation using an artificial neural network (ANN), followed by a regression tree, to identify unusual trends in electrical energy consumption [
68]. Qian et al. (2020) proposed a virtual sensor modeling method to determine the actual energy efficiency and power consumption of variable refrigerant flow systems in residential buildings. Statistical and clustering analyses of energy efficiency and power consumption were performed to discover the distribution and typical operating load patterns of variable refrigerant flow systems in Chinese residential buildings [
69].
Other studies collected GIS-based big data. Ahn and Sohn (2019) investigated the effect of three urban form variables (vertical density, horizontal compactness, and variation of building heights), using the Seattle city government’s building energy consumption data and GIS data obtained from the Washington Geospatial Data Archive. Using a spatial lag regression model, the study confirmed that a building’s energy use is spatially dependent on the energy consumption of the surrounding buildings [61]. Buffat et al. (2017) collected Swiss residential building data and modeled the building heat demand of large regions at high temporal resolution using a large-scale GIS-based method. The model was validated against a database containing the measured energy demand of 1845 buildings in the city of St. Gallen and 120 buildings in the Alpine village of Zernez, Switzerland [62]. The results of these papers can be used to develop better energy-saving strategies for individual buildings and urban building stocks. They are also expected to provide useful information to urban designers and planners.
3.3.3. Research Methodology Based on Simulation
Simulation can be divided into general simulation and optimization, and it accounted for the largest share of the research methodologies. The following studies were based on general simulation. Alsaleem et al. (2020) presented a method to access and exploit wearable device data to build a personal thermal comfort model. The paper evaluated various supervised machine-learning algorithms to produce accurate personal thermal comfort models for each building occupant, which exhibited superior performance compared with a general model for all occupants [
71]. Dawood et al. (2017) studied the potential to improve functionality, the accuracy of energy use calculations, and visualization of carbon emissions using remotely sensed data in commercial and open-source combinations. The study used light detection and ranging data for a mixed-use urban area in the northeastern part of the UK and presented a method to improve the quality of the input data for carbon modeling [
72]. Fan et al. (2019a) described and evaluated a data-driven building energy performance model to increase the practical value of advanced machine learning technologies in the building sector [
41]. Han et al. (2019) proposed a virtual machine rearrangement method to increase energy efficiency by increasing the density of virtual machines using the Knapsack algorithm. Through the proposed method, the Knapsack algorithm was improved to achieve efficient virtual machine relocation in a short period of time [
73].
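For readers unfamiliar with the knapsack algorithm referred to in the description of Han et al. (2019), the sketch below shows the classic 0/1 knapsack solved by dynamic programming, applied to a toy virtual machine placement. It is a textbook formulation, not the improved variant of the cited paper; the VM sizes, priority scores, host capacity, and the function name knapsack are hypothetical.

```python
def knapsack(items, capacity):
    """0/1 knapsack by dynamic programming.

    items: list of (weight, value) pairs with non-negative integer weights.
    capacity: non-negative integer capacity.
    Returns (best_value, sorted list of chosen item indices).
    """
    n = len(items)
    # best[i][c] = best total value using the first i items within capacity c
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i, (weight, value) in enumerate(items, start=1):
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]  # option 1: skip item i
            if weight <= c and best[i - 1][c - weight] + value > best[i][c]:
                best[i][c] = best[i - 1][c - weight] + value  # option 2: take it
    # Walk back through the table to recover which items were taken.
    chosen, c = [], capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= items[i - 1][0]
    return best[n][capacity], sorted(chosen)


# Hypothetical VMs as (memory demand in GB, priority score).
vms = [(4, 5), (3, 4), (2, 3), (5, 8)]
value, placed = knapsack(vms, capacity=9)
```

In this toy case the host with 9 GB of free memory receives the first and last VM, which fills it completely and yields the highest total score.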
Hong et al. (2020) described a new methodology for generating synthetic smart meter data for a building’s electricity use using detailed building energy modeling, which aims to capture the variability and probability of real energy use in buildings. Their methodology was able to generate customized data sets to represent specific scenarios with a known truth and a controllable amount of synthetic noise [
74]. Jin et al. (2021) proposed a novel data-driven predictive control method based on a temporal sequential-based artificial neural network. The simulation confirmed that the proposed method achieved higher accuracy (97.4%) and fewer false errors (reduced from 79.5 times per day with the existing time delay method to 0.6 times per day) [
75].
Ma and Cheng (2016) proposed a GIS-integrated data mining methodology framework for estimating the energy use intensity of urban-scale buildings, including feature selection, preprocessing, and algorithm optimization. The framework was tested and validated in a case study estimating the energy use intensity of 3640 multi-family residential buildings in New York City [
46]. Mosteiro-Romero et al. (2020) presented a computational approach for the quantitative analysis of building energy demand at the district scale, including interdependent factors such as relative humidity and wind speed, local air temperature, diversity in building geometry, and materials. This method, which couples the microclimate model program ENVI-met and the district-scale energy simulation tool City Energy Analyst, was applied to a case study in Zurich, Switzerland [
76]. Nematchoua et al. (2020) analyzed and compared the energy consumption and carbon emissions of six building categories commonly found in sub-Saharan African cities, designed for deployment in 13 cities unevenly distributed across six climate regions of Madagascar [
77].
Some simulation studies did not rely solely on collected data. Bambara et al. (2021) compared the energy performance (consumption and generation) of an existing house with that of two energy-efficient homes equipped with building-integrated photovoltaics (BIPV) built on the same land lot in Montreal, Canada. An energy model was created for each of the houses, and annual energy simulations were performed to quantify their energy performance [
71]. Chen et al. (2021) conducted a study using data generated by a calibrated simulation program. The identification method was tested on data generated by EnergyPlus models calibrated against real buildings. To understand the HVAC control strategy of a given building using a data mining algorithm, data from the building automation system (BAS) and building energy data were analyzed [78]. Li et al. (2021) developed a data-based building energy prediction model using a transfer learning method that draws on data from other buildings when the data available for the target building are limited. To test the proposed method, data from 400 non-residential buildings from the open-source Building Data Genome Project were used, and the energy prediction accuracy was verified [
79].
Su et al. (2009) studied a set of 3600 alternative weather files generated using different parameter weights for Beijing, China. The study evaluated the influence of different parameter weights when generating weather data in the form of a typical year according to a typical weather year methodology [
80]. Van Dronkelaar et al. (2019) quantified the impact of the underlying causes of the gap between predicted and measured energy use by developing building simulation models of four buildings and then calibrating them toward their measured energy use at a high level of data granularity [81]. Ye et al. (2016) confirmed through simulation that long-term data can be predicted from short-term data because they do not change over time. The paper investigated and verified the appropriateness of using empirical models to predict long-term emissions with short-term data, typically covering less than one month [
82].
The following studies performed optimization through modeling. Ali et al. (2020b) developed and simulated a methodology for optimizing urban-scale energy retrofit decisions for residential buildings in Ireland using a data-driven approach. The importance of data-based renovation modeling was confirmed by reducing the number of features in the building inventory database from 203 to 56 while achieving a building grade prediction accuracy of 86% [
83]. Belafi et al. (2017) introduced a method of implementing occupant behavior modeling into Hungarian office building projects and a way to support the building audit and renovation analysis process. As a result of optimization simulations, it was found that the building could save 23% of its annual energy use [
84]. Moreno et al. (2016) presented a new approach to energy conservation in buildings through the application of soft computing techniques and identification of relevant parameters to generate predictive models of energy consumption in buildings. These models can be used to define strategies for optimizing the building’s daily energy consumption [
85].
In general, big data-based research relies on collected measurement data. In the simulation studies reviewed here, however, many studies used data generated by a simulation program in addition to, or instead of, collected data. These studies used the developed models to predict or optimize energy use and then evaluated the models. Because collecting data is difficult without the active cooperation of building owners, research methodologies that use a program capable of generating data are expected to remain highly useful in the BER-B field.
3.3.4. Research Methodology Based on Deep Learning
Deep learning is a class of machine learning algorithms based on learning representations of data [14]. Fan et al. (2019b) studied the usefulness of advanced recurrent neural network-based strategies for energy prediction. At a high level, three inference approaches were used to generate short-term predictions: the recursive approach, the direct approach, and the multi-input and multi-output approach [
43]. Niu et al. (2018) estimated consumption using five commonly used data-based algorithms: autoregressive with external inputs, subspace state space, state space, discretized variable Bayesian network, and continuous variable Bayesian network [
86]. Zhao et al. (2021) proposed a hybrid-model-based deep reinforcement learning method for the HVAC control problem. The method uses a protection mechanism and reward adjustment to further reduce the learning cost [87]. These deep learning studies were conducted for the purpose of energy prediction, and the derived results can be applied to real building models. In addition, they are expected to provide useful information to building experts by offering more insight into models for building energy prediction.
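The difference between the recursive and direct inference approaches listed for Fan et al. (2019b) can be sketched with a deliberately simple stand-in model. The example below uses a one-lag linear regression instead of a recurrent neural network, so it illustrates only the forecasting strategies, not the cited models; the load values and the function names fit_line, recursive_forecast, and direct_forecast are hypothetical. A multi-input and multi-output model would instead return all horizon values from a single model call and is not shown.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a * x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    return a, mean_y - a * mean_x


def recursive_forecast(series, horizon):
    """Fit one one-step-ahead model and feed each prediction back as input."""
    a, b = fit_line(series[:-1], series[1:])
    preds, last = [], series[-1]
    for _ in range(horizon):
        last = a * last + b  # the previous prediction becomes the next input
        preds.append(last)
    return preds


def direct_forecast(series, horizon):
    """Fit a separate model for each lead time k = 1..horizon."""
    preds = []
    for k in range(1, horizon + 1):
        a, b = fit_line(series[:-k], series[k:])  # pairs (y[t], y[t + k])
        preds.append(a * series[-1] + b)
    return preds


# Hypothetical hourly load values in kW: a perfectly linear ramp, so both
# strategies should continue the ramp exactly.
load = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
rec = recursive_forecast(load, horizon=3)
dirp = direct_forecast(load, horizon=3)
```

The recursive strategy trains one model but can accumulate error as predictions are fed back, whereas the direct strategy avoids feedback at the cost of training one model per lead time.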
3.3.5. Research Methodology Based on Survey and Investigation
Arora and Bala (2020) conducted a survey to address the major challenges of big data applications hosted in cloud data centers by reviewing existing energy-efficient technologies. As a result of reviewing various energy efficiency technologies for big data applications, it was confirmed that hybrid energy efficiency technologies are the best way to reduce overall energy consumption [
88]. Bibri and Krogstie (2020) investigated and compared Stockholm and Barcelona in terms of smart sustainable cities based on environmental data, adopting a case study as a qualitative research methodology. The study confirmed that smart grids, smart buildings, smart meters, smart environmental monitoring, and smart city ambassadors are the major data-based smart solutions applied to improve and develop environmental sustainability in eco-cities and smart cities [
89].
Li et al. (2015) introduced the concepts, features, and applications of big data and analyzed various data related to the three main stages (beginning of life, mid-life, and end of life) of product life cycle management. In product life cycle management, the paper summarized existing big data applications, and investigated and analyzed potential applications of big data technology [
44]. Salim et al. (2020) presented various sources of resident-centered city data useful for data-driven modeling and classified a range of applications and recent data-driven modeling techniques for urban behavior and energy modeling along with traditional stochastic and simulation-based approaches [
90]. In these surveys and investigations in the field of BER-B, extensive data were collected, various problems related to energy efficiency technologies, smart city and building, and resident-oriented cities were raised, and the direction of future research was explained. Therefore, it is expected that these studies will be able to narrow the knowledge gap by providing insights on related topics and confirming detailed information about the research environment and its impact on energy use.
3.3.6. Research Methodology Based on Review
Bibri (2018) reviewed relevant literature for the purpose of identifying and discussing state-of-the-art sensor-based big data applications supported by IoT for environmental sustainability and related data processing platforms and computing [
42]. Fan et al. (2018) provided a review of unsupervised data analysis studies in mining building operation data and improving building operation performance [
24]. Fan et al. (2021) reviewed articles on data-driven methods for real-world applications for building energy modeling and improving building performance, especially those based on large data sets [
15].
Li et al. (2020c) provided a review on the studies on energy consumption calculation of urban buildings and divided the reviewed studies into three categories: database, model, and platform [
91]. Mehmood et al. (2019) provided a review of studies on applying AI and big data to energy-efficient buildings, focusing on the use of machine learning and large databases for improved search, optimization speed, and accuracy [
25]. Yan et al. (2020) reviewed the AI-based applications in building energy efficiency with a focus on facilitating the implementation of zero energy building, considering the influence of occupant behaviors [
92].
3.3.7. Research Methodology Based on Combination
Ali et al. (2020a) predicted the energy performance of over 2 million buildings using data from approximately 650,000 Irish Energy Performance Certificate buildings. The study provided 88% prediction accuracy using a deep learning algorithm and proposed an integrated methodology using building energy modeling [
93]. Jeong et al. (2021) applied a data-driven approach to setting a CO₂ emission benchmark using data mining techniques. Data for a total of 1212 multi-family housing complexes were established, and the developed CO₂ emission benchmark was verified using statistical methods such as the Mann–Whitney test and the Kruskal–Wallis test [
94]. Risch et al. (2021) modeled and calibrated three office buildings in Germany based on one-year hourly measurement data. The calibrated models were evaluated using the coefficient of variation of the root mean squared error, CV(RMSE), and R²; simulations were then run with the calibrated values, and a statistical analysis was conducted [
95]. Yigit (2021) developed a surrogate model-based integrated optimization system to obtain energy-optimal thermal designs for residential buildings in the most urbanized cities in Turkey under different levels of budget constraints. This system is an integrated system consisting of a genetic algorithm optimization technique and gradient-boosting machine-based surrogate model [
96].
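The two calibration metrics named in the description of Risch et al. (2021), the coefficient of variation of the root mean squared error and the coefficient of determination, can be computed as in the minimal sketch below. It assumes the plain definitions with the number of observations in the denominator; some calibration guidelines apply a degrees-of-freedom correction instead. The hourly values and the function names cv_rmse and r_squared are hypothetical.

```python
import math


def cv_rmse(measured, simulated):
    """CV(RMSE): RMSE divided by the mean of the measured values (a fraction)."""
    n = len(measured)
    rmse = math.sqrt(sum((m - s) ** 2 for m, s in zip(measured, simulated)) / n)
    return rmse / (sum(measured) / n)


def r_squared(measured, simulated):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - s) ** 2 for m, s in zip(measured, simulated))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


# Hypothetical hourly energy use in kWh (illustrative only).
measured = [100.0, 120.0, 80.0, 100.0]
simulated = [110.0, 110.0, 90.0, 100.0]

cv = cv_rmse(measured, simulated)
r2 = r_squared(measured, simulated)
```

A lower CV(RMSE) and an R² closer to 1 both indicate that the calibrated model tracks the measurements more closely.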