Machine Learning Techniques in the Energy Consumption of Buildings: A Systematic Literature Review Using Text Mining and Bibliometric Analysis

Abstract: The high level of energy consumption of buildings is significantly influencing occupant behavior changes towards improved energy efficiency. This paper introduces a systematic literature review with two objectives: to understand the more relevant factors affecting energy consumption of buildings and to find the best intelligent computing (IC) methods capable of classifying and predicting energy consumption of different types of buildings. Adopting the PRISMA method, the paper analyzed 822 manuscripts from 2013 to 2020 and focused on 106, based on title and abstract screening and on manuscripts with experiments. A text mining process and a bibliometric map tool (VOS viewer) were adopted to find the most used terms and their relationships, in the energy and IC domains. Our approach shows that the terms “consumption,” “residential,” and “electricity” are the more relevant terms in the energy domain, in terms of the ratio of important terms (TITs), whereas “cluster” is the more commonly used term in the IC domain. The paper also shows that there are strong relations between “Residential Energy Consumption” and “Electricity Consumption,” “Heating” and “Climate.” Finally, we checked and analyzed 41 manuscripts in detail, summarized their major contributions, and identified several research gaps that provide hints for further research.


Introduction
Over the past 20 years, the world's growing energy demand has led to a growing interest in energy efficiency in the residential, services, and public building sectors. The energy consumption of buildings (ECB) is a big challenge in most European countries since buildings consume a large amount of energy, especially sustainability buildings in energy sectors (e.g., public buildings) [1]. In Europe, in 2019, the transport sector accounted for 37% of total final energy consumption in the EU Member States, followed by the households (32%), industry (42%), and services (23%) sectors. Additionally, over the past years, the efficiency of appliances and equipment has increased substantially [2]. Therefore, European countries, notably Portugal, strive to enhance the energy efficiency in buildings while maintaining sufficient levels of thermal comfort and energy consumption, aiming at sustaining their economic and social levels [3]. The energy sector seeks to control energy consumption in general by analyzing available data sources and by studying different dimensions taken from analyzing data sources, such as natural gas and electricity usage data, residential building characteristics and the energy performance of building data, cooling, and heating systems data, or climate and weather forecast data, just to name a few [4]. Public authorities also seek to guide citizen behavior to more efficient uses of

Methods
The methodology of this literature survey is divided into two main parts: in the first part, we present a standard method to find and select published manuscripts. In the second part, we describe our survey results through the text mining and bibliometric analysis, as shown in Figure 1.
Our systematic literature survey presents an evaluation of the scientific community contributions to the topic of energy consumption of buildings by using a rigorous and auditable methodology based on the PRISMA approach.
The PRISMA method is composed of five phases, as follows:
1. Identification of relevant manuscripts of the domain or domains.
2. Screening of titles, abstracts, and papers, excluding papers without experimental evidence and position papers.
3. Eligibility analysis.
4. Full text screening.
5. Final papers to be analyzed in detail.

Figure 1. Methodology steps.
We also adopted a text mining method and bibliometric map analysis. The bibliometric map is used to find the relationships between common terms in the energy and machine learning domains [31], and text mining is used to find the most relevant terms in these two domains. To this end, we followed three phases, evaluating the following quantities:
Frequency of these common words in the final manuscripts of the study.
By following PRISMA, this section is structured in the following way: (1) paper search strategy, (2) text mining approach and bibliometric map analysis, (3) inclusion and exclusion criteria, and (4) final paper selection.

Search Strategy
A literature survey generally recommends searching several available journal and conference paper repositories in order to determine whether similar work has already been performed, which aids in locating potentially relevant studies. In this study, we searched the following electronic paper repositories: (1) IEEE Xplore, (2) Science Direct, (3) Springer, (4) Scopus, and (5) Web of Science, and reviewed the following types of manuscripts: technical reports, scientific conference papers, and scientific journal papers. The search query was created to match the search string only in the head of the manuscripts. We used alternative keywords, logically connected by 'OR' or 'AND' statements. The resulting search string used in these repositories is depicted in Figure 2.
In phase 1, we applied the search string to all electronic repositories looking for papers published between 2013 and 2019, which resulted in 822 publications. Phase 2 followed a 5-step approach. In step 1, we excluded manuscripts based on titles (e.g., energy consumption in industry buildings, transport, and services), which narrowed the set to 411 publications. In step 2, we excluded manuscripts based on abstract screening, which resulted in 317 publications. In step 3, we excluded manuscripts reporting research without experiments, resulting in 106 publications.
Subsequently, in step 4 of phase 2, we excluded position manuscripts which gave us the final figure of 41 publications. In phase 3, manuscripts underwent a full-text reading and review, which led to no exclusions (result of phase 4).
As a result of our paper selection approach, the final list included 41 manuscripts (phase 5), which are analyzed in detail in this paper. These were further divided into the following four categories, as shown in Tables 1-4: 1. Energy consumption of buildings (S1-S10). 2. Classification of ECB (S11-S20). 3. Prediction of ECB (S21-S33). 4. Combination of classification and prediction of ECB (S34-S41).
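The screening counts reported above can be tabulated as a simple funnel. The following is a minimal sketch, not part of the original study; the stage labels and the function name `funnel` are illustrative, and only the five counts come from the text.

```python
# Counts reported in the text for each screening stage.
stages = [
    ("identified by search string", 822),
    ("after title screening", 411),
    ("after abstract screening", 317),
    ("after removing studies without experiments", 106),
    ("after removing position manuscripts", 41),
]


def funnel(stages):
    """Return (label, count, excluded_at_this_step, percent_of_initial) rows."""
    initial = stages[0][1]
    previous = initial
    rows = []
    for label, count in stages:
        share = round(100 * count / initial, 1)
        rows.append((label, count, previous - count, share))
        previous = count
    return rows


for label, count, excluded, share in funnel(stages):
    print(f"{label:45s} {count:4d}  excluded: {excluded:4d}  retained: {share:5.1f}%")
```

Read this way, the title screening alone removes half of the initial set, and the final 41 manuscripts represent about 5% of the 822 records identified.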

Energies 2021, 14, 7810
(energy OR consumption OR buildings) AND ("data mining" OR "decision support system" OR "business analytics" OR forecasting OR "modern optimization" OR "machine learning" OR "backpropagation neural network" OR "feedforward neural network" OR "convolution neural network" OR "recurrent neural network" OR "K-mean clustering" OR "hierarchal clustering" OR "artificial intelligence" OR prediction OR predictive)
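A boolean search string of this shape can be assembled from two keyword lists, which makes it easier to keep the query identical across repositories. This is an illustrative sketch only: the helper names `quote` and `build_query` are hypothetical, and the query syntax actually accepted by each repository may differ.

```python
def quote(term):
    """Wrap multi-word terms in double quotes so they are matched as phrases."""
    return f'"{term}"' if " " in term else term


def build_query(topic_terms, method_terms):
    """Join each group with OR and connect the two groups with AND."""
    topic = " OR ".join(quote(t) for t in topic_terms)
    methods = " OR ".join(quote(t) for t in method_terms)
    return f"({topic}) AND ({methods})"


# Keywords copied from the search string in the text (spelling kept as published).
topic_terms = ["energy", "consumption", "buildings"]
method_terms = [
    "data mining", "decision support system", "business analytics",
    "forecasting", "modern optimization", "machine learning",
    "backpropagation neural network", "feedforward neural network",
    "convolution neural network", "recurrent neural network",
    "K-mean clustering", "hierarchal clustering",
    "artificial intelligence", "prediction", "predictive",
]

query = build_query(topic_terms, method_terms)
print(query)
```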
As mentioned, by adopting the PRISMA method, our paper analyzed 822 manuscripts published between 2013 and 2020. These manuscripts were filtered down to 106 studies by screening titles and abstracts and by excluding manuscripts without experiments. Since 106 manuscripts is a large number for a manual analysis, we describe in this section a text mining approach, performed prior to the full-text review of the 41 retained papers, that aims at discovering relevant terms in both the intelligent computing models and energy consumption fields. Text mining was therefore adopted to create structured information that improves the subsequent analysis of these manuscripts. To be effective, this type of technique requires the prior definition of a dictionary that includes not only common terms of the domain but also terms associated with concepts related to our research topic: intelligent computing models and the energy consumption of buildings. This approach is more comprehensive than standard text mining techniques that randomly search, group, and count words. Thus, the authors created two dictionaries, one for "intelligent computing models" and another for "energy consumption of buildings," each including a preliminary list of expressions consisting of one or more words.
It should be noted that all three authors are experienced in the topics of the paper, particularly computer science and machine learning (first, second, and third authors) and energy (third author). The manuscripts were analyzed in terms of title, abstract, and keywords to verify the dictionaries. Given the large number of manuscripts available for our analysis, a reasonable number of randomly selected articles were chosen to validate the dictionaries. The dictionaries are shown in Tables 5 and 6.
It should be borne in mind that the terms "energy" and "intelligence" are not included in the dictionaries since they are too broad. Terms such as "industrial building" are also excluded because they represent topics outside the scope of our research. During our analysis, we realized that some dictionary terms might not appear in the title, abstract, or keywords of the surveyed manuscripts, while terms that occur several times throughout an article might be more relevant than those mentioned only in the abstract. Thus, the entire text was considered in the analysis of the collected literature, especially in the areas of intelligent computing models and energy consumption of buildings. The reference section was removed from all manuscripts during the analysis.
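The dictionary-based counting over the full text, with the reference section removed, could look roughly like the sketch below. It is an assumption-laden illustration: the heading name "References", the sample text, and the dictionary entries are hypothetical stand-ins, not the contents of Tables 5 and 6.

```python
import re


def strip_references(full_text):
    """Drop everything from the last line reading 'References' onward (assumed heading)."""
    matches = list(
        re.finditer(r"^[ \t]*references[ \t]*$", full_text,
                    flags=re.IGNORECASE | re.MULTILINE)
    )
    return full_text[: matches[-1].start()] if matches else full_text


def count_dictionary_terms(text, dictionary):
    """Count whole-expression, case-insensitive occurrences of each dictionary entry."""
    lowered = text.lower()
    counts = {}
    for term in dictionary:
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        counts[term] = len(re.findall(pattern, lowered))
    return counts


# Hypothetical manuscript text and dictionary entries, for illustration only.
paper = (
    "Title\n"
    "Electricity consumption in residential buildings is clustered with k-means.\n"
    "Space heating dominates electricity consumption.\n"
    "References\n"
    "[1] A study of electricity consumption."
)
energy_dictionary = ["electricity consumption", "space heating", "residential"]
ic_dictionary = ["k-means", "neural network"]

body = strip_references(paper)
energy_counts = count_dictionary_terms(body, energy_dictionary)
ic_counts = count_dictionary_terms(body, ic_dictionary)
print(energy_counts, ic_counts)
```

Matching whole expressions rather than single words is what lets multi-word dictionary entries such as "space heating" be counted as one concept.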
The second part of our terminology analysis adopted a bibliometric map to find critical relationships between factors and intelligent computing methods. Such a bibliometric map helps stakeholders to find the most used factors and methods and their relationships.
For these, we computed a quantity referred to as "The Important Terms" (TITs) of a given word (Elgendy and Wu [70,71]), which represents the importance of each word in the corpus, of the form: where: review = words in our studied corpus; £ = a common word, such as "consumption" or "cluster".
By looking at the results obtained from text mining, we can notice a close relationship between the most important terms found and all the manuscripts in our study. To ascertain this, we computed the following quantity PR (Paper Relationship) [72,73]: where: PCCT = number of papers that contain common terms; MRW = all manuscripts in related work.
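The displayed formula for PR is not reproduced in this text. One form that is consistent with the definitions of PCCT and MRW, and with the percentages reported in the results, is a simple ratio. This is an assumption, not the authors' stated equation, and the count of 29 in the worked example is back-calculated for illustration rather than reported.

```latex
\[
  \mathrm{PR} \;=\; \frac{\mathrm{PCCT}}{\mathrm{MRW}} \times 100\%,
  \qquad \text{e.g.}\quad \frac{29}{41} \times 100\% \approx 70.7\%
\]
```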

Inclusion and Exclusion Criteria
The manuscripts analyzed in our paper were selected based on the inclusion criteria, and other manuscripts were excluded from our analysis based on specific exclusion criteria, as shown in Table 7.
Table 7. Inclusion and exclusion criteria.

Inclusion Criteria Exclusion Criteria
• Responding directly to one or more of our research questions.

Study Selection and Data Extraction
Using the above inclusion and exclusion criteria, our paper repository search returned several papers that were analyzed and read in depth. We focused especially on finding a scientific research gap. To aid the process, we created a data extraction form, which enabled us to collect relevant information from the selected primary papers in order to address our proposed research questions.

Results and Analysis
Our results and analysis are structured into three subsections. In the first, we show the approach and results of our text mining procedures, which included a word frequency table, a word offset plot, and a word cloud plot. In the word cloud plot, the font size reflects how frequently a term was found (see Figure 4). The word offset plot measures the word dispersion in a corpus (see Figure 5). By evaluating such quantities, we were able to show the relative importance of each word in our corpus through visualization. In the second subsection, we analyze the bibliometric map to find the critical relationships between the factors and intelligent computing techniques most used in the ECB. Lastly, in the third subsection, we analyze the 41 retained manuscripts.

Text Mining in Detail
Our text mining procedures included the following pre-processing steps over the main documents: removing all symbols, numbers, punctuation, and extra whitespace; transforming all words to lowercase; and reducing the dictionary terms to a single list of terms that can be relied upon when reading and analyzing the most important articles for our research. This technique is depicted in Algorithm A1 (see Appendix A).
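The pre-processing steps listed above can be sketched in a few lines. This is not Algorithm A1 itself, which is given in the appendix, but a minimal illustration under the assumption of English text; the function names are hypothetical, and lowercasing is done first only to keep the regular expression simple.

```python
import re


def preprocess(text):
    """Lowercase, drop symbols/numbers/punctuation, collapse whitespace, and tokenize."""
    text = text.lower()
    # Anything that is not a-z or whitespace is replaced by a space
    # (this also drops accented letters, acceptable for this sketch).
    text = re.sub(r"[^a-z\s]", " ", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text.split()


def merge_dictionaries(*dictionaries):
    """Reduce several term dictionaries to one sorted list of unique lowercase terms."""
    return sorted({term.lower() for d in dictionaries for term in d})


tokens = preprocess("Energy-Consumption: 41 Buildings (2013-2019)!")
terms = merge_dictionaries(["Cluster", "Predict"], ["consumption", "cluster"])
print(tokens, terms)
```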
After applying our text mining procedure, we found 1077 common terms within the two dictionaries, a sample of which is presented in Tables 5 and 6. From these, we found the top 30 common terms ranked by word frequency, depicted in Algorithm A1. Terms 1 and 2, respectively "consumption" and "buildings," are related to the "energy" domain, whereas term 3, "predict," is associated with the machine learning ("intelligent") domain. By looking at Table 8, we notice that several terms are related to energy consumption and to intelligent methods used by several authors to solve problems in the energy domain. This analysis provides an overview of the factors used to determine energy efficiency or consumption in various types of buildings. It also highlights some intelligent computing methods used to solve such problems. In particular, we can see that research efforts are directed toward using recent intelligent methods, such as deep learning, to address energy problems effectively.
Figure 4 depicts the most frequently used words in a word cloud plot, in which the size of each word represents its frequency or significance. Figure 5 shows the top five words, ranked by frequency, in a word offset plot. This plot depicts the position of each occurrence of a term, measured from the start of the text. A dispersion plot is used to show this positional information: each stripe shows an instance of a term, and each row shows the whole text. Note that the plots in both Figures 4 and 5 were computed after applying our stemming method, and therefore many words appear cropped.
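The two quantities behind the word cloud and the word offset plot, term frequency and term positions, can be computed as in the sketch below. The token list and the function names `top_terms` and `word_offsets` are illustrative assumptions; the plots in Figures 4 and 5 were produced by the authors' own tooling.

```python
from collections import Counter


def top_terms(tokens, dictionary, n=30):
    """Most frequent tokens that also appear in the dictionary."""
    allowed = set(dictionary)
    counts = Counter(t for t in tokens if t in allowed)
    return counts.most_common(n)


def word_offsets(tokens, terms):
    """Map each term to the token positions (offsets from the start) where it occurs."""
    offsets = {term: [] for term in terms}
    for position, token in enumerate(tokens):
        if token in offsets:
            offsets[token].append(position)
    return offsets


# Hypothetical token stream standing in for a pre-processed manuscript.
tokens = "consumption of buildings predict consumption cluster buildings consumption".split()
dictionary = ["consumption", "buildings", "predict", "cluster"]

ranking = top_terms(tokens, dictionary, n=2)
positions = word_offsets(tokens, ["consumption", "predict"])
print(ranking, positions)
```

The frequencies drive the font size in a word cloud, while the offset lists are what a dispersion plot draws as stripes along each row.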
In Figure 6, we depict the steps taken to obtain highly relevant terms related to the topic of intelligent computing techniques applied to energy. We calculated the intersection between the words of all manuscripts and the two dictionaries of the machine learning and energy domains to obtain the frequency of the top words relevant to the study's research. We started by selecting the following keywords: "intelligent," "method," "energy," and "buildings," which, for us, are the most relevant in the scope of this study. Then, from text mining, we found additional important terms, such as "prediction," "model," "consumption," and "residential." For example, the PR between the expression "consumption of buildings" and all 41 manuscripts is high, at 70.7%, while the remaining 29.3% relates to the expression "efficiency of buildings." Additionally, the PR reaches its maximum, 100%, between the regular expression "cluster * buildings" and all manuscripts of Tables 2 and 4 (the categories of classification of the ECB and of combined classification and prediction of the ECB).
In addition, the PR between the regular expression "neural * buildings" and all manuscripts of Table 3 (category of prediction of energy consumption of buildings) and of Table 4 was evaluated at 52.4%, with the remaining 47.6% corresponding to other machine learning techniques combined with the term "buildings." We can conclude that the text mining results allowed us to find the most used terms in the topic of intelligent computing techniques applied to energy. Moreover, they helped us find the most relevant manuscripts on this topic.
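A PR value for a pattern such as "neural * buildings" can be approximated as the share of manuscripts whose text matches the pattern at least once. The sketch below is a loose illustration: it assumes PR is the ratio of matching papers to all papers in the group, it interprets the "*" as "any text in between", and the four sample texts are invented.

```python
import re


def paper_relationship(papers, pattern):
    """Percentage of papers whose text matches the pattern at least once."""
    regex = re.compile(pattern, flags=re.IGNORECASE | re.DOTALL)
    matching = sum(1 for text in papers if regex.search(text))
    return 100.0 * matching / len(papers)


# Assumed reading of "neural * buildings": "neural", then any text, then "buildings".
NEURAL_BUILDINGS = r"\bneural\b.*\bbuildings\b"

# Invented abstracts, for illustration only.
papers = [
    "A neural network predicts the load of office buildings.",
    "Support vector regression for residential buildings.",
    "Neural models for hourly demand in public buildings.",
    "K-means clustering of smart meter data.",
]

pr = paper_relationship(papers, NEURAL_BUILDINGS)
print(f"PR = {pr:.1f}%")
```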

Bibliometric Map (VOS Viewer)
We used VOS viewer ("VOS viewer," n.d.), a tool for visualizing bibliometric networks, to find common terminology in two areas, energy consumption and machine learning techniques, across the 41 manuscripts under analysis. This tool supported our study with visual information that enabled us to explore the relations between the domains of energy and intelligent techniques. It also helped us find the most common dimensions, clustering, and variety of techniques to answer our research questions. Figure 7 represents the network map visualization, which displays the relations between the most popular terms and how they are linked. Larger nodes represent the more popular terms, with node size reflecting the number of times a term appeared in the manuscripts. VOS viewer splits the terms into clusters according to their relevance to each other.
We performed our analysis on the title and abstract using a binary counting method over 1177 examined keywords with a minimum threshold of 3 occurrences, resulting in 33 terms, as shown in Figure 7. In addition, the accuracy of the bibliometric analysis is 0.9069. The largest nodes, representing the important nodes of each cluster in the network map, are "Neural Network" and "Energy Consumption Prediction" (red), "Deep Neural Network" and "Energy Use" (yellow), "Cluster" and "Electricity Consumption" (green), and finally, "Heating factor," "Climate," and "Residential Energy Consumption" (blue).
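Binary counting with a minimum-occurrence threshold can be illustrated as follows. Under binary counting, a term is counted at most once per document regardless of how often it is repeated there. This sketch is a simplified stand-in for what the bibliometric tool does internally: the function name, the naive substring matching, and the sample documents are all assumptions.

```python
from collections import Counter


def binary_count(documents, terms, min_occurrences=3):
    """Count each term once per document, then keep terms that meet the threshold."""
    counts = Counter()
    for text in documents:
        lowered = text.lower()
        for term in terms:
            # Naive substring test; repeated mentions in one document count once.
            if term.lower() in lowered:
                counts[term] += 1
    return {term: n for term, n in counts.items() if n >= min_occurrences}


# Invented titles/abstracts, for illustration only.
documents = [
    "Neural network for energy use. Neural network again.",
    "Neural network and smart meter data.",
    "Cluster analysis with a neural network.",
    "Smart meter clustering.",
]
terms = ["neural network", "smart meter", "cluster"]

kept = binary_count(documents, terms, min_occurrences=3)
print(kept)
```

With a threshold of 3, only terms present in at least three documents survive, which is how a large keyword set shrinks to a small number of map nodes.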
Looking closer at the network map in Figure 7, we can see that the 4 clusters are connected. For instance, the "Neural Network" term is connected to "Energy consumption prediction" in the same red cluster, to "Prediction Model" and "Energy Use" in the yellow cluster, to "Electricity Consumption" and "Smart Meter" in the green cluster, and finally, to "Residential Energy Consumption" in the blue cluster. In addition, the term "Cluster" in the green cluster is connected to "Prediction Model" in the yellow cluster, "Energy consumption prediction" in the red cluster, and "Residential Energy Consumption" in the blue cluster. Moreover, the terms "Heating" and "Climate" are connected to "Residential Energy Consumption" in the blue cluster, "Cluster" and "Electricity Consumption" in the green cluster, "Prediction Model" in the yellow cluster, and "Neural Network" and "Energy consumption prediction" in the red cluster.
Finally, by analyzing the network map in Figure 7, we can identify the important terms in each cluster, as follows:
• In the red cluster: "Neural Network," "Energy consumption prediction," and "Support Vector Machine."
• In the yellow cluster: "Prediction Model," "Energy Use," and "Deep Neural Network."
• In the green cluster: "Cluster" and "Electricity Consumption."

Analysis of Representative Manuscripts per Topic
By tackling the posed research questions with our analysis of prior literature, firstly, we should rely on factors that may affect the energy consumption of buildings, such as the energy bills of the occupants of these buildings. Secondly, we should be specifically interested in intelligent computing models able to classify or predict the energy consumption in these buildings accurately. For example, paper S10 focuses solely on finding electricity consumption patterns in buildings, with an accuracy of 89% when predicting the level of consumption, by using K-means clustering. We believe it is necessary to rely on a model that classifies energy consumption with better accuracy and considers other factors that affect this phenomenon. Thirdly, we should seek to build an intelligent model to predict energy consumption efficiently. Papers S11 and S12 used multiple linear regression and multilayer perceptron, respectively, to address this prediction problem, for the case of electricity and natural gas consumption in buildings, also taking into account climate conditions, with an accuracy of 95%. Our conviction is that it is necessary to rely on a model that predicts energy consumption with even better accuracy.
As mentioned, our paper analyzes and discusses selected literature contributions following our research questions.

Analysis of Metrics, Data Sources, and Critical Factors
Our RQ1 drove us to look for metrics, data sources, and critical factors able to influence the energy consumption of buildings. Our review of papers S1 to S41 allowed us to extract such critical factors. In fact, terms such as "electricity," "space heating," and "climate" seem to be highly considered when studying the energy consumption in residential and public buildings, as shown in Figure 8. The major factors of energy consumption in different buildings are shown in Table A1 (see Appendix B). The electricity factor was used by 23% of papers, climate factor by 28%, space heating by 23%, space cooling by 13%, and finally, occupant behavior, by another 13%. By analyzing the factors that are used as inputs in research, we observed four points.

Firstly, the studies S18, S28, S32, and S41 used the electricity factor as the only factor in the study, where S18 presented a study to classify consumer behavior depending on the electrical factor in public buildings. Additionally, S28 presented a framework for forecasting hourly energy consumption in residential buildings. In addition, S32 presented a model for forecasting energy consumption in residential buildings based on electricity billing data for the occupants of these buildings. Finally, S41 presented a method for classifying and predicting electricity consumption in residential buildings.
Secondly, S8 presented a study to reduce energy consumption in residential buildings. This study addresses the climate factor only and its impact on energy consumption. It also uses a statistical method to analyze data to help decision-makers in saving energy.
Thirdly, the studies S2, S14, and S22 relied on the consumer behavior in residential buildings as the only factor in the study.
Fourthly, S35 relied on the space heating factor for predicting ECB. Finally, the rest of the research relied on a hybrid of factors such as electricity, climate, space heating, space cooling, gas, and others to classify and predict energy consumption, whether in public or residential buildings.

Analysis of Clustering and Classification Techniques
When tackling RQ2, our analysis focused on the clustering and classification techniques applied to the energy consumption of buildings. To this end, we analyzed papers S11 to S20 and S34 to S41. We observed that K-means clustering seems to be the most popular technique when studying energy consumption in residential and public buildings, as shown in Figure 9. The major clustering and classification techniques in different buildings are shown in Table A2 (see Appendix B). K-means clustering was used by 76% of the papers, and hierarchical clustering by 24%. By analyzing the clustering techniques used in the various studies, we observed four points.
Firstly, the studies S11 and S13 relied on Ward's method of hierarchical clustering to classify energy consumption data in households. S11 presented a study to analyze the reduction of electricity consumption and improve the energy efficiency of households in the city of Evora, in southern Portugal. This analysis identified 10 clusters of energy consumption. Additionally, S13 presented a study to find the rationale of thermal comfort behaviors in Portuguese households by means of cooling and heating. This study aimed to define daily electricity consumption behavior profiles in households. It also classified families into two basic clusters (active and non-active). In addition, S12 presented a study that applied data mining to smart meter data to identify the users who are most responsible for the system peak, using consumption variability and a responsibility factor. This study also applied hierarchical clustering and a self-organizing map to find the consumers most responsible for the system peak.
Secondly, the studies S14, S15, S34, S37, and S40 relied on K-means clustering with "k-means++" initialization to classify the energy consumption of buildings. S14 presented a study to discover residential electricity consumption behaviors over time. It classified electricity consumption into four clusters. S15 presented a study to address electricity consumption factors, such as human activities and air conditioning use. Individual electricity consumption patterns were divided into six clusters, with an accuracy of 89.3%. S34 presented an approach to classify the cooling and heating energy efficiency of residential buildings. The cooling and heating energy were divided into five clusters with an accuracy of 87.8%. S37 presented an approach to classify energy consumption in residential buildings. It classified this into four levels (low, medium, high, and very high) with an accuracy of 85.9%. S40 presented a study to classify electricity consumption data into five levels (very low, low, medium, high, and very high) with an accuracy of 90.4%.
Thirdly, the studies S18 and S41 relied on optimization algorithms with K-means clustering to classify electricity consumption data in residential buildings. Optimization algorithms were used to determine the initial centroids of K-means clustering. S18 presented a framework of quadratic programming with K-means clustering to classify occupant behavior-based electricity load patterns in buildings. The results confirmed that there are 10 different clusters in electricity consumption with an accuracy of 83.8%. S41 presented an approach to classify the electricity consumption behaviors of occupants in buildings. Occupant behaviors were characterized into nine different clusters found in the data. Their study found that the genetic algorithm with K-means clustering reached an accuracy of 89.7% when classifying occupant behaviors in buildings.
Fourthly, the studies S16, S17, S19, and S20 relied on K-means clustering and other intelligent techniques. S16 compared two AI techniques (K-means clustering and hierarchical clustering) in the classification of energy consumption in residential buildings. This study relied on smart meter data to show the energy consumption behaviors of occupants. It suggests that hierarchical clustering outperforms K-means clustering in terms of accuracy, with figures of 92.8% and 90.3%, respectively. S17 presented an approach to classify patterns of occupant behaviors in residential buildings. This study relied on two main variables (window opening and indoor temperature) and used two intelligent techniques, namely, time series and K-means clustering. The study confirmed that K-means clustering is better than time series in terms of accuracy, with figures of 90.20% and 87.70%, respectively. S19 presented an approach to classify the consumption behaviors of occupants in residential buildings to determine their energy consumption. This study relied on several factors, such as space heating, refrigeration, and air-conditioning, to reveal the envisaged energy consumption classification. Their approach combined K-means clustering and demographic-based probability neural networks and found 10 different consumption behavior patterns among occupants of residential buildings. The Mean Square Error (MSE) achieved with K-means clustering was 0.09. S20 presented a technique to classify and analyze the ECB based on smart meter electricity data. This study showed how to improve the performance of K-means clustering via time series analysis and wavelets. The proposed approach found 12 different clusters of electricity consumption and achieved an MSE of 0.18 with K-means clustering.
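To make the technique behind most of these studies concrete, the following is a minimal sketch of K-means clustering with k-means++ seeding, written in plain Python. It is an illustration under stated assumptions and not the implementation used by any of the reviewed papers: the function names (`kmeans`, `kmeans_pp_init`), the fixed seed, and the idea of representing each household as a short list of consumption values are all hypothetical choices made for this example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length load profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: spread the initial centroids across the data."""
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        # Weight each point by its squared distance to the nearest centroid.
        weights = [min(sq_dist(p, c) for c in centroids) for p in points]
        if sum(weights) == 0:  # every point already coincides with a centroid
            centroids.append(list(rng.choice(points)))
        else:
            centroids.append(list(rng.choices(points, weights=weights, k=1)[0]))
    return centroids


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = kmeans_pp_init(points, k, rng)
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centroid if a cluster ends up empty.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    # Final assignment so that labels always match the returned centroids.
    labels = [nearest(p, centroids) for p in points]
    return labels, centroids
```

Calling `kmeans(profiles, 4)` on a list of household profiles would return one cluster label per household, which could then be mapped to levels such as low, medium, high, and very high. Choosing the number of clusters is left to the caller, as it is in the reviewed studies.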

Analysis of Prediction Techniques
While addressing RQ3, we looked for techniques applicable in predicting the energy consumption of buildings. With this aim, we analyzed papers S21 to S41. We observed that techniques such as neural networks, regression models, and support vector machines are the most adopted, as shown in Figure 10. The major prediction techniques in different buildings are shown in Table A3 (see Appendix B). Neural networks were adopted by 35% of the analyzed papers, whereas support vector machines and regression models were each chosen in 22% of papers. Deep neural networks accounted for 17% of the reviewed papers, and the remaining 4% corresponded to the random forest technique. By analyzing the intelligent prediction techniques used in the research, we observed five points.
Firstly, the studies S34 and S37 relied on the neural network technique to predict energy consumption in residential buildings. S34 presented an approach (backpropagation neural network) to predict energy efficiency based on space heating and space cooling with an accuracy of 85.4%. Additionally, S37 presented an approach (feedforward neural network) to predict total energy consumption with an accuracy of 89.2%.
Secondly, the studies S23, S28, and S41 relied on neural networks and other intelligent techniques. S23 presented a model to predict energy consumption in different types of buildings such as residential, commercial, government, or educational. Their model relied on two machine learning techniques, namely artificial neural networks and support vector machines. The accuracy of neural networks and support vector machines was 90.1% and 85.4%, respectively. Thus, the authors claim that neural networks are better than support vector machines in terms of accuracy. Additionally, S28 presented a framework to predict the hourly electrical consumption in residential buildings. This study relied on sensor data collected from three residential homes. Machine learning techniques, such as regression models, feedforward neural networks, and support vector regression, were used. The authors found that feedforward neural networks outperformed the other techniques in terms of Mean Absolute Percentage Error (MAPE). In fact, the achieved MAPE figures were 13.41%, 9.14%, and 9.63%, respectively, for regression, feedforward neural networks, and support vector regression. Finally, S41 presented an approach to predict occupant behaviors of electricity consumption by using a backpropagation neural network and support vector regression. The accuracy of forecasting electricity load patterns of occupant behaviors with those techniques reached 47.02% and 57.14%, respectively, highlighting the superiority of support vector regression compared to backpropagation neural networks for this kind of problem.
Thirdly, the studies S21, S26, S29, and S30 relied on regression models to predict energy consumption in different buildings. S21 presented an approach to predict the future energy consumption of a supermarket in the UK by using multiple linear regression. The regression equation explains about 95% of the electricity demand and 86% of the gas use. Additionally, S26 presented a model to predict energy consumption in manufacturing companies. Their study relied on different parameters such as mean outdoor temperature and electricity data. Their approach used multiple linear regression for predicting energy consumption at different temperatures. Their results show an Adjusted R Square of 0.96. In addition, S29 introduced a statistical approach to forecast total energy consumption in industrial, commercial, domestic, and public buildings. Their study included various factors, such as gross domestic product (GDP), population figures, and GDP per capita. The authors tried a simple regression model and multiple linear regression. Their results show that multiple linear regression (0.991) is better than the simple regression model (0.844) in terms of Adjusted R Square. Finally, S30 presented [...]
Fourthly, the studies S32 and S40 relied on support vector machines to predict electricity consumption in residential buildings, where the accuracy of the proposed models in S32 and S40 is 95% and 97.4%, respectively.
Fifthly, the studies S24 and S27 relied on deep neural networks to predict energy consumption in different buildings. S24 presented a new model to predict energy consumption in residential buildings by using a convolutional neural network (CNN) and long short-term memory (LSTM). The MSE values for linear regression, LSTM, and CNN-LSTM were 0.40, 0.74, and 0.37, respectively. Thus, CNN-LSTM is better than linear regression and LSTM in terms of MSE. Finally, S27 estimated the ECB as a function of space heating and cooling, class of the energy consumer, and the average customer consumption. Their CNN reached an absolute error and a relative error, in their estimates, of 31.83 kWh and 17.29%, respectively.
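The error measures quoted in this subsection (MSE, MAPE, and Adjusted R Square) can be stated compactly in code. The sketch below uses their standard textbook definitions in plain Python; the function names are hypothetical, and the reviewed papers may differ in details such as normalization of the data before computing MSE.

```python
def mse(y_true, y_pred):
    """Mean squared error between observed and predicted consumption."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (observed values must be non-zero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def adjusted_r2(y_true, y_pred, n_predictors):
    """Adjusted R Square: R Square penalized by the number of predictors."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
```

Because MSE depends on the scale of the target variable while MAPE and Adjusted R Square do not, figures reported by different studies are only directly comparable when the same metric and the same data scaling are used.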

Analysis of Techniques Combining Classification and Prediction
While answering our RQ4, we searched for combination techniques applicable in predicting and classifying the energy consumption of buildings. With this goal, we analyzed papers S34 to S41 and observed that neural networks with K-means clustering seem to be the most prominent combination, as shown in Figure 11. The major combinations of prediction and classification techniques in different buildings are shown in Table A4 (see Appendix B). By analyzing the combination of intelligent prediction and classification techniques used in the research, we observed three points, namely: firstly, the studies S35 and S38 relied on a hybrid model of K-means clustering with neural networks to estimate energy consumption in different buildings. S35 presented an approach to predict heating energy consumption. This approach combines radial basis function neural networks and K-means clustering to estimate the energy efficiency of buildings. K-means clustering is used to establish subsets to train individual radial basis function neural networks to improve prediction accuracy. S38 presented an approach to improve energy efficiency in Croatian public buildings. Much like S37, this paper also proposes K-means clustering and a backpropagation neural network to tackle this topic. It also tried to investigate whether K-means clustering enhances the accuracy of a backpropagation neural network prediction. However, their results suggest that K-means clustering, when mixed with backpropagation, has not substantially increased the prediction accuracy of this approach, since the backpropagation technique alone achieved an accuracy of 90.1%, compared with 90.4% when combining the two techniques.
Secondly, paper S39 presented an intelligent technique to predict residential load patterns of energy based on socio-economic factors. Moreover, this study used K-means clustering to analyze load patterns and used an entropy-based feature selection method to identify the socio-economic characteristics that impact consumers' energy load patterns. The paper also used a deep neural network to predict the residential load pattern. The proposed technique of this study achieved an MSE of 0.12.
Thirdly, paper S36 presented a hybrid approach to classify and predict energy consumption in residential buildings. This approach consists of two AI techniques, namely, backpropagation neural networks and decision trees. A decision tree is used to classify energy consumption levels, whereas a backpropagation neural network predicts energy consumption in residential buildings. The accuracy in the decision tree and backpropagation neural network is 83.6% and 91.2%, respectively. Finally, the studies S34, S37, S40, and S41 were covered in response to the second and third questions.
Figure 11. Prediction and classification techniques that used energy consumption in residential and public buildings.
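The cluster-then-predict combination described for S35, in which clustering establishes subsets and one predictor is trained per subset, can be sketched as follows. This is only an illustration of the general pattern: a simple least-squares line is swapped in for the radial basis function neural networks used in S35, the cluster labels are assumed to come from a prior K-means step, and all function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    return mean_y - b * mean_x, b


def fit_per_cluster(xs, ys, labels):
    """Train one regressor per cluster and remember each cluster's center."""
    models = {}
    for label in set(labels):
        cx = [x for x, lab in zip(xs, labels) if lab == label]
        cy = [y for y, lab in zip(ys, labels) if lab == label]
        models[label] = {"center": sum(cx) / len(cx), "coef": fit_line(cx, cy)}
    return models


def predict(models, x):
    """Route x to the cluster with the nearest center, then apply its model."""
    model = min(models.values(), key=lambda m: abs(x - m["center"]))
    a, b = model["coef"]
    return a + b * x
```

The design choice worth noting is that each local model only has to capture the behavior of its own subset. Whether this improves accuracy over a single global model is an empirical question, as the S38 result above illustrates.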

Analysis of Performance Evaluation Metrics
Our literature study also analyzed a variety of performance evaluation metrics in the scope of our RQ5. Results are depicted in Figure 12. In total, 12% of the selected papers (S34, S35, S36, S37, and S38) adopted accuracy together with precision and recall (ACC&PRE&REC). A total of 22% of them (S15, S16, S17, S18, S21, S22, S32, S40, and S41) selected only ACC. MSE was used by 15% of the analyzed papers (S19, S20, S24, S25, S28, and S39). Only 2% used ACC&MSE (just paper S7). Finally, 10% of the papers adopted the [...]
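For readers less familiar with the classification metrics named above, the following sketch computes accuracy, precision, and recall from predicted consumption levels using their standard definitions. The function name and the one-versus-rest treatment of a chosen "positive" level are assumptions made for this example; the reviewed papers do not state how they averaged these metrics across classes.

```python
def classification_metrics(y_true, y_pred, positive):
    """Return (accuracy, precision, recall), treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall
```

Accuracy alone can look favorable when one consumption level dominates the data, which is one reason some of the reviewed studies report precision and recall alongside it.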

Discussion
This section discusses two important topics: our research questions and some research gaps that we could identify.

Research Question Discussion
Our systematic review aimed to answer five basic questions targeting the application of intelligent techniques in the field of energy consumption in different building sectors.
Focusing on our RQ1, we can highlight the main results obtained with our detailed analysis. The terms "electricity," "heating," and "climate" seem to be the most relevant when studying the energy consumption of buildings. We also observed a similarity between these results, based on the 41-paper analysis, and the text mining results, which took into consideration 106 papers. In our text mining approach, we calculated TITs for the terms "electricity" and "heating," giving 0.1 and 0.07, respectively, thus showing their high relevance. Nonetheless, there is a noticeable difference regarding the term "climate." From our detailed analysis of 41 papers, we found "climate" to be relevant. However, from our text mining results, the TITs value for "climate" was 0.02, thus showing that the "climate factor" has a lower relevance. Finally, we observed that "other factors" ("others" in Figure 12), including "socio-economic," "geospatial," and "building characteristics," along with "electricity," "heating," "cooling," "occupant behavior," "gas," and "climate" are relevant when tackling energy consumption of buildings, as shown in Figure 13. In the bibliometric map, we observed a strong relationship between "Residential Energy Consumption" and "Electricity Consumption," "Heating," and "Climate" (see Figure 7).
Regarding RQ2, and by examining papers S11 to S20 and S34 to S41, we can conclude that the term "cluster" is important when applying machine learning techniques to the ECB. We observed a relationship between this result and the conclusions drawn from text mining. In fact, the TITs value for the term "cluster" was 0.1, which shows that this term is highly relevant. However, in our graph of Figure 14, the "cluster" term is further divided into two terms, which are "K-means clustering" and "hierarchical clustering." By analyzing the mentioned papers, "K-means clustering" was used 3.5 times more than "hierarchical clustering," as shown in Figure 13. In the bibliometric map, we observed a significant relationship between "Cluster" and "Electricity Consumption," "Energy consumption prediction," and "Residential Energy Consumption" (see Figure 7).
For RQ3, our analysis of papers S21 to S41 showed that the terms "backpropagation," "feedforward neural network," and regression models such as "multiple linear regression" and "support vector machine" are the most relevant. Much like our analysis of RQ2, we observed a similarity between such results and the conclusions of text mining. The TITs for the terms "neural" and "regression" were 0.06 and 0.05, respectively. Thus, the terms "neural" and "regression" are relevant too. In papers S21 to S41, the neural network technique was used 2.5 times more than regression techniques and 2.143 times more than support vector machines, as shown in Figure 15. In the bibliometric map, we observed a significant relationship between "Neural Network" and "Deep Neural Network" on the one hand and "Energy use," "Energy consumption prediction," and "Prediction Model" on the other. In addition, there is a relationship between "Support Vector Machine" and "Energy consumption prediction" (see Figure 7).
Regarding our RQ4, our analysis of papers S34 to S41 showed that K-means clustering was combined with backpropagation and feedforward neural networks in 75% of such papers (namely, in papers S34, S35, S37, S38, S39, and S41).
Finally, regarding our RQ5, accuracy is one of the most important metrics used to evaluate the intelligent models applied in the field of energy consumption of different buildings (see Figure 12).

Research Gap Discussion
By reviewing the studies that were mentioned previously, three main problems were covered in our survey.
Firstly, there is a lack of an official study to find the main factors used in the field of energy consumption in the different building sectors. Moreover, it is unclear how these factors relate to the different applications in the field of energy. For example, studies S18, S28, S32, and S41 used the electricity factor in the ECB as the only factor in the study. In addition, manuscripts S2, S14, and S22 relied on consumer behavior in residential buildings as the only factor in the study. Our study shows that heating and climate factors directly influence the energy consumption of residential buildings, while electricity and climate factors directly influence energy consumption for public buildings.
Secondly, most of the previous studies (S11, S13, S14, S15, S34, S37, and S40) used K-means clustering and hierarchical clustering to classify the ECB. Our study highlights two main aspects that were not covered in previous research: (a) there is a direct relationship between "K-means clustering" and "electricity consumption" in public and residential buildings; and (b) other classification methods, such as a self-organizing map, should be compared with the classification models found in our survey in terms of accuracy.
Thirdly, most of the previous studies used four basic intelligent computing models to predict the ECB: neural networks (S23, S28, S34, S37, and S41), regression (S21, S26, S29, and S30), support vector machines (S32 and S40), and deep learning (S24 and S27). Our study shows that there is a direct relationship between "neural networks," "support vector machines," and "prediction of energy consumption in residential buildings." Additionally, there is a direct relation between "deep learning" and "prediction of energy consumption in public buildings." Our study proposes the use of recent literature techniques, such as recurrent neural networks, for comparison with the prediction models found in our survey.
Additionally, our research allowed us to identify several research gaps. We found that only a small number of papers (S1 and S6) address specific factors influencing the energy consumption of buildings. For example, some studies (S5, S11, S18, S21, S26, S28, S32, and S41) focused on the electricity factor in general, with no mention of the number of building occupants or the activities carried out by them. Only a few studies (S31, S33, and S38) relate to the energy consumption in public buildings. However, stakeholders in public buildings, particularly in Portugal, find this topic relevant and are not only willing to improve the energy efficiency in those buildings but also interested in switching energy suppliers whenever the market conditions favor such change [13]. The results of this systematic review also revealed a wide gap in the domain of intelligent computing models, particularly regarding the automatic classification and prediction of the ECB, since the number of available machine learning techniques in the state of the art is vast, and we saw from our survey that some of the most promising techniques are not yet being used to their full potential. In fact, only a few studies (S24, S27, S31, and S39) address the application of the deep neural network model, which is a promising technique for predicting the ECB.

Study Limitations and Threats
Our survey has several limitations. Notably, it was limited by the search keywords chosen and the time interval of the publications (last seven years). In addition, it used a finite number of electronic database sources. Furthermore, this paper only analyzed English manuscripts, and we cannot guarantee to have selected all the obtainable and valuable material for our review.

Conclusions and Future Work
This paper introduced a systematic literature review on the topic of classifying and predicting the ECB, focusing on finding answers to our five research questions. Text mining procedures were used to find the most used terms in the energy and intelligent computing model domains, and a bibliometric map was used to find the relationships between the most common terms in those domains prior to a more detailed manuscript analysis. By following a PRISMA approach in our survey, we started by identifying 822 manuscripts and ended up analyzing 41. Our survey highlighted the most used intelligent computing models, notably machine learning methods, adopted by the community to classify and forecast the ECB. This study provides contributions in three aspects. The first one considers factors that influence the ECB. The second one provides a systematic survey of classification and prediction techniques used in that context. The last aspect tackles the evaluation criteria used by those techniques.
As mentioned, the study has not covered all manuscripts in 2021, which may contain new intelligent models. The emergence of new intelligent methods may help improve the accuracy of classification models and predict energy consumption for different building sectors.
Thus, there are still opportunities for improvements regarding our topic of research. As a recommendation for future work, there are some other factors that affect energy consumption in buildings (e.g., green roof, building envelope, internal and external factors). These factors may be used in the future for the classification and prediction of energy consumption. Our survey suggests tackling the classification of the ECB by combining clustering and optimization techniques, aiming to classify the ECB into levels (low, medium, high). As for predicting the ECB, this study suggests adopting machine learning approaches from the family of deep learning techniques, such as long short-term memory, convolutional neural networks, and deep forest, which are some of the recent trends found in research.

Funding: This work has been supported by a NOVA IMS PhD Scholarship and its scope lies in the context of Simplex #109 "Consumo SMART" https://www.simplex.gov.pt/medidas.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement:
The data can be accessed upon request from any of the authors.

Acknowledgments:
We would like to thank Maria Anastasiadou for their help in reviewing the structure and content of the paper. The authors would also like to thank the editorial team and the reviewers who provided constructive and helpful comments to improve the quality of the article.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
This section shows the steps required to build a corpus tool that finds the word frequency in manuscripts. In this algorithm, we first included all manuscripts in a single "Documents" file as well as two "Dictionaries" (line 1). Then, from line 3 to line 10, we imported a few programming libraries and executed the following procedures in order to build our corpus:
• Line 3 (import re): we imported a library for regular expression operations.
• Line 4 (import nltk): we imported a toolkit for natural language processing.
• Line 5: we computed SW = Stop Words.
• Line 6: we computed £ = the word frequency for all words in α ("Documents").
• Line 7 (from nltk.corpus import SW): we computed corpus = a large and structured set of texts and SWs.
• Lines 8 and 9: we removed the morphological affixes of words from the corpus leaving only the word stem.
• Line 10: we recorded the frequency of each word in α ("Documents").
We continued by implementing a cleaning process by deleting symbols, numbers, and extra spaces from α (line 11) and transforming all words to lowercase (line 12). Line 13 converted sentences to separate words. Line 14 removed stop words such as "the" and "is" and applied stemming. That is, we reduced variations of the form of a given word by deleting inflection forms through the removal of unnecessary characters, such as in the following example:
Reduces energy, Reduced energy, Reducing energy → Reduce energy
Lines 15 and 16 found common words between the main document file and the two dictionaries. Line 17 computed the word frequency in the main document file. Line 18 created a mapping between word frequency and the intersection words (that are common across "Document" and "Dictionary") to find optimal keywords that were used in selecting significant research papers for our survey. Finally, in lines 19 and 20, we determined the importance of each word, found optimal keywords, and visualized a word cloud (Figure 3) and a word offset (Figure 4). In the latter, we depicted the location of a word in a sequence of text sentences.
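The cleaning, stemming, and dictionary-matching steps described above can be sketched in a few lines of Python. This is a simplified illustration and not a reproduction of Algorithm A1: the stop-word list and the suffix-stripping `naive_stem` function are small hypothetical stand-ins for the NLTK resources the paper mentions, and the two dictionaries are assumed to be merged into a single set of terms.

```python
import re
from collections import Counter

# Hypothetical, tiny stop-word list; the paper relies on NLTK's list instead.
STOP_WORDS = {"the", "is", "a", "an", "of", "and", "in", "to"}


def naive_stem(word):
    """Crude suffix stripping, standing in for a real stemmer."""
    for suffix in ("ing", "es", "ed", "s", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Clean, lowercase, split, drop stop words, and stem."""
    text = re.sub(r"[^A-Za-z\s]", " ", text)          # delete symbols and numbers
    text = re.sub(r"\s+", " ", text).strip().lower()  # collapse spaces, lowercase
    return [naive_stem(w) for w in text.split() if w not in STOP_WORDS]


def keyword_frequencies(documents, dictionary):
    """Count stemmed words in the documents that also occur in the dictionary."""
    freq = Counter()
    for doc in documents:
        freq.update(tokenize(doc))
    vocab = {naive_stem(w.lower()) for w in dictionary}
    return {word: count for word, count in freq.items() if word in vocab}
```

With this stand-in stemmer, "reduces," "reduced," "reducing," and "reduce" all collapse to the same stem, which mirrors the example above. A production pipeline would use an established stemmer, since crude suffix stripping over-stems or under-stems many words.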
Algorithm A1: Build a corpus module to find word frequency in manuscripts.

1. Input: α = (Documents: number of manuscripts); ¥ = (Dictionary: two dictionaries of intelligent model and energy consumption of buildings)
2.