The Evolution of Knowledge and Trends within the Building Energy Efficiency Field of Knowledge

Abstract: The building sector is responsible for 50% of worldwide energy consumption and 40% of CO2 emissions. Consequently, a lot of research on Building Energy Efficiency has been carried out over recent years, covering the most varied topics. While many of these themes are no longer of interest to the scientific community, others flourish. Thus, reading trends within a field of knowledge is wise since it allows resources to be directed towards the most promising topics. However, there is a paucity of research on trend analysis in this field. Therefore, this article aims to analyse the evolution of the Building Energy Efficiency field of knowledge, identifying the recurrent themes and pointing out their trends, supported by statistical methods. The analysis relied on more than 9000 authors' keywords collected from 2000 articles from the Scopus database and classified into 30 topics/themes. A frequency distribution of these themes enabled us to distinguish those most published as well as those whose academic interest has cooled down. This field of knowledge has evolved over three distinct phases, throughout which eight themes presented an upward trend. These findings can assist researchers in optimising time and resources by investigating the topics with growing interest and the possibilities for new contributions.


Introduction
Energy consumption has increased over the last decades [1]. Such an increase brings several concerns, such as the necessity to develop alternative energy sources and reduce environmental impacts due to greenhouse gas emissions [2,3]. The building sector has overtaken the industrial sector and has played an important role in such a scenario, being responsible for more than 50% of the total energy consumption and 40% of the total CO2 emissions [4]. In Europe, according to the European Commission [5], residential buildings alone are responsible for approximately 40% of energy consumption and 36% of CO2 emissions. Therefore, it was necessary to find a way to decrease such energy consumption without affecting economic development or the comfort of the building's occupants [6,7]. The way that researchers found to achieve such a goal was to increase the efficiency of processes and products [8,9]. This made the building sector the target of important public policies.

Materials and Methods
The deductive method was the methodological approach chosen to carry out this research. This was well-explained by Davidavičienė [105]. The deductive method starts with a set of theory-driven research questions, which guide the data collection and their analysis. Such questions were precisely posed in the previous section.
The analyses carried out in this research were based on the authors' keywords of a significant sample of relevant articles addressing Building Energy Efficiency (BEE). Therefore, this section will describe the procedures used to collect and manipulate such keywords. Figure 1 illustrates the methodological flow of this research.

Research Problem Formulation
This is the stage in which the general orientation of the research was established. In this step, the research theme was formally defined and well delimited. Furthermore, the gaps in the field of knowledge under investigation were stated and the questions that had to be answered in order to fill such a gap were posed [106].
Energy Efficiency is a huge field of knowledge with several intertwined branches. Building Energy Efficiency (BEE) is one of these branches, and it comprises several themes.
Since the building sector is a great energy consumer, surpassing the industrial sector, much has been written about Building Energy Efficiency. However, no article has studied, in a methodological manner, the evolution of this field of knowledge, the themes within it, or how they relate to each other. Thus, in order to fill this gap, it was necessary to answer the research questions as posed in Section 1.

Document Retrieval
The main goal of this step relied on retrieving a significant sample of publications addressing relevant issues in the Building Energy Efficiency field of knowledge. In order to accomplish such a goal, it was necessary to define the database from which the documents would be gathered and, most importantly, design the proper query to get the job done.
The publications were retrieved from the SCOPUS database since it has a wide coverage of high-impact journals, and "it is the largest database of abstracts and citation literature peer review" [107]. Only journal articles from 2000 to 2018 were considered because, before then, only a few articles presented the authors' keywords, which provided the background of the method employed. The construction of a query is a process of association of several terms concerned with a core theme. Such terms could be the keywords of a first sample of articles related to the field of knowledge under analysis. A very simple query can be used to capture the articles whose keywords will then be used to formulate the ultimate query.
In this case, the term "Building Energy Efficiency" was used to formulate the very first query, retrieving 893 publications and resulting in more than 3000 keywords, illustrated as a word cloud in Figure 2. Figure 2 is a graphical representation of the word frequency that gives greater prominence to the key terms that appeared more frequently in the 893 publications previously gathered. The larger the word in the visual, the more common the word was among the keywords. Thus, words like building, energy, efficiency were prevalent, whilst others like pattern, technical, measuring were incidental. Therefore, not all 3000 keywords fit with the current research. After examining these keywords, only 682 of them were considered suitable to the scope of this research and were used in the construction of the ultimate query, as shown in Figure 3. Figure 3 illustrates just a fragment of the query used to collect the articles scrutinised by this research. As can be seen at the beginning of the fragment, articles containing one or more of the key terms in their title and/or abstract were selected. At the bottom of Figure 3, the type of source selected (only articles) and the year of publication can be seen.
The search using this query returned more than 14,000 articles indexed in the Scopus database. However, most of them have never been cited and, as the number of citations is recognised as a quality standard [106,108], only the documents cited five or more times were considered eligible for further analysis. Thus, the number of selected articles dropped to nearly 2000.

The Evolution of the Building Energy Efficiency Field of Knowledge
According to Price's Law [109], every field of knowledge is a dynamic structure that evolves over time, starting with a few publications covering distinct themes within it and progressing until maturity, when it is consolidated and only a few themes are left to explore. Thus, it is important to investigate which stage of development this field has reached.
Price [109] stated that the number of publications related to a field of knowledge can be used to characterise its evolution. Indeed, scientific publications meet the scientific community's demand for new findings. Thus, it is fair to say that scientific productivity is directly associated with the scientific community's interest in themes within this field. Based on that, Price [109] established a law, which states that the several stages of evolution of a field of knowledge can be characterised by the growth of the number of publications over the years.
Based on that, the evolution of the Building Energy Efficiency field of knowledge was assessed by means of Price's fundamental law.

Authors' Keyword Collection, Classification, and Manipulation
This step shows the data structure used to store 9326 keywords, collected from the 2000 articles that sourced this research; the classification procedure of these keywords into themes; and the data manipulations necessary to feed the following step.
Initially, the keywords and the respective articles were stored in a matrix, as illustrated in Table 1. Each row of Table 1 represents one of the 2000 articles ($a_1, \ldots, a_i, \ldots, a_{2000}$) and stores the metadata for that article (article title, authorship, publication date, and keywords). Special attention was given to the keywords, whose number varies from article to article. For data processing purposes, each keyword is represented as $kw_j^i$, meaning the jth keyword from the ith article.
The keywords from Table 1 were screened, looking for those relevant to Building Energy Efficiency, as well as for insights about their classification into groups. As a result, 7598 keywords were discarded and the remaining 1728 were grouped into 30 distinct themes, as illustrated in Table 2. Table 2 classifies the keywords stored in Table 1 into thirty categories, or themes (the number of categories was defined a priori). Each column of Table 2 is assigned to a theme, and each theme has a different number of keywords assigned to it.
The task of accounting for the number of times a given theme appears in the literature was made easy by combining Tables 1 and 2, which resulted in Table 3. Table 3 is very similar to Table 1, except for the fact that in Table 1 each article is associated with its own keywords, whilst in Table 3 each article is associated with one or more of the themes previously defined. Thus, Table 3 can be read in two ways: by row or by column. When read by row, it is possible to see the themes addressed by each article. When read by column, it is possible to see which theme is present in which article. Thus, the presence of a theme in the literature in a given year can be assessed by counting the number of articles addressing such a theme that year. Therefore, in order to build a distribution of themes over a given period, it was enough to restrict Table 3 to articles published over such a period.
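The combination of Tables 1 and 2 into Table 3 can be pictured with a minimal Python sketch. The data layout, the function name `build_table3`, and the sample keywords are assumptions made for illustration only; the source does not describe how the tables were implemented. The theme codes GRB and BIM are taken from the article, but the keyword lists assigned to them here are hypothetical.

```python
def build_table3(table1, table2):
    """Map each article to the set of themes its keywords belong to.

    table1: list of dicts with "id", "year" and "keywords" (hypothetical layout).
    table2: dict mapping a theme code to the set of keywords classified under it.
    """
    # Invert Table 2: each keyword belongs to exactly one theme.
    keyword_to_theme = {
        kw.lower(): theme for theme, kws in table2.items() for kw in kws
    }
    table3 = {}
    for article in table1:
        # Keywords absent from Table 2 were discarded during screening.
        table3[article["id"]] = {
            keyword_to_theme[kw.lower()]
            for kw in article["keywords"]
            if kw.lower() in keyword_to_theme
        }
    return table3


# Hypothetical sample data, not taken from the study.
table1 = [
    {"id": "a1", "year": 2010, "keywords": ["Green building", "case study"]},
    {"id": "a2", "year": 2015, "keywords": ["BIM", "green building"]},
]
table2 = {"GRB": {"green building"}, "BIM": {"bim"}}

table3 = build_table3(table1, table2)
print(table3["a1"])  # themes addressed by article a1 (row reading)
```

Reading the result by row gives the themes of one article; collecting the article identifiers whose set contains a given theme gives the column reading.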

Evolution and Trend of the Themes
The participation of a theme within a given period can be assessed by the percentage of the articles published within this period that address such a theme, as shown in (1).
$$\text{Theme}_i\,\% = \frac{\#\,\text{articles addressing theme } i}{\text{total articles published in the year}} \qquad (1)$$
Thus, the higher $\text{Theme}_i\,\%$, the more important the theme is. Based on this, it was possible to study the evolution of the themes over time.
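As an illustration of Equation (1), the following sketch computes the yearly participation of each theme from a list of (year, themes) pairs. The function name, the input layout, and the sample data are hypothetical, and the ratio is multiplied by 100 here on the assumption that the participation is reported as a percentage.

```python
from collections import defaultdict


def theme_participation(articles):
    """Return {theme: {year: percentage of that year's articles addressing it}}.

    articles: iterable of (year, set_of_theme_codes) pairs (assumed layout).
    """
    totals = defaultdict(int)                       # articles published per year
    counts = defaultdict(lambda: defaultdict(int))  # articles per theme and year
    for year, themes in articles:
        totals[year] += 1
        for theme in themes:
            counts[theme][year] += 1
    return {
        theme: {
            year: 100.0 * counts[theme].get(year, 0) / totals[year]
            for year in sorted(totals)
        }
        for theme in counts
    }


# Hypothetical sample: two articles in 2000 and two in 2001.
sample = [
    (2000, {"BEM"}),
    (2000, {"BEM", "DAT"}),
    (2001, {"DAT"}),
    (2001, set()),
]
participation = theme_participation(sample)
print(participation["BEM"])
```

Because one article may address several themes, the percentages of a given year do not need to add up to 100.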
The evolution of a theme was defined as the participation of such a theme in the literature over the years covered by the analysis. Thus, if the participation of a theme increases over time, it can be said that the theme shows an upward trend. Conversely, if the participation decreases, the theme shows a downward trend. There are also occasions in which the trend is stable over the years.
In order to avoid subjectivity, the trend analysis was supported by a nonparametric statistical procedure, the Mann-Kendall test for trend [110]. Table 4 shows a fragment of the table used to present the evolution and trend of the themes.

Table 4. Annual participation of the themes, followed by their trends.

Table 4 presents the evolution of each theme over the period under analysis, both in figures and graphically, as well as their trends according to the Mann-Kendall test at the 5% significance level. The rows represent each of the thirty themes, and the columns store the participation of each of them in the literature, calculated according to Equation (1). A bar graph illustrates the calculations, whilst ↑ means an upward trend, ↓ means a downward trend and ↔ means no trend. Table 4 can also be read from the point of view of the columns, i.e., from the point of view of the years. By doing that, it was possible to establish the profile of the years based on the themes addressed by the articles published during these years. This opened up the possibility of comparing the years and identifying certain patterns that allowed them to be grouped into clusters, demarcating evolutionary stages of this field of knowledge.
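For readers unfamiliar with the Mann-Kendall test, the sketch below shows one common formulation: the S statistic, its variance with a correction for ties, and a two-sided p-value from the normal approximation. It is not the authors' implementation, which the source does not describe; the function name, the return format, and the sample series are assumptions, and the normal approximation is only rough for very short series.

```python
import math
from collections import Counter


def mann_kendall(series, alpha=0.05):
    """Two-sided Mann-Kendall trend test (normal approximation, tie-corrected).

    Returns (trend, s, z, p) where trend is "upward", "downward" or "no trend".
    """
    n = len(series)
    # S counts concordant minus discordant pairs in time order.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S, reduced by the contribution of each group of tied values.
    tie_sizes = Counter(series).values()
    var_s = (n * (n - 1) * (2 * n + 5)
             - sum(t * (t - 1) * (2 * t + 5) for t in tie_sizes)) / 18.0
    if var_s == 0:          # constant series: no evidence of any trend
        return "no trend", s, 0.0, 1.0
    # Continuity-corrected standard normal statistic.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    if p < alpha:
        trend = "upward" if z > 0 else "downward"
    else:
        trend = "no trend"
    return trend, s, z, p


# Hypothetical yearly participation (%) of one theme over 2000-2018.
rising = [float(v) for v in range(1, 20)]
print(mann_kendall(rising)[0])
```

With `alpha=0.05` the function mirrors the 5% significance level mentioned for Table 4, and the three possible outcomes correspond to the symbols ↑, ↓ and ↔.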

Interrelationships between Themes
According to Sun and Latora [111], the development of a field of knowledge is marked by the flow of knowledge between several themes or subareas of this field. That is why it is important to study the relationship between the themes.
Also according to Sun and Latora [111], knowledge flows with more intensity between synergetic themes, and the most influential themes are those most interconnected, around which the field develops itself. The relationship between the themes can be represented by means of an abstract two-dimensional plot resulting from multidimensional scaling [112].
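One plausible way to quantify how interconnected two themes are is to count the articles that address both. The sketch below builds such co-occurrence counts; whether the authors fed this measure, or another one, into the multidimensional scaling is not stated in the source, so the choice of measure, the function name, and the sample data are all assumptions.

```python
from collections import Counter
from itertools import combinations


def theme_cooccurrence(article_themes):
    """Count, for every pair of themes, the articles that address both.

    article_themes: dict mapping an article id to its set of theme codes.
    """
    pairs = Counter()
    for themes in article_themes.values():
        # Sorting makes (A, B) and (B, A) land on the same key.
        for a, b in combinations(sorted(themes), 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical articles and theme assignments.
article_themes = {
    "a1": {"BEM", "DAT"},
    "a2": {"BEM", "DAT", "BIM"},
    "a3": {"BIM"},
}
pairs = theme_cooccurrence(article_themes)
print(pairs.most_common(1))
```

A matrix of such counts, converted into dissimilarities, is the kind of input a multidimensional scaling routine expects.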

Results
The following section presents the outcomes of the complete analysis carried out to answer the research questions.

The Evolution of the Building Energy Efficiency Field of Knowledge
According to Price's Law [109], the scientific production concerned with a field of knowledge grows exponentially until it reaches a point of inflection and, afterwards, a threshold value around which it stabilises, meaning that this field has reached its maturity. The shape of the curve that represents the evolution of publications goes from exponential to logistic, signalling that the scientific community's interest in this field has cooled down.
According to Dabi et al. [113]: "The main hypothesis of Price's law is that the development of science follows an exponential growth. The growth of a scientific domain goes through four phases". The first phase is the precursors' phase. According to Dabi et al. [113], "during this phase only a small number of researchers begin publishing". The second phase is the proper exponential growth. "During this phase, the expansion of the field attracts many researchers as many aspects of the subject still have to be explored" [113]. In the third phase, the body of knowledge is consolidated and the growth of scientific production becomes linear [113]. The next phase, according to Dabi et al. [113], "corresponds to the collapse of the domain and is marked by a decrease in the number of the publications". The shape of the curve transforms from exponential to logistic, reaching a ceiling value after passing through an inflection point. Therefore, in order to perform the Price's Law analysis, the frequency distribution of the publications addressing BEE is presented in Figure 4. Figure 4a shows the number of publications on a yearly basis, whilst Figure 4b shows the cumulative version, on which compliance with Price's Law is investigated.
The first phase roughly extends to 2005. The second phase is from 2005 to 2014. In this phase, the number of publications fits well with an exponential function, since the statistic $R^2$ is very close to 1.00. The third phase extends from 2014 to 2018, when the growth of scientific production becomes linear ($R^2$ = 0.988). There is no statistical evidence that an inflection point has been reached yet. It is worth mentioning that only articles with 5 or more citations were considered, and older articles have had more time to accumulate citations. Thus, it is likely that the number of eligible articles from the later years will increase as they are cited, reinforcing the linear trend of the plot for the final years even more. Therefore, the maturity of this field of knowledge has not yet been reached, leaving several aspects to be explored.
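The goodness of fit of an exponential growth phase can be checked with a simple log-linear regression, as sketched below. This is only an illustration: the source does not say how the fits were computed, the function name is hypothetical, the data are synthetic, and $R^2$ is evaluated here on the logarithm of the counts, which may differ from the value obtained on the original scale.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by least squares on ln(y).

    Returns (a, b, r2), with r2 computed in log space.
    """
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_l = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logs))
    b = sxy / sxx
    ln_a = mean_l - b * mean_x
    ss_res = sum((l - (ln_a + b * x)) ** 2 for x, l in zip(xs, logs))
    ss_tot = sum((l - mean_l) ** 2 for l in logs)
    return math.exp(ln_a), b, 1.0 - ss_res / ss_tot


# Synthetic cumulative counts (not the study's data): years since the
# start of the growth phase versus cumulative number of publications.
xs = list(range(10))
ys = [50.0 * math.exp(0.3 * x) for x in xs]
a, b, r2 = fit_exponential(xs, ys)
print(round(a, 3), round(b, 3), round(r2, 3))
```

Fitting the same helper to a later window and comparing it with an ordinary straight-line fit is one way to see where exponential growth gives way to linear growth.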
In the majority of cases, the classification of a keyword into a given category was straightforward, like 'green building' (classified under the Green Building theme or group) for instance. However, there were cases in which a keyword could be coded into more than one theme. In such cases, the classification demanded some extra work. It was necessary to read the title and abstract and, in some cases, the introduction of the articles from which the keyword was collected, to decide which theme it fitted best.
A keyword was classified into a unique theme but a theme could cluster several keywords with similar meanings, in such a way that each theme represents a homogenous group. It is worth mentioning that an article can have keywords classified into different themes.

Associating the Articles with the Themes
Once the keywords were classified into themes, the next step was to associate the 2000 articles captured for this research with the themes. Some articles addressed only one theme, while others addressed more than one. The presence of a theme in a given period was used as a measure of its relevance, and it was estimated by counting the number of articles in which the theme appeared during that period.

Evolution and Trend of the Themes
Before studying the evolution and trend of the themes it is worth discussing their relevance over the period under investigation.
The relevance of a theme can be derived from the number of articles that address it over the period considered [111]. Thus, Table 5 presents the themes ranked according to their relevance. Table 5 shows the absolute number and percentage of articles addressing each of the thirty themes. It can be seen that the three largest themes are BEM, DAT, and BIM, which are present in more than 54% of the articles captured for this research. Eleven themes are addressed by less than 4% of the articles, meaning that the interest in them is small, so they will be neglected in further analysis (grey background). However, it is worth mentioning that some of them are indirectly of interest for the ZEB and BRF themes. The themes BMS, BEV, EPS, EST, LIG, RNE, THS, and WIN could still be addressed by recent research under the umbrella of other themes with increasing interest.
Table 5 also shows the interdisciplinary character of the research carried out in the BEE field of knowledge. For instance, from the 2000 articles collected for this study, 505 (25.3%) address the theme BEM and the other 29 themes. According to Sun and Latora [111], such interaction can reflect the exchange of knowledge across themes. It is possible to infer that the strength of such an interaction depends on the number of publications sharing the themes.
Table 5 provides a static view of the BEE field of knowledge. It shows the most relevant themes within the field but it does not show the evolution and trend of each theme. Thus, Table 6 presents the trend of each theme, allowing investigation as to whether a given theme has a perennial presence or is just incidental in the literature. A theme can be analysed as to when it emerged, whether it is still active or has vanished, and when its apogee was. Table 6 presents the annual participation of each theme in the literature, summarising their trend in the last column.

Table 6. The annual relative frequency of articles that address each of the thirty themes.

Eight themes are in an upward trend: BAC, EMS, DAT, BEM, BIM, OCB, BRF and ZEB. It can be seen that the themes BAC, EMS and DAT reached a maximum in the early 2000s, while the others peaked in the late 2010s. The development of the internet and image processing software packages explains the remarkable growth of the theme BIM [112,114]. Since the stock of old buildings far surpasses the stock of new buildings everywhere in the world, the only way to achieve the current energy-saving standards is by retrofitting them, which explains the growing interest of the scientific community in the BRF theme. The rise of the theme OCB can be explained by the scientific community's realisation that the success of energy-efficient projects is significantly influenced by human factors [115,116]. Since there are many well-established statistical methods that have been waiting for the development of informatics to become popular, it is expected that DAT will keep growing for a while, even within other fields of knowledge. According to Cristino et al. [108], the data analysis techniques mentioned by the papers within this field of knowledge can be roughly clustered into the following categories: regression analysis, descriptive statistics, multivariate analysis, computational intelligence, inferential statistics, and design of experiments.
There is no statistical evidence of a particular trend for SMB, TOB, GRB, HVAC, LCA, THC, BIP, REG, ENV and SMG.
The theme SUS shows a downward trend. The themes concerned with environmental issues (ENV, SUS and LCA) reached their maximum in the second half of the 2000s and have decreased since then, showing that interest in these subjects cooled down.
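The trend labels discussed above rest on a statistical trend test; the Conclusions name the Mann-Kendall test. The following is a minimal sketch of a two-sided Mann-Kendall test using the normal approximation with a tie correction. The function name, the 5% significance level, and the yearly shares are assumptions made for illustration only; they are not taken from Table 6.

```python
import math

def mann_kendall(series, alpha=0.05):
    """Two-sided Mann-Kendall trend test (normal approximation, tie-corrected)."""
    n = len(series)
    # S = number of increasing pairs minus number of decreasing pairs (i < j).
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S under the null hypothesis, corrected for tied values.
    tie_counts = {}
    for value in series:
        tie_counts[value] = tie_counts.get(value, 0) + 1
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in tie_counts.values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18.0
    # Continuity-corrected standard normal statistic.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value
    if p_value < alpha:
        trend = "upward" if z > 0 else "downward"
    else:
        trend = "no trend"
    return s, z, p_value, trend

# Hypothetical yearly shares (%) of one theme; NOT values from Table 6.
hypothetical_share = [2.0, 3.1, 2.8, 4.0, 4.4, 5.2, 5.0, 6.3, 7.1, 7.8]
print(mann_kendall(hypothetical_share))
```

Because the test only uses the signs of pairwise differences, it makes no assumption about the shape of the trend, which suits short and noisy yearly series.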
The volume of publications addressing each theme, as well as the interaction between them, defines the evolution of a field of knowledge. As these variables change over time, it is possible to infer that such an evolution is marked by distinct phases. Thus, the next step in this study is to identify such stages.

Stages of the Evolution of This Field of Knowledge
The evolution of a field of knowledge is marked by a sequence of periods with a similar profile of publications. Thus, reading Table 6 from the columns' point of view, it is possible to see the profile of the years according to the themes published and look for a pattern.
One of the ways to identify similarities between multivariate observations is to apply clustering techniques [112,117]. Thus, the space of the columns in Table 6 was submitted to a hierarchical clustering algorithm, leading to the dendrogram presented in Figure 6. It is worth mentioning that a dendrogram is a tree diagram that shows hierarchical relationships between similar objects [118], which, in this case, are the years.
The dendrogram in Figure 6 shows two well-defined clusters. One of these clusters groups the years 2007-2011, at a similarity level of 66.7, and the other groups the years 2012-2018, at a similarity level larger than 80. The years ranging from 2000 to 2006 are very heterogeneous. This suggests that the period covered by this research could be divided into three phases. Figure 7 shows the profile of each of these phases, presenting the annual participation of the themes in the literature for the three evolutionary periods determined by the cluster analysis.
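The clustering of years described above can be sketched as follows. This is an illustration only: the source does not state the distance measure or linkage rule used for Figure 6, so the Euclidean distance, the single (nearest neighbour) linkage, and the similarity level computed as 100 × (1 − d / d_max) are all assumptions, as are the function names and the yearly profiles.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two yearly theme profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cluster_years(profiles):
    """Agglomerative clustering of yearly theme profiles (single linkage).

    profiles: dict mapping year -> list of theme shares for that year.
    Returns the merge history as (cluster_a, cluster_b, distance, similarity).
    """
    years = sorted(profiles)
    dist = {(a, b): euclidean(profiles[a], profiles[b])
            for a in years for b in years if a < b}
    d_max = max(dist.values()) or 1.0  # avoid dividing by zero for identical profiles
    clusters = [(y,) for y in years]
    history = []
    while len(clusters) > 1:
        # Find the two clusters whose closest members are nearest to each other.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist[(min(a, b), max(a, b))]
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        similarity = 100.0 * (1.0 - d / d_max)  # assumed similarity convention
        history.append((clusters[i], clusters[j], d, similarity))
        merged = tuple(sorted(clusters[i] + clusters[j]))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return history

# Hypothetical two-theme profiles; NOT the columns of Table 6.
hypothetical_profiles = {
    2000: [0.90, 0.10],
    2001: [0.10, 0.90],
    2012: [0.50, 0.50],
    2013: [0.52, 0.48],
}
for a, b, d, sim in cluster_years(hypothetical_profiles):
    print(a, b, round(d, 3), round(sim, 1))
```

Reading the merge history from top to bottom corresponds to reading a dendrogram from its leaves to its root: years that merge early, at a high similarity level, share a similar publication profile.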
During the first period (2000-2006), the scientific community's gaze was scattered over 26 themes, differently distributed over the whole period. The participation of the themes in the literature varied over the years. In 2000, ten themes shared the same participation in the literature (10%); in 2002, the theme BEM stood out (24%); in 2003, two themes were highlighted, GRB and HVAC, with 19%; in 2004, two other themes stood out (DAT and REG), but this time with 12% of participation; in 2005, the theme DAT increased its participation to 25%; and, in 2006, the theme BEM stood out with 17% of participation in the literature.
The low number of themes in 2001 is due to the fact that only the articles that reached five or more citations were considered, which leads to the conclusion that the production of articles addressing the Building Envelope was the most consistent in 2001.
Therefore, it can be seen that the evolution of this field of knowledge over this period did not exhibit any pattern.
The second period (2007-2011) is the shortest of the three periods (five years). It presented more consolidated themes than the previous one. Twenty-nine themes were explored over this period. It was in this period that the themes BIP, ENV, REG and THC reached their greatest participation in the literature. However, the theme BEM was by far the one most present in the literature, closely followed by DAT. The themes BAC, BRF, EMS, OCB, SMG and ZEB had a negligible participation in the literature over this period, while the participation of the themes BEV, BMS, GRB, HVAC, SMB, SUS and TOB shrank.
In the third period (2012-2018), all thirty themes were explored: 29 themes in 2012, 2013 and 2015; 28 themes in 2014; and 30 themes in 2016, 2017 and 2018. Thus, it can be said that the scientific interest in this field of knowledge increased even more over this period.
The participation of the themes BEM, BIM, BRF, DAT, OCB and ZEB increased and, according to the statistical analysis, they are in an upward trend. The participation of other themes, like BAC, EMS and SMG, increased as well, but not enough to signal an upward trend. The interest in the themes ENV, GRB, HVAC, REG and SUS decreased. The other themes remained stable.

Interrelationships between Themes
According to Sun and Latora [111], the interaction between themes within a field of knowledge reflects the flow of knowledge between the sub-areas of this field. Thus, in order to understand the evolution of this field, it is fundamentally important to define and study the interaction between the themes.
Many articles address multiple themes at once, which indicates an interaction between themes. The interaction between the themes i and j can be assessed by means of Equation (2):
$$\lambda_{ij} = \frac{N_{ij}}{N_p} \times 100 \qquad (2)$$
where $N_{ij}$ is the number of articles that concurrently address the themes i and j, and $N_p$ is the number of articles for the considered period p ($N_1 = 149$, $N_2 = 342$ and $N_3 = 1509$). Thus, $\lambda_{ij}$ is the percentage of the articles produced during the period under investigation that address the themes i and j. Figure 8 presents a graphical representation of the model used to account for the interactions between themes. Based on Figure 8, it can be seen that $\lambda_{ij}$ can be stored in a symmetric matrix, called the interaction or interrelationship matrix. According to Equation (2), such a matrix varies depending on the evolutionary period. Figure 9 shows the interrelationship matrix for each period. The darker the fill colour, the greater the interaction between themes i and j.
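As an illustration of the interaction measure defined above (the percentage of a period's articles that concurrently address two themes), the sketch below builds the symmetric interaction matrix from a list of articles. The function name, the data layout (one set of theme codes per article), and the four sample articles are assumptions for illustration; the diagonal is left at zero because the source does not define the interaction of a theme with itself.

```python
from itertools import combinations

def interaction_matrix(articles, themes):
    """Return lam[i][j]: percentage of articles addressing both theme i and theme j.

    articles: list of sets, each holding the theme codes of one article
              published in a single evolutionary period.
    themes:   ordered list of theme codes (defines the row/column order).
    """
    n_p = len(articles)  # number of articles in the period
    index = {theme: k for k, theme in enumerate(themes)}
    counts = [[0] * len(themes) for _ in themes]
    for article_themes in articles:
        # Every unordered pair of themes in one article counts as one co-occurrence.
        for a, b in combinations(sorted(article_themes), 2):
            i, j = index[a], index[b]
            counts[i][j] += 1
            counts[j][i] += 1  # keep the matrix symmetric
    return [[100.0 * c / n_p for c in row] for row in counts]

# Hypothetical period with four articles; NOT data from the study.
themes = ["BEM", "DAT", "HVAC", "THC", "SUS"]
articles = [{"BEM", "DAT"}, {"BEM", "DAT", "HVAC"}, {"HVAC", "THC"}, {"SUS"}]
lam = interaction_matrix(articles, themes)
print(lam[0][1])  # share (%) of articles addressing both BEM and DAT
```

Computing one such matrix per evolutionary period, each with its own number of articles, gives the three matrices that the text compares.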
Observing the matrix for the first period, it can be said that, during this period, this field of knowledge was driven in large part by themes concerned with sustainable development and thermal comfort. Also, it can be noticed that the greatest interaction occurred between HVAC-THC and LCA-SUS. It is possible to observe the emergence of the relationship between the themes BEM-DAT, which would increase until the end of the third period.
During the second period, the interest of the scientific community revolved more around the interaction between BEM-DAT; BEM-HVAC; GRB-SUS; HVAC-TOB and HVAC-THC.
The interaction between BEM-DAT is remarkable; it is by far the largest one, not only over the third period, but over the whole period covered by this research. Therefore, these two themes have been the great engine for developing the research on Building Energy Efficiency.
Since it is difficult to analyse and understand the interaction between the themes only by examining the interrelationship matrices in Figure 9, a visual representation of such matrices is valuable. Such a representation can be obtained by means of a data analysis technique known as multidimensional scaling [119], which allows the representation of the interrelationship of the themes in an abstract, two-dimensional Cartesian plot, as illustrated in Figure 10. Although such a representation is not perfect, it gives some insight into the interaction between themes. For instance, the greater the interaction between themes, the closer they are in the plot, forming clusters of synergetic themes. In other words: the closer the themes, the greater the flow of knowledge between them.
The left side of Figure 10 shows the participation of the themes over the three evolutionary stages. On the right side, three plots represent the interrelationship matrix between the themes for each of the three evolutionary stages.
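To make the idea of multidimensional scaling concrete, the sketch below implements classical (Torgerson) scaling in plain Python. It is only one variant of the technique, and the source does not state which variant, or which conversion from interaction to dissimilarity, was used for Figure 10. The helper `to_dissimilarity`, which maps a larger interaction to a smaller distance, and all names and sample values are therefore assumptions.

```python
import math

def to_dissimilarity(lam):
    """Assumed conversion: the stronger the interaction, the smaller the distance."""
    top = max(max(row) for row in lam)
    n = len(lam)
    return [[0.0 if i == j else top - lam[i][j] for j in range(n)] for i in range(n)]

def classical_mds(dist, dims=2, iterations=500):
    """Classical MDS: place items so that plot distances approximate dist."""
    n = len(dist)
    # Double-centre the squared dissimilarities to obtain the matrix B.
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Power iteration for the dominant eigenvector (deterministic start).
        v = [math.sin(i + 1.0 + k) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iterations):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient: eigenvalue belonging to the unit vector v.
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigenvalue = sum(v[i] * bv[i] for i in range(n))
        scale = math.sqrt(max(eigenvalue, 0.0))  # negative eigenvalues are dropped
        for i in range(n):
            coords[i][k] = scale * v[i]
        # Deflate B so that the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigenvalue * v[i] * v[j]
    return coords

# Hypothetical 3-theme interaction matrix (%); NOT values from Figure 9.
hypothetical_lam = [[0.0, 5.0, 1.0], [5.0, 0.0, 2.0], [1.0, 2.0, 0.0]]
print(classical_mds(to_dissimilarity(hypothetical_lam)))
```

When the dissimilarities cannot be reproduced exactly in two dimensions, the plot is only an approximation, which is consistent with the caveat in the text that the representation gives insight without being exact.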
The clusters shown in Figure 10 only include the themes for which $\lambda_{ij} > 1.0$. The distance between clusters and elements was assessed according to the nearest neighbour strategy [118].
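One way to read the rule above is that two themes belong to the same cluster whenever a chain of pairwise interactions above 1.0 links them, which is what nearest neighbour (single linkage) clustering yields when cut at a fixed level. The sketch below implements that reading with a union-find structure; the function name, the interpretation of the threshold, and the sample matrix are assumptions rather than the authors' procedure.

```python
def threshold_clusters(lam, themes, threshold=1.0):
    """Group themes linked by chains of interactions above the threshold.

    lam:    symmetric interaction matrix (percentages), ordered like `themes`.
    Returns (clusters, isolated): lists of theme groups and of lone themes.
    """
    n = len(themes)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if lam[i][j] > threshold:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i, theme in enumerate(themes):
        groups.setdefault(find(i), []).append(theme)
    clusters = [g for g in groups.values() if len(g) > 1]
    isolated = [g[0] for g in groups.values() if len(g) == 1]
    return clusters, isolated

# Hypothetical interaction values (%); NOT taken from Figure 9.
themes = ["BEM", "DAT", "HVAC", "THC", "REG"]
lam = [
    [0.0, 5.0, 0.5, 0.0, 0.0],
    [5.0, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0, 2.0, 0.0],
    [0.0, 0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
print(threshold_clusters(lam, themes))
```

Under this reading, a theme with no interaction above the threshold, such as REG in the example, is reported as isolated.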
Observing the plots for the three evolutionary periods, it should be noted that the themes have clustered around the origin of the plot as this field evolved. In general, although it can distort the representation of the themes in the plot, the more central a theme, the greater the interaction with the others.
In the first evolutionary period of this field of knowledge, significant interaction between themes related to thermal comfort (THC-HVAC), themes concerned with environmental/sustainability issues (SUS-ENV-LCA-GRB-SMB), and themes addressing modelling and data analysis techniques (BEM-DAT) can be seen.
The number of clusters dropped from the first to the second evolutionary period. The cluster BEM-DAT remained and came closer to the centre of the plot. These are cross-cutting themes. Some articles are devoted to revisiting a given theme and have an interest in comparing the results emerging from different data analysis techniques, in such a way that the modelling and data analysis become the kernel of the paper instead of being tools by means of which better results can be achieved. Such articles give little attention to the aspects concerned with Building Energy Efficiency, which are only the background and data source, while their main purpose is data analysis.
Still, within the second period, the themes LCA and SMB leave the environment/sustainability cluster because of the lack of interaction with the other themes. The interaction of the remaining themes with the thermal comfort cluster increased, resulting in the formation of a new cluster.
The cluster BAC-EMS had disappeared by this stage. Since the participation of both themes in the literature increased over this period, it is fair to assume that both themes developed in isolation, without sharing knowledge.
The number of isolated themes in this period was the largest amongst the evolutionary stages. Thus, it can be said that, during this period, the exchange of knowledge was the smallest.
The third stage is the one with the largest number of clusters and the smallest number of isolated themes. It can be considered the period with the greatest flow of knowledge between sub-areas within this field of knowledge.
The clusters THC-HVAC and BAC-EMS, from the first evolutionary stage, have been re-established, meaning that the themes within each cluster restarted, triggering knowledge production in each other.
The cluster BEM-DAT is even closer to the centre of the plot in this stage. According to the interaction matrix for the third period, in Figure 9, this cluster interacts with all the themes ($\lambda_{ij} > 1.0$) except the themes BIP and REG.
The cluster concerned with environment/sustainability in the first period was broken into three small clusters (LCA-ENV, SMB-SMG, and SUS-GRB-BIM), suggesting the exchange of more specialised knowledge. The flow of knowledge between themes related to sustainability and information modelling is noteworthy. As the latter theme shows an upward trend, it is quite possible that its development increases the knowledge production of themes related to green buildings and sustainability.
The themes BIP, TOB, REG, ZEB, BRF, and OCB developed in isolation over all three evolutionary stages. The latter three are in an upward trend, according to the trend analysis previously presented. Thus, a clear relation between trend and evolutionary development of a theme within the Building Energy Efficiency field of knowledge could not be seen.

Conclusions
After analysing 2000 articles concerned with Building Energy Efficiency, this paper shows that this field of knowledge has not yet reached maturity. Thus, much remains to be studied, meaning that investment in research is still needed.
This research identified thirty recurrent themes within this field of knowledge. However, only nineteen of these themes are statistically significant. According to the Mann-Kendall trend test, eight out of these themes show a clear upward trend, one a downward trend, and ten do not show any clear evidence for a particular trend.
This study shows that the evolution of this field of knowledge passed through three stages, whose dynamics were clearly explained, as well as the changes in the patterns of cross-fertilisation.
This research shows that energy modelling and data analysis techniques have been influencing this field of knowledge since its beginning and have been instigating production in other areas within this field. Therefore, themes like Building Energy Modelling and Data Analysis Techniques are in an upward trend and still very far from maturity, constituting good research opportunities.
The scientific community's gaze is on other themes with low connections, like Occupancy Behaviour, Building Information Modelling, Zero Energy Buildings, and Building Retrofitting. All of these themes have increased in importance and seem to be new frontiers of this field of knowledge.
Regarding Occupancy Behaviour, topics like eco-feedback, gamification, behaviour, and advanced building automation systems have not been adequately addressed.