Meta-Analysis of Price Premiums in Housing with Energy Performance Certificates (EPC)

Studies have found that housing with an energy performance certificate (EPC) commands a positive premium in its sales price. However, other studies have obtained negative or unexpected results. The objective of this study is to determine whether housing with an EPC has a positive premium in its sales price. To this end, a systematic review, meta-analysis, and meta-regression of prior studies were conducted. A total of 66 documents were examined, with a total of 173 sales registers. The impact of having or not having an EPC on housing sales price premiums was analyzed at a global level, and the premiums in Europe were analyzed for each of the ABCDEFG qualification letters. The results suggest that: 1) globally, housing with an EPC is estimated to have an overall price premium of 4.20%; by continent, the premiums are 5.36% in North America, 4.81% in Asia, and 2.32% in Europe; and 2) in Europe, the results for the ABCDEFG qualification are inconclusive, since there is no consensus on which letter to use as the reference for comparisons, which leaves only small comparable samples.


Introduction
In recent years, global energy consumption has increased, which highlights the foreseeable depletion of energy resources [1]. Energy efficiency (hereinafter, EE) aims to reduce consumption through the appropriate use of energy. This growing environmental concern has led to policies in several sectors, such as the automotive, industrial, and construction sectors. In Europe, these policies have led to the implementation of an energy performance certificate (hereinafter, EPC) for buildings. As with household appliances, the EPC assigns an ABCDEFG qualification that distinguishes the more efficient buildings (letter A) from the less efficient ones (letter G).
In construction, sustainability has translated mainly into systems of assessment, classification, and certification, and certification systems issue the so-called EPC. Different qualification and certification types exist worldwide, such as the ABCDEFG qualification in the European Union, BREEAM in the United Kingdom, LEED in the U.S., and Green Mark in Singapore.
Many documents have empirically shown that housing with an EPC commands a sales price premium. However, the nature and size of this premium are not yet unanimously accepted. The heterogeneity of the results in the literature may be due to differences in the geographic location of the studies, in sample sizes, and in other study characteristics.
LEED certification was developed in the U.S. by the U.S. Green Building Council, which was founded in 1993. It is a voluntary, private certification system that assesses eight categories or topic areas [4]: (1) location and transportation; (2) sustainable sites; (3) water efficiency; (4) energy and atmosphere; (5) materials and resources; (6) indoor environmental quality; (7) innovation; and (8) regional priority. Each category is scored, and the category scores are added to obtain an overall score. LEED certification is available at four progressive levels: LEED Certified, from 40 to 49 points; LEED Silver, from 50 to 59 points; LEED Gold, from 60 to 79 points; and LEED Platinum, with 80 points or more.
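The LEED scoring rule described above is additive, with level thresholds applied to the total. It can be sketched as a small lookup. The function names and the category scores in the example are hypothetical and are included only for illustration. The thresholds are the ones stated in the text.

```python
from typing import Dict, Optional

# Level thresholds as stated in the text: (minimum points, level), highest first.
LEED_LEVELS = [
    (80, "LEED Platinum"),
    (60, "LEED Gold"),
    (50, "LEED Silver"),
    (40, "LEED Certified"),
]


def leed_total(category_points: Dict[str, int]) -> int:
    """Overall score: the sum of the points earned in each category."""
    return sum(category_points.values())


def leed_level(points: int) -> Optional[str]:
    """Map an overall score to a LEED level; None means below 40 points."""
    for minimum, level in LEED_LEVELS:
        if points >= minimum:
            return level
    return None


# Hypothetical category scores for one project (illustrative values only).
example = {
    "Location and transportation": 10,
    "Sustainable sites": 5,
    "Water efficiency": 6,
    "Energy and atmosphere": 18,
    "Materials and resources": 5,
    "Indoor environmental quality": 7,
    "Innovation": 2,
    "Regional priority": 2,
}
total = leed_total(example)
print(total, leed_level(total))  # 55 LEED Silver
```

Because the levels are checked from the highest threshold down, the first match is the correct level, and scores below 40 points return no level.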
The BREEAM and LEED certifications have spread worldwide, with specific versions existing in almost all countries and adapted to geographic location and building type.
Other certifications appearing in this study are the Comprehensive Assessment System for Building Environmental Efficiency (CASBEE), the National Australian Built Environment Rating System (NABERS), Minergie, and Green Mark.
The CASBEE certification was developed in Japan by the Ministry of Land, Infrastructure, Transport, and Tourism, and it is managed by the Japan Green Building Council (JaGBC) and the Japan Sustainable Building Consortium (JSBC). It is available at five progressive levels: Class C (low score), represented by one star; Class B-, represented by two stars; Class B+, represented by three stars; Class A, with four stars; and Class S (excellent), with five stars.
In Australia, two certification types are used: Green Star and NABERS. Green Star is a voluntary sustainable classification system that was developed by the Green Building Council of Australia. NABERS is promoted by the Australian government and is used to measure the sustainability of commercial buildings and offices.
Minergie is a voluntary certification created in Switzerland in 1994. It is a sustainability classification system for new and refurbished low-energy buildings. In 2001, the stricter Minergie-P classification was established for passive housing. Green Mark is a voluntary certificate created in Singapore in January 2005 and managed by the Building and Construction Authority (BCA). Its aims are to promote sustainable buildings and to raise environmental awareness.
In Europe, Directive 2002/91/EC [5], recast as Directive 2010/31/EU [6], implemented a mandatory certification system, referred to in this study as the "ABCDEFG qualification", that classifies buildings by their energy efficiency. It establishes a scale that ranges from the letter "A" (best energy qualification) to the letter "G" (worst energy qualification). The assigned letter is based on the quantity of energy consumed (kWh/m²·year) and/or the CO2 emitted by the building during its use. These certificates should be prepared by a competent technician, using software tools developed by the relevant government bodies of the European countries to rate the energy efficiency of buildings. Once issued, the certificates should be registered with an official public body.
In addition to the mandatory certification, other voluntary standards exist in Europe (Figure 2). One example is the Passivhaus seal, created in Germany in 1988 to reduce energy consumption. To this end, its general guidelines call for buildings with high thermal insulation, controlled air infiltration, and good indoor air quality. They also take advantage of solar energy to contribute to space conditioning [7].
Sustainability 2019, 11, x FOR PEER REVIEW 4 of 55

Background Information
Numerous studies have found that housing with an energy qualification commands a positive premium in sales and rental prices. Other studies, however, have obtained negative or contrary results. For example, Yang [8] obtained a premium of 16% for new housing with a LEED qualification in Portland (United States). By contrast, Yoshida and Sugiura [9] found a negative premium of 10.80% for housing with a Green Building qualification in Tokyo (Japan).
The purpose of this research is to estimate a representative value (effect size) from prior studies. To do so, a meta-analysis was used to summarize the accumulated evidence. This type of review began with Smith and Glass [10], but it was Hedges and Olkin [11] who formalized its statistical methodology. Meta-analysis is now used across disciplines, and research on the method itself has proliferated [12][13][14][15].
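A common way to obtain a single representative effect size from register-level coefficients is inverse-variance weighting. In the fixed-effect form, each register's weight is 1/SE². The DerSimonian-Laird random-effects form adds an estimated between-study variance τ² to each register's variance, which suits heterogeneous literatures such as this one. The sketch below illustrates that technique under these assumptions. It is not a transcription of the estimator reported later in this study, and the function names and example values are hypothetical.

```python
import math
from typing import List, Tuple


def fixed_effect(betas: List[float], ses: List[float]) -> Tuple[float, float]:
    """Inverse-variance weighted mean of the coefficients and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total
    return pooled, math.sqrt(1.0 / total)


def dersimonian_laird(betas: List[float], ses: List[float]) -> Tuple[float, float, float]:
    """Random-effects pooled estimate, its standard error, and tau^2.

    tau^2 (between-study variance) is the DerSimonian-Laird
    method-of-moments estimate, truncated at zero.
    """
    if len(betas) < 2:
        raise ValueError("at least two registers are needed to estimate tau^2")
    weights = [1.0 / se ** 2 for se in ses]
    fe_mean, _ = fixed_effect(betas, ses)
    # Cochran's Q: weighted dispersion of the coefficients around the fixed-effect mean.
    q = sum(w * (b - fe_mean) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights add tau^2 to each within-register variance.
    re_weights = [1.0 / (se ** 2 + tau2) for se in ses]
    total = sum(re_weights)
    pooled = sum(w * b for w, b in zip(re_weights, betas)) / total
    return pooled, math.sqrt(1.0 / total), tau2


# Hypothetical semi-log coefficients and standard errors (illustrative only).
betas = [0.06, 0.01, 0.04, 0.03]
ses = [0.02, 0.01, 0.015, 0.01]
print(fixed_effect(betas, ses))
print(dersimonian_laird(betas, ses))
```

When the registers agree with one another, τ² is zero and both estimators coincide. As heterogeneity grows, the random-effects weights become more equal across registers.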
As Hunter and Schmidt [16] indicated, meta-analysis is intended to integrate the findings of diverse studies so as to detect the relationships among them, providing a basis for theory development. In any science, meta-analysis is therefore a means of producing cumulative knowledge. Schmidt [17] argues that meta-analyses may in fact contribute more to scientific knowledge than primary research studies. According to Eden [18], empirical research gains value when meta-analyses allow scientific generalizations to be made.
At the time of writing, four documents were found that conduct meta-analyses of the price premium of buildings with an energy qualification. The first is a report by Ankamah-Yeboah and Rehdanz [19]. It conducts a systematic review and a meta-regression of 30 documents (or studies) comprising 205 registers (each a specification of a regression model) to determine the price premium of residential and office buildings for sale or for rent. They found that buildings with some type of qualification have an average premium of 7.6%. According to the authors, the energy efficiency of residential buildings is valued more highly in sales markets and under voluntary labeling. The opposite holds for office buildings, where the energy qualification is valued more highly in the rental market, as is the seniority (age) of the qualification system. Regarding geographic location, higher premiums were obtained in Europe than in the U.S.
The second document is a report by Brown and Watkins [20], which conducted a systematic review with meta-analysis and meta-regression based on 17 studies and 20 registers. The results show that housing with an energy qualification has a weighted mean premium of 4.3%. The authors note that, given the low number of observations, it is not possible to affirm that there are significant differences by building location or qualification type.
In the third document, Kim et al. [21] conducted a systematic review and a meta-regression of the premiums in office rental buildings. They analyzed nine publications comprising 34 registers and found a significant rental premium of 14.66%. The authors indicated that the other characteristics with the greatest influence on the choice of this type of building are location, building characteristics, and contract type.
In the last document, Fizaine et al. [22] conducted a systematic review with meta-analysis and meta-regression to determine the premium in the sales price of housing, using 54 documents comprising 79 registers. The authors found that the premium ranged between 3.5% and 4.5% after correcting for publication bias. They attributed the dispersion of the results to: 1) study location (North America, Asia, or Europe); 2) publication type; and 3) whether the hedonic model included location variables. Notably, the authors found that many of the analyzed documents did not report standard errors (or t statistics or p-values), which are needed to conduct a meta-analysis. Furthermore, some studies estimated the effect of the qualification against different reference categories, hindering comparison between them.

Materials and Methods
The following steps apply the criteria of the PRISMA statement [23,24], which include: (1) identification of the studies and information sources, together with the search strategy, the dates covered, and document identification (Section 3.1 of this document); (2) eligibility requirements, specifying the inclusion and exclusion criteria for the documents (Section 3.2); (3) baseline data, describing the collected variables (Sections 3.3 and 3.7); (4) data integrity, through an assessment of information quality (Section 3.4); (5) document protocol and registry (Table 1); (6) description of selection bias (Section 4.1.2); (7) specification of results, that is, the effect size according to the method used (Section 4.1.4); and (8) description of additional analysis methods (Section 4.1.5).
The first five steps are summarized in a flow diagram (Section 3.5).

Search and Selection Criteria
Document selection was conducted in pairs, from January 2018 to late April 2019, through: (1) searches of several databases (Elsevier ScienceDirect Complete, Springer, LexisNexis Academic, JSTOR, ProQuest Research, Munich Personal RePEc Archive, and Google Scholar); (2) consultation of authors specializing in the "Green Premium" area; and (3) the bibliographic references of the reviewed works. The following keywords were used: energy performance certificate, building energy efficiency rating, valuing building energy labels, building value and energy efficiency, energy efficiency

Selection Criteria
In order to compare and classify the results of the analyzed documents, selection criteria were established to obtain a homogeneous and comparable database that permits reasonable generalization. The following criteria are used: (1) the document analyzes the price premium associated with the existence of an energy qualification; (2) the premium was calculated with a hedonic price model (HPM) using a semi-logarithmic functional form; (3) the impact on residential buildings is analyzed; and (4) a sales market is considered. Studies using neural network models, multi-level analysis, etc., were discarded, as were studies of residential rental markets and all studies of the commercial or office building market.
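Criterion (2) matters because of how the EPC coefficient is interpreted. In a semi-logarithmic hedonic model, ln(P) = α + β·EPC + Σ γ_k·X_k + ε, the coefficient β on a 0/1 EPC dummy is read as a relative price difference. The minimal sketch below uses hypothetical prices and omits control variables, whereas a real HPM would include structural and location characteristics. In this single-dummy case, the OLS estimate of β equals the difference in mean log prices between certified and uncertified dwellings. The sketch also shows the exact percentage premium 100·(e^β − 1), which follows the Halvorsen-Palmquist interpretation of dummy variables in semi-log equations. The function names are hypothetical.

```python
import math
from statistics import mean
from typing import List


def dummy_coefficient(log_prices: List[float], has_epc: List[bool]) -> float:
    """OLS slope of ln(price) on a single 0/1 EPC dummy (no other regressors).

    With only an intercept and one dummy, OLS reduces to the difference
    between the mean log price of certified and uncertified dwellings.
    """
    with_epc = [p for p, d in zip(log_prices, has_epc) if d]
    without = [p for p, d in zip(log_prices, has_epc) if not d]
    return mean(with_epc) - mean(without)


def percent_premium(beta: float) -> float:
    """Exact percentage effect of a dummy in a semi-log model: 100 * (e^beta - 1)."""
    return 100.0 * (math.exp(beta) - 1.0)


# Hypothetical sales prices: certified dwellings sell for exactly 5% more.
base = [200_000.0, 250_000.0, 300_000.0]
prices = base + [p * 1.05 for p in base]
labels = [False] * 3 + [True] * 3
beta = dummy_coefficient([math.log(p) for p in prices], labels)
print(round(beta, 4), round(percent_premium(beta), 2))  # 0.0488 5.0
```

For small coefficients, the two readings nearly coincide. Here β ≈ 0.0488, read directly as about 4.9%, against an exact premium of 5.0%.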
After reading the abstracts of the 96 initially selected documents, it was found that 30 of them examined the premium on the rental or sales price of commercial or office buildings. These were discarded because they do not meet selection criterion (3) above. The remaining 66 documents all analyzed the premium on the sales or rental price of residential buildings. Reading the documents in full showed that each study contained one or more registers for determining the price premium (a register corresponds to one specification of a regression model). A study contributed more than one register when: (1) the price premium was analyzed separately by market, with one model for sales and another for rentals (for example, [25][26][27]); (2) data sets from different years were analyzed, generating one register for each year [28][29][30][31]; (3) the models examined the price premium in different cities [26,32,33]; (4) the price premium was analyzed by construction type (single- or multi-family) [34][35][36]; (5) different types of EPC were analyzed within the study [32,37,38]; (6) different qualification groups were analyzed [31,39]; and (7) the price premium was analyzed by comparing certified with uncertified housing [40][41][42][43][44]. In addition, the premium associated with moving from one value to another within the EPC scale was analyzed [25,36,45].
Therefore, the 66 documents consulted yielded 213 distinct registers (Figure 3 and Table 1), covering buildings both for sale and for rent. In a subsequent phase, the registers related to rental housing were discarded (see Section 3.5).

Measure of Effect
The price premium in the sale of housing with an EPC is measured with the non-standardized regression coefficient β and its squared standard error, as reported in each register. All registers included the β coefficient, but some documents did not report other data, such as the standard error or the sample size. Where the study authors could be contacted, the missing data were requested. On other occasions, the data were calculated, as in the case of the standard error, which was derived from the β coefficient and the sample size [46] using Equation (1).
where β is the non-standardized regression coefficient and N is the sample size.
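Equation (1) itself is not reproduced here. As a complement, the sketch below shows two standard ways of recovering a standard error when a register reports a t statistic or a two-sided p-value instead of the SE. It also computes the variance (squared SE) used for weighting. The t-statistic route uses the identity t = β/SE. The p-value route assumes a normal reference distribution, which is a reasonable approximation for the large samples typical of hedonic studies but is not necessarily the procedure behind Equation (1). The function names are hypothetical.

```python
from statistics import NormalDist


def se_from_t(beta: float, t: float) -> float:
    """Standard error implied by a coefficient and its t statistic (t = beta / SE)."""
    if t == 0:
        raise ValueError("t statistic must be non-zero")
    return abs(beta / t)


def se_from_p(beta: float, p: float) -> float:
    """Approximate SE from a two-sided p-value via the normal distribution."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf(1.0 - p / 2.0)  # |z| matching the two-sided p-value
    return abs(beta) / z


def variance(se: float) -> float:
    """Within-register variance used for weighting: the squared standard error."""
    return se ** 2


print(round(se_from_t(0.05, 2.5), 4))    # 0.02
print(round(se_from_p(0.0392, 0.05), 4))  # 0.02
```

Both routes recover the same standard error in the example because a two-sided p-value of 0.05 corresponds to |z| ≈ 1.96.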

Assessment of the Quality of the Information Available in the Studies
According to Martín Vallejo [47], the documents collected for a meta-analysis may present contradictory results because of their differing quality and origin. Quality may therefore be assessed through items referring to the study, the statistical data, or the presentation of results. In this document, quality is assessed from the statistical analyses, namely the effect size and the statistical power of the models reported in each study. The effect size and the statistical power were obtained with the G*Power program (version 3.1) [48]. The statistical power of all documents was near or equal to 1, so power was not used to discriminate between studies. Cohen's f², however, ranged from 0.08 to 11.50, so it was used as a quality criterion [21], computed as
$$ f^2 = \frac{R^2_{fin}}{1 - R^2_{fin}} $$
where $R^2_{fin}$ equals the adjusted $R^2$; when this is not available, $R^2$ is used.
The following scoring criteria were used:
1. If the document offers information on:
   a. the standard error (SE), it is scored with a 10;
   b. the Student's t statistic, it is scored with a 10;
   c. the sample size, and this is between 1000 and 10,000, between 10,000 and 100,000, or greater than 100,000, it is scored with a 5.0, 7.5, or 10, respectively; if the study does not report the sample size or if it is less than 1000, it is scored with a 0; and
   d. the coefficient of determination, it is scored with a 10 if the adjusted R² is reported and with a 5 if only the R² is provided.
2. If the effect size (f²) is greater than 0.35, 0.50, or 0.8, it is scored with a 5, 7.5, or 10, respectively; if it is not available or is 0.35 or lower, it is scored with a 0.
The resulting score for each study is shown in the "Rating" column of Table 1.
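The scoring rules can be written as a small function. This sketch makes three assumptions. First, it uses Cohen's usual definition f² = R²/(1 − R²), computed from the adjusted R² when that is reported. Second, it assigns the boundary value of 10,000 (shared by two sample-size bands) to the lower band. Third, it awards no effect-size points when no R² is available. These boundary choices and the function names are hypothetical. Under these rules, the maximum possible score is 50.

```python
from typing import Optional


def cohens_f2(r2: float) -> float:
    """Cohen's f^2 = R^2 / (1 - R^2)."""
    if r2 >= 1.0:
        raise ValueError("R^2 must be below 1")
    return r2 / (1.0 - r2)


def sample_size_points(n: Optional[int]) -> float:
    """Criterion 1c; a sample of exactly 10,000 is placed in the lower band (assumption)."""
    if n is None or n < 1000:
        return 0.0
    if n <= 10_000:
        return 5.0
    if n <= 100_000:
        return 7.5
    return 10.0


def f2_points(f2: float) -> float:
    """Criterion 2: points for effect size above 0.35, 0.50, or 0.8."""
    if f2 > 0.8:
        return 10.0
    if f2 > 0.5:
        return 7.5
    if f2 > 0.35:
        return 5.0
    return 0.0


def rating(has_se: bool, has_t: bool, n: Optional[int],
           r2_adj: Optional[float], r2: Optional[float]) -> float:
    """Total quality score for one study under criteria 1a-1d and 2."""
    score = (10.0 if has_se else 0.0) + (10.0 if has_t else 0.0)
    score += sample_size_points(n)
    if r2_adj is not None:
        score += 10.0
        f2 = cohens_f2(r2_adj)
    elif r2 is not None:
        score += 5.0
        f2 = cohens_f2(r2)
    else:
        f2 = 0.0  # assumption: no R^2 reported, so no effect-size points
    return score + f2_points(f2)


print(rating(True, True, 25_000, 0.6, None))  # 47.5
print(rating(False, True, None, None, 0.25))  # 15.0
```

In the first example, an adjusted R² of 0.6 gives f² = 1.5, which earns the full 10 effect-size points. In the second, R² = 0.25 gives f² ≈ 0.33, which earns none.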

Data Classification
Of these 213 registers, those related to rentals were discarded. This left 173 registers that examine the sales price premium for housing with an EPC, that is, those meeting selection criterion (4) in Section 3.2. The registers are classified by EPC type: (1) ABCDEFG qualification (115 registers) and (2) other qualifications, such as Energy Star, LEED, CASBEE, or Green Building, among others (58 registers).
Based on this classification, two distinct analyses are proposed (Figure 4):
- Analysis-1 (A1), which analyzes the impact on the prices of housing with an EPC compared with housing without a qualification, for both the ABCDEFG qualification (19 registers) and other qualifications (43 registers);
- Analysis-2 (A2), which analyzes the impact on the prices of housing within the ABCDEFG qualification (91 registers).

Geographical Framework
The database consists of 66 documents distributed geographically across the globe. As seen in Figure 5, there is a greater concentration in North America and Europe, as compared to the other continents (20 in North America, 31 in Europe, 13 in Asia, and 2 in Oceania). Within the European continent, the 31 documents that examined the residential market were distributed as follows: 1 in Norway, 2 in Sweden, 1 in Denmark, 3 in Ireland, 3 in the United Kingdom, 2 in the Netherlands, 3 in Belgium, 3 in Germany, 4 in Switzerland, 2 in Italy, 2 in France, 1 in Portugal, and 4 in Spain.
In the American continent, the 20 documents examining the residential market are from North America and are distributed as follows: 17 in the United States (1 in Alaska, 2 in California, 1 in Colorado, 1 in Florida, 3 in Georgia, 3 in Oregon, 5 in Texas, and 1 that considers various studies) and 3 in Canada.
In Asia, they are distributed as follows: 4 in China, 6 in Japan, and 3 in the Republic of Singapore. In Oceania (Australia), there are 2 documents in Canberra.

Analysis-1
In the first step, the registers comparing labeled and non-labeled housing under any qualification type were selected, following the data classification in Section 3.5. The initial sample consisted of 62 registers: 19 with the ABCDEFG qualification and 43 with other qualification types (Figure 4). Univariate and multivariate outliers were then removed (sample coded as A1*) in two steps. (1) Registers whose premium lay more than three standard deviations (SD) from the mean were eliminated, which discarded registers 87 and 111. (2) The regression model was calibrated with the remaining registers and the Mahalanobis distance (DM) was calculated. Registers whose distance was significant at the 0.001 level were eliminated, as indicated by Hair et al. [92], which excluded registers 2, 3, 23, and 143. This left a final sample of 56 registers.
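The two screening steps can be sketched as follows. For brevity, the Mahalanobis step is restricted to two variables (hypothetical moderators), which keeps the sketch short in two ways. The 2×2 covariance matrix can be inverted in closed form, and the chi-square tail probability with 2 degrees of freedom is exactly exp(−D²/2). The study computed its distances from its calibrated regression model, which this sketch does not reproduce. The function names and data are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def three_sd_outliers(premiums: Sequence[float]) -> List[int]:
    """Step 1: indices of registers whose premium lies more than 3 SD from the mean."""
    m, s = mean(premiums), stdev(premiums)
    return [i for i, x in enumerate(premiums) if abs(x - m) > 3 * s]


def mahalanobis_outliers_2d(points: Sequence[Tuple[float, float]],
                            alpha: float = 0.001) -> List[int]:
    """Step 2: indices whose squared Mahalanobis distance is significant at alpha.

    Two variables only; with 2 degrees of freedom the chi-square survival
    function is exp(-d2 / 2), so no statistics library is needed.
    """
    n = len(points)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2
    flagged = []
    for i, (x, y) in enumerate(points):
        dx, dy = x - mx, y - my
        # d2 = [dx, dy] S^-1 [dx, dy]^T, with S^-1 = [[syy, -sxy], [-sxy, sxx]] / det
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        if math.exp(-d2 / 2.0) < alpha:
            flagged.append(i)
    return flagged


# Hypothetical data: one extreme premium, and one point that breaks the correlation.
premiums = [0.02, 0.04] * 9 + [0.03, 0.5]
moderators = [(1.0, 1.0)] * 9 + [(-1.0, -1.0)] * 9 + [(3.0, -3.0)]
print(three_sd_outliers(premiums))          # [19]
print(mahalanobis_outliers_2d(moderators))  # [18]
```

In the second example, the last point is not extreme on either variable taken alone. It is flagged because it runs against the strong positive correlation of the other points, which is exactly the multivariate pattern that a univariate SD rule misses.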
Table 2 lists the 33 variables collected for this study, grouped into seven categories. For each variable, it gives the unit of measurement, a brief description, and whether the variable was used in the final regression model.
Category I contains the dependent variable, Premium_EPC, which holds the variation in the sales price of housing with an energy label, given by the non-standardized β coefficients of the analyzed registers. All of these registers use a semi-logarithmic functional form. The second variable in this category, VAR, is the variance of the estimate. It is calculated as the squared standard error and is used to address potential publication bias in both the meta-analysis and the meta-regression, following [93]. Category II consists of eight variables used to measure whether there is selection bias in the analyzed documents. These variables are based on the publication date, the period in which the data were collected, the number of authors, the document type, and the quality index of the publication's indexing. Category III consists of three dummy variables that locate the study data geographically (America, Asia, and Europe).
Category IV consists of three dummy variables that define the construction typology. Registers are distinguished according to whether the sample consists of single-family housing, multifamily housing, or multiple housing (when both types are included in the register).
Category V consists of three variables describing the energy label used in the study: the type of energy qualification, the date on which the label was introduced, and whether the label is mandatory.
Category VI consists of seven dummy variables indicating whether certain groups of predictors were included in the statistical model: characteristics of the property, the building, the neighborhood, the location, the area, the market, and the financing.
Category VII consists of six variables describing the statistical data of each analyzed register, such as the origin of the prices (whether or not they come from a real estate portal), whether the dependent variable enters the model as price per unit of surface area, the sample size, the coefficient of determination of the model, and the statistical power.
All dummy variables are coded 1 when the characteristic is present and 0 otherwise. The descriptive statistics of all variables are shown in Table 3. Figure 6a shows box plots for each of the three continents by type of energy label. When other qualification types are used (LEED, BREEAM, etc.), outliers appear for Asia, while America shows the greatest dispersion and asymmetry relative to the average.

Analysis-2
For this analysis, the registers with the ABCDEFG qualification were selected, starting from an initial sample of 115 registers (Figure 4). They were grouped according to the comparison the label analyzes: labeled/non-labeled, grouped letters, ungrouped letters, and other non-comparable labels. The labeled/non-labeled registers (19) and the non-comparable ones (5) were excluded from Analysis 2.
Next, abnormal cases were eliminated (A2* in Table 1) by identifying registers from the same study that could be correlated: (1) studies reporting separate registers for individual dwelling types (detached houses, semi-detached houses, etc.), in which case the register including the complete sample was selected, eliminating registers 27-32, 75-79, and 99-104; and (2) studies reporting the premium both for the asking price and for the final sale price, in which case the latter was retained, eliminating register 82. This yields a final sample of 73 registers for Analysis 2.
Figure 6b shows the mean percentage price premium for each continent; the premium is lower with the ABCDEFG qualification. It is also found that Europe has a mandatory label based on the ABCDEFG qualification scale, while America and Asia show a greater diversity of non-mandatory labels.
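The two exclusion rules for Analysis 2 described above can be expressed as a simple filter. The sketch below is hypothetical: the `Register` fields (`study_id`, `full_sample`, `price_stage`) are illustrative names and do not come from the study's coding in Table 1.

```python
from dataclasses import dataclass


@dataclass
class Register:
    # Hypothetical fields for illustration only
    register_id: int
    study_id: str
    full_sample: bool    # True if the register covers all dwelling types of the study
    price_stage: str     # "asking" (offer price) or "transaction" (sale price)


def drop_correlated(registers):
    """Return the ids kept after applying the two Analysis-2 exclusion rules."""
    kept = []
    for r in registers:
        siblings = [s for s in registers if s.study_id == r.study_id]
        # Rule 1: drop sub-sample registers when the study also reports the full sample
        if not r.full_sample and any(s.full_sample for s in siblings):
            continue
        # Rule 2: drop asking-price registers when a sale-price register exists
        if r.price_stage == "asking" and any(s.price_stage == "transaction" for s in siblings):
            continue
        kept.append(r.register_id)
    return kept


regs = [
    Register(1, "S1", True, "transaction"),
    Register(2, "S1", False, "transaction"),   # sub-sample of S1
    Register(3, "S1", False, "transaction"),   # sub-sample of S1
    Register(4, "S2", True, "asking"),
    Register(5, "S2", True, "transaction"),
    Register(6, "S3", True, "asking"),         # no sale-price alternative
]
print(drop_correlated(regs))  # [1, 5, 6]
```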

Methodology
This study seeks to improve the theoretical understanding of the sales price premiums of residential housing with an EPC, through a systematic review using meta-analysis and meta-regression, with a descriptive, comparative, correlational, and exploratory design.
Following the approach of other authors, the influence of the EPC on price was analyzed using two distinct approaches: (1) Analysis-1 (A1) quantifies the price premium of housing with an EPC relative to housing without one; (2) Analysis-2 (A2) quantifies, among housing with an EPC, the premium associated with moving from one qualification level to another within the analyzed scale. The second approach can only be applied to the ABCDEFG qualification, and each author proposes distinct scenarios, using different reference bases to measure the impact of the EPC on price (see Section 3.8.2).

Analysis-1
The steps followed to estimate the premium in the sales price of housing with an EPC, relative to housing without one, are described below. First, a descriptive analysis is conducted, followed by an assessment of heterogeneity and publication bias, a sensitivity analysis, and, finally, a meta-analysis and a meta-regression.
Heterogeneity in a meta-analysis can lead to distorted results. This heterogeneity may be due to: (1) selection and publication biases; (2) a poor choice of the effect measure; and (3) genuine differences among the results of the studies.
To avoid selection and publication biases, documents published in several languages, not only in English, were selected. Moreover, the meta-analysis includes documents published in journals as well as documents from the so-called grey literature (reports, conference papers, and theses), as indicated by Begg [94]. Furthermore, the search was conducted in several databases rather than relying only on bibliographic references. To explore the existence of selection bias, the funnel plot was assessed visually.
To avoid heterogeneity arising from the choice of the measure used to quantify the effect size, the selected documents are homogeneous and comparable, since all of them estimate the price premium of residential buildings offered for sale using HPM with semi-logarithmic estimates. A rigorous selection process was followed, eliminating extreme univariate and multivariate outliers: registers more than three standard deviations from the mean were discarded, as were those whose Mahalanobis distance had a significance level below 0.001.
To evaluate and quantify the heterogeneity among the studies included in the analysis, three meta-analyses were carried out (by publication type, by data period, and by continent), as well as a random-effects meta-regression, comparing the different models and the χ², τ², and I² statistics. Heterogeneity was considered statistically significant when p < 0.05 for the χ² statistic or when I² exceeded 50%, with

$$Q = \sum_{i=1}^{k} w_i \left(\theta_i - \hat{\theta}\right)^2, \qquad I^2 = \frac{Q - (k - 1)}{Q} \times 100\%$$

where Q is the χ² test used to assess the heterogeneity of the studies included in a meta-analysis, comparing the effect size of each individual study with the combined estimator (w_i and θ̂ being the fixed-effects weights and estimate defined below), and k - 1 are the degrees of freedom, k being the number of studies.
The meta-regression analyses consider an initial fixed-effects model and a second random-effects model, as suggested in [12,95-98]. The fixed-effects model assumes that there is no heterogeneity among the analyzed documents, so that all of them estimate the same effect and the differences are due only to chance [13,99]:

$$\theta_i = \theta + \delta_i$$

where θ_i is the dependent variable, or measure of the effect (Premium_EPC), obtained from the results of the analyzed registers, i = 1, ..., k; δ_i is the error committed in observation i when approximating θ; and θ is the fixed overall effect, which may be estimated as a weighted mean of the individual effects of each study:

$$\hat{\theta} = \frac{\sum_{i=1}^{k} w_i \theta_i}{\sum_{i=1}^{k} w_i}, \qquad w_i = \frac{1}{\sigma_i^2}$$

where w_i are the weights obtained by the inverse-variance method and σ_i² is the variance of each estimator of the meta-sample.
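As a minimal sketch of the fixed-effects quantities defined above (inverse-variance weights, the weighted mean θ̂, Cochran's Q, and I²), the following Python function uses only the standard library. It is illustrative, not the OpenMEE or metafor implementation used in the study; I² is truncated at zero, as is usual, and the p-value of Q is omitted because the χ² distribution function is not in the standard library. The data are hypothetical.

```python
import math


def fixed_effect(effects, variances):
    """Inverse-variance fixed-effect pooling with Cochran's Q and I^2 (in %)."""
    weights = [1.0 / v for v in variances]            # w_i = 1 / sigma_i^2
    sum_w = sum(weights)
    theta = sum(w * y for w, y in zip(weights, effects)) / sum_w
    se = math.sqrt(1.0 / sum_w)                       # standard error of theta-hat
    q = sum(w * (y - theta) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1                             # k - 1 degrees of freedom
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return theta, se, q, i2


# Hypothetical premiums (beta coefficients) and their variances (SE^2)
theta, se, q, i2 = fixed_effect([0.0, 0.1, 0.2, 0.3], [0.01] * 4)
print(round(theta, 4), round(se, 4), round(q, 4), round(i2, 1))  # 0.15 0.05 5.0 40.0
```

With these numbers I² = 40%, below the 50% threshold used in the text; with real data, Q would additionally be compared with a χ² distribution with k - 1 degrees of freedom.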
The random-effects model assumes that there is heterogeneity among the analyzed documents, so that, in addition to the overall effect and the estimation error, a random effect generated by each study is considered [13,99]. The random-effects model regards the studies as a sample from a larger universe of studies and can be used to infer what would likely happen if a new study were performed:

$$\theta_i = \theta + u_i + \delta_i$$

where θ_i is the dependent variable, or measure of the effect (Premium_EPC), obtained from the results of the analyzed registers, i = 1, ..., k; θ + u_i is the effect to be estimated in the ith study of the meta-sample, u_i being the random effect of study i, with variance τ²; δ_i is the error committed in observation i when approximating that effect; and θ is the overall effect, which can be estimated as a weighted mean of the individual effects of each study:

$$\hat{\theta} = \frac{\sum_{i=1}^{k} w_i \theta_i}{\sum_{i=1}^{k} w_i}, \qquad w_i = \frac{1}{\sigma_i^2 + \tau^2}$$

where w_i is the weight associated with each estimator of the sample, σ_i² is the variance of each estimator of the meta-sample, and τ² is the between-study variance.
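The random-effects weighting can be sketched the same way. To show how τ² enters the weights w_i = 1/(σ_i² + τ²), the example below uses the closed-form DerSimonian and Laird (DL) moment estimator for τ², truncated at zero, purely as an illustration; the study itself uses REML for its forest plots. In metafor the same choice is made with the `method` argument of `rma()` (for example `method = "DL"` or `method = "REML"`). The 1.96 multiplier gives a normal-approximation 95% confidence interval, and the data are hypothetical.

```python
import math


def dersimonian_laird(effects, variances):
    """Method-of-moments (DL) estimate of the between-study variance tau^2."""
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    theta_fe = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - theta_fe) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    return max(0.0, (q - (len(effects) - 1)) / c)   # negative values set to zero


def random_effects(effects, variances, tau2):
    """Pooled effect with weights w_i = 1 / (sigma_i^2 + tau^2) and a 95% CI."""
    w = [1.0 / (v + tau2) for v in variances]
    sum_w = sum(w)
    theta = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    se = math.sqrt(1.0 / sum_w)
    return theta, (theta - 1.96 * se, theta + 1.96 * se)


effects, variances = [0.0, 0.1, 0.2, 0.3], [0.01] * 4   # hypothetical data
tau2 = dersimonian_laird(effects, variances)
theta, ci = random_effects(effects, variances, tau2)
```

With equal variances, τ² only widens the interval; with unequal variances it also pulls the weights toward equality, which is why small and large studies count more evenly under random effects than under fixed effects.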
Different methodologies may be used to calculate the overall effect, depending on the dependent variable and the characteristics to be analyzed [100]. There are various estimators of the between-study variance (τ²), such as DerSimonian and Laird [101] (DL), Hunter and Schmidt (HS), Hedges and Olkin [11] (HO), maximum likelihood (ML), and restricted maximum likelihood (REML), among others. According to Viechtbauer [102], the choice among these methods should optimize: (1) bias (the difference between the estimated and the actual value); (2) efficiency (low sensitivity to sampling fluctuation); and (3) the mean square error (MSE). Veroniki et al. [103] analyzed a larger number of estimators and concluded that the most appropriate estimator depends on: (1) whether a zero between-study variance is considered possible; (2) the bias and efficiency properties of the estimators, which depend on the number of studies included and the true variance; and (3) ease of application.
The meta-regression is performed with the six estimators shown in Table 4. In addition, the following assumptions are verified: (1) normality of the distribution of the dependent variable, using the Kolmogorov-Smirnov test, a histogram, and a normal probability plot of the residuals; and (2) homoscedasticity, using the Breusch-Pagan test.
Table 4. Summary of the most noteworthy characteristics when selecting an estimator for a random-effects model.

Method-of-moments estimators: DerSimonian and Laird (DL)
It performs acceptably when the true between-study variance is small or close to zero, but when the variance is large, the DL estimator may produce estimates with substantial negative bias.

Hedges and Olkin (HO)
HO performs well in the presence of substantial between-study variation, especially when the number of studies is large (that is, k ≥ 30), but it produces a large MSE. In general, its estimates are slightly greater than those produced by the DL and REML methods.

Hunter and Schmidt (HS)
If the sample is negatively or positively biased, HS under-estimates or over-estimates, respectively, the true variation between studies. When the sample is small, it may underestimate heterogeneity.

Maximum likelihood (ML)
This asymptotically efficient method requires an iterative solution and therefore depends on the choice of maximization algorithm. It has the smallest MSE compared with the REML and HO methods, but the largest negative bias among them.

Restricted maximum likelihood (REML)
It may be used to correct the negative bias associated with the ML method. It is not adequate when there are few observations. It has less bias with dichotomous data than the ML, but has a greater MSE. For continuous data, REML is the preferred approach when large studies are included in the meta-analysis.

Bayes estimators: Bayes estimator (Bayes)
It is recommended for samples with fewer than 5 observations, since it generates less bias than other estimators (DL, HO, or REML).
The calculations were made with a 95% confidence level, using the OpenMEE software [104], the R package "metafor" (version 2.0) [105], IBM SPSS Statistics (version 21), and the SPSS macros by Ahmad Daryanto for the Breusch-Pagan and Koenker tests (July 2018) [106].
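To see how some of the estimators in Table 4 differ in practice, the sketch below implements three of them in plain Python: the Hedges and Olkin moment estimator, and ML and REML via a standard fixed-point iteration. It is an illustration under simplifying assumptions (no moderators, starting value τ² = 0, negative values truncated at zero), not a substitute for metafor, where these correspond to `method = "HE"`, `"ML"`, and `"REML"` in `rma()`. Hunter-Schmidt and Bayes estimators are omitted, and the data are hypothetical.

```python
def hedges_olkin(effects, variances):
    """Unweighted moment (Hedges and Olkin) estimator of tau^2, truncated at zero."""
    k = len(effects)
    mean_y = sum(effects) / k
    s2 = sum((y - mean_y) ** 2 for y in effects) / (k - 1)   # sample variance of effects
    return max(0.0, s2 - sum(variances) / k)


def likelihood_tau2(effects, variances, restricted=True, tol=1e-12, max_iter=1000):
    """ML (restricted=False) or REML (restricted=True) estimate of tau^2."""
    tau2 = 0.0
    for _ in range(max_iter):
        w = [1.0 / (v + tau2) for v in variances]
        sum_w = sum(w)
        mu = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
        num = sum(wi ** 2 * ((yi - mu) ** 2 - vi)
                  for wi, yi, vi in zip(w, effects, variances))
        new = num / sum(wi ** 2 for wi in w)
        if restricted:
            new += 1.0 / sum_w            # REML adjustment term
        new = max(0.0, new)               # a variance cannot be negative
        if abs(new - tau2) < tol:
            return new
        tau2 = new
    return tau2


effects, variances = [0.0, 0.1, 0.2, 0.3], [0.01] * 4   # hypothetical data
estimates = {
    "HO": hedges_olkin(effects, variances),
    "ML": likelihood_tau2(effects, variances, restricted=False),
    "REML": likelihood_tau2(effects, variances, restricted=True),
}
```

With these equal-variance data, HO and REML both give about 0.00667 while ML gives 0.0025, a smaller value in line with the negative bias of ML noted in Table 4.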

Analysis-2
For the registers with the ABCDEFG qualification, the premium resulting from a change from one level of the qualification scale to another was quantified through a descriptive analysis of the registers, according to the reference base used by the authors of each document: (1) groupings of letters (for example, DEFG compared to ABC, or EFG compared to ABCD); (2) independent letters, taking one letter as the reference (D, F, or G) and analyzing the price premium relative to other individual or grouped letters (AB, EFG, or FG); and (3) housing without a qualification (NT), analyzing the price premium of each qualification letter, individually or grouped (AB and CDEFG).

Normality and Heteroscedasticity
The normality of the distribution of the dependent variable (Premium_EPC) was checked with the Kolmogorov-Smirnov test, which was not statistically significant (D = 0.086, p = 0.200, n = 56). This is consistent with a normal distribution, as shown in the histogram (Figure 7a) and in the normal probability plot of the standardized residuals (Figure 7b). To evaluate heteroscedasticity, the Breusch-Pagan test was conducted (BP = 19.77, df = 15, p = 0.181); since the null hypothesis of homoscedasticity cannot be rejected, no evidence of heteroscedasticity was found.
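The homoscedasticity check can be illustrated with the studentized (Koenker) form of the Breusch-Pagan test, which the study ran through an SPSS macro with 15 regressors. The sketch below is a simplification that assumes a single regressor: it fits ordinary least squares, regresses the squared residuals on the same regressor, and compares LM = n·R² with a χ² distribution with 1 degree of freedom, whose survival function is erfc(√(LM/2)). The null hypothesis is homoscedasticity, so a small p-value indicates heteroscedasticity. The function name and data are hypothetical.

```python
import math


def koenker_bp_one_regressor(x, y):
    """Studentized (Koenker) Breusch-Pagan test for a single regressor.

    Fits y = a + b*x by OLS, regresses the squared residuals on x,
    and returns LM = n * R^2 with its chi-square(1) p-value.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    e2 = [(yi - a - b * xi) ** 2 for xi, yi in zip(x, y)]   # squared OLS residuals
    # Auxiliary regression of e^2 on x: with one regressor, R^2 = corr(x, e^2)^2
    me = sum(e2) / n
    sxe = sum((xi - mx) * (ei - me) for xi, ei in zip(x, e2))
    see = sum((ei - me) ** 2 for ei in e2)
    lm = n * sxe ** 2 / (sxx * see)
    # Chi-square with 1 df: P(X > lm) = erfc(sqrt(lm / 2))
    return lm, math.erfc(math.sqrt(lm / 2))


x = list(range(1, 21))
# Hypothetical constant-spread errors versus errors growing with x
y_homo = [1 + 0.5 * xi + ei for xi, ei in zip(x, [1, -2, 1, 0] * 5)]
y_hetero = [1 + 0.5 * xi + (-1) ** xi * xi for xi in x]
lm_h, p_h = koenker_bp_one_regressor(x, y_homo)
lm_x, p_x = koenker_bp_one_regressor(x, y_hetero)
```

In the first case the spread of the errors does not change with x and the test does not reject homoscedasticity; in the second the error magnitude grows with x and the p-value is very small.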

Heterogeneity and Publication Bias
Cochran's Q test for the fixed-effects model was statistically significant (Q = 17821.36, df = 55, p < 0.0001), confirming the existence of heterogeneity in the sample. To determine the origin of the heterogeneity, a funnel plot and a Baujat plot were used.
To evaluate publication bias, a visual assessment was conducted with the contour-enhanced funnel plot [107], grouping the studies by continent and shading the regions according to their significance level (Figure 8a). A marked asymmetry was observed, with the documents concentrated in the upper left-hand area of the graph (mainly European studies). This is the area where the results are more precise, with narrower confidence intervals and greater statistical significance. The observations for America show some clustering, with a greater dispersion of the variance, and lie apart from those for Europe and Asia. This suggests that the heterogeneity is due not to selection bias but to location.
In the Baujat plot [108] (Figure 8b), the X axis shows the contribution of each register to the overall heterogeneity of the sample (through Cochran's Q test), while the Y axis represents the influence of the register on the overall result. Registers with both high heterogeneity and high influence appear in the upper right-hand area of the graph (register 11). Those that contribute most to the heterogeneity are located in the lower right-hand area (registers 9, 47, 55, 64, 96, and 105), and those in the upper left-hand area have the greatest influence (registers 58, 67, 85, 117, and 169). As a sensitivity analysis, eliminating these 12 registers removes approximately 87.36% of the heterogeneity (Q = 2251.97, df = 43, p < 0.0001), but it remains statistically significant, suggesting that further registers would need to be eliminated. Repeating the process two more times leaves a sample of 21 observations and reduces the heterogeneity by approximately 99.19%, but does not fully eliminate it (Q = 144.31, df = 20, p < 0.0001). Because the heterogeneity persists even after more than half of the registers are eliminated, it is considered to be a result not of publication bias but of the data being analyzed [109].

Sensitivity Analysis
Figure 9 presents a sensitivity analysis based on a forest plot that examines the influence of each study on the overall effect. The graph shows the overall combined effect of the sales premium of housing with an EPC each time one of the studies is omitted. Omitting any of the included studies changes neither the direction nor the significance of the estimate relative to the combined estimate from all of the studies (overall effect = 0.0420). Nor is there evidence of a significant change in the heterogeneity index (I²), whose values range between 99.6% and 99.8%. Therefore, none of the studies notably affects the overall estimated result, and the results may be considered robust [110,111].
However, it is worth highlighting the influence of two studies, Addae-Dapaah and Chieh [51] and Yoshida and Sugiura [41], whose omission shifts the overall combined estimate outside the 95% confidence interval of the overall effect of all studies (95% CI = 0.0407 to 0.0433), reducing or increasing it.
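Both diagnostics can be sketched with fixed-effect (inverse-variance) pooling, which keeps the example short; the leave-one-out analysis in Figure 9 may instead rely on the random-effects model, so this is an illustration rather than a reproduction. Following the usual definition of the Baujat plot, the x coordinate of register i is its contribution w_i(θ_i - θ̂)² to Cochran's Q, and the y coordinate is the squared change in the pooled estimate when the register is omitted, divided by the variance of the estimate without it. The data are hypothetical.

```python
def fe_pool(effects, variances):
    """Fixed-effect (inverse-variance) estimate and its variance."""
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    return sum(wi * yi for wi, yi in zip(w, effects)) / sum_w, 1.0 / sum_w


def leave_one_out(effects, variances):
    """Pooled estimate recomputed with each register omitted in turn."""
    return [fe_pool(effects[:i] + effects[i + 1:], variances[:i] + variances[i + 1:])[0]
            for i in range(len(effects))]


def baujat_coordinates(effects, variances):
    """(x, y) for each register: contribution to Q and influence on the estimate."""
    theta, _ = fe_pool(effects, variances)
    coords = []
    for i, (y, v) in enumerate(zip(effects, variances)):
        x = (y - theta) ** 2 / v                    # w_i * (theta_i - theta_hat)^2
        theta_i, var_i = fe_pool(effects[:i] + effects[i + 1:],
                                 variances[:i] + variances[i + 1:])
        coords.append((x, (theta - theta_i) ** 2 / var_i))
    return coords


effects, variances = [0.0, 0.1, 0.2, 0.3], [0.01] * 4   # hypothetical data
loo = leave_one_out(effects, variances)
coords = baujat_coordinates(effects, variances)
```

The x coordinates add up to Cochran's Q (here 5.0), which is why removing the registers on the right of the plot removes most of the heterogeneity, as in the 87.36% reduction reported above.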

Meta-Analysis
Since the fixed-effects and random-effects methods are mutually exclusive, one of them must be selected based on the heterogeneity (Q test) [97]. As shown in the previous section, heterogeneity exists. Therefore, three sub-group meta-analyses [112] were conducted, based on publication type (Figure 10), data period (Figure 11), and continent (Figure 12), to verify whether the heterogeneity of the registers is a result of these factors, estimating the effect for each sub-group with a random-effects model displayed in a forest plot. Different estimators may be used in this model; for the confidence interval of the between-study variance, Veroniki et al. [103] consider that REML gives better results than DL, so REML was used in this study.

Figure 11. Forest plot based on period of data collection and overall combined effect.
Several forest plots appear in this document. The following notes apply to all of them: (1) to identify studies with more than one register, the «location (label)» is shown in addition to the «register» and the «author (year)»; and (2) each register's result is shown by a marker whose size is proportional to the register's contribution to the overall result. The horizontal lines correspond to the confidence intervals and show the precision of each register and whether it is statistically significant (when the line does not cross the black dotted line that corresponds to a null effect, zero).
The diamonds show the combined effect by sub-group (type of study, data period, or continent) and the overall combined effect; both are to be interpreted as the weighted average, or "combined", effect size obtained according to Equations (6) and (7). The red dashed line represents the value of the overall combined effect.
Figure 10 shows that housing sales carry a price premium, with an overall combined effect of 4.20%. By publication sub-group, the effect is 4.09% for Master's theses, 3.59% for reports, and 4.40% for journal articles; all of these values are statistically significant. In the thesis sub-group, the four registers are distributed on both sides of the line that marks the overall combined effect, suggesting that this sub-sample is homogeneous with regard to selection bias. However, the confidence intervals of these registers are quite broad, indicating greater dispersion in the premiums. The reports sub-group contains 12 registers, whose confidence intervals are more concentrated than those of the journal articles. The journal articles sub-group has 40 registers, with more varied confidence intervals than the previous sub-groups.
The results for the three publication types overlap (yellow diamonds), so there appears to be no significant difference between the average effects of the sub-groups and the overall combined effect. Therefore, the heterogeneity of the registers cannot be attributed to the type of publication.
In Figure 11, the studies are organized into three sub-groups according to the date of data collection: The first group, mainly compiled before 2008, shows an effect size of 3.90%; the second group, gathered from 2006-2010, shows an effect of 6.95%; and, finally, the third group of data mainly collected after 2008, shows an effect of 3.65%. The resulting analysis indicates similar effects in the first and third sub-groups, both remaining close to the value of the overall combined effect. By comparison, the second sub-group corresponding to the crisis period shows a greater impact of the energy qualification on price increase than the other two sub-groups. Apart from the date time slot, this disparity is also justified by the predominant house typology. Most of the studies in the second sub-group are based on cases located in America, where single family dwellings are the predominant kind.
In Figure 12, the sub-groups were analyzed by continent, with an effect of 5.36% in America, 4.81% in Asia, and 2.32% in Europe. All of these values are statistically significant. In America, it is observed that the 27 registers are distributed across both sides of the line defining the overall combined effect. Therefore, the sample will be homogenous in this subset with regards to selection bias. On the other hand, the confidence intervals of the registers are, generally speaking, quite broad, suggesting a greater dispersion in the premiums. In Asia, the number of registers is 11, less than half of those in America. However, the registers have smaller confidence intervals and there is not as much dispersion in the premiums. In Europe, there are 18 registers of which only one has broad confidence intervals, and within this sample there is not much dispersion.
The results for America and Asia are relatively well aligned with one another (yellow diamonds), so there appears to be no significant difference between them. By contrast, the premium in Europe is approximately half that of America and Asia, and the confidence intervals of America and Europe (the width of the yellow diamonds) do not even overlap. Thus, there is a significant difference between their average values, and this difference may account for the heterogeneity observed in the previous section.
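The sub-group comparisons above (Figures 11 and 12) rest on pooling the registers within each sub-group and then comparing the pooled estimates. The sketch below illustrates one common way to do this, assuming a DerSimonian-Laird random-effects model fitted separately within each sub-group, a between-sub-group Q statistic, and a pairwise z-test. The source does not state the exact procedure or software behind the figures, the function names are hypothetical, and the data in the usage block are invented for illustration.

```python
import math


def dl_pool(effects, variances):
    """DerSimonian-Laird random-effects pooling of one sub-group.

    effects: register-level premiums as fractions (0.05 = 5%);
    variances: their sampling variances.
    Returns (pooled, se, tau2, q, i2_percent).
    """
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add the between-register variance tau2.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, se, tau2, q, i2


def q_between(subgroups):
    """Q statistic for differences between sub-group pooled estimates.

    subgroups: list of (pooled, se); compare with chi-square on len - 1 df.
    """
    w = [1.0 / se ** 2 for _, se in subgroups]
    mean = sum(wi * est for wi, (est, _) in zip(w, subgroups)) / sum(w)
    return sum(wi * (est - mean) ** 2 for wi, (est, _) in zip(w, subgroups))


def z_difference(a, b):
    """Two-sided z-test for the difference between two pooled estimates."""
    (est_a, se_a), (est_b, se_b) = a, b
    z = (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


if __name__ == "__main__":
    # Hypothetical registers, NOT the data of this study.
    america = dl_pool([0.08, 0.03, 0.06, 0.02], [4e-4, 3e-4, 5e-4, 4e-4])
    europe = dl_pool([0.020, 0.025, 0.030], [1e-4, 1e-4, 2e-4])
    z, p = z_difference(america[:2], europe[:2])
    print(f"America {america[0]:.4f}, Europe {europe[0]:.4f}, z={z:.2f}, p={p:.3f}")
    print("Q_between:", q_between([america[:2], europe[:2]]))
```

With two sub-groups, the Q_between statistic equals the square of the pairwise z, so either can be used; Q_between generalizes to the three-continent comparison.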
What factors cause these differences? First, the energy label is mandatory in Europe, which is not the case in either America or Asia. Second, the European energy label (the ABCDEFG qualification) does not quantify energy performance as precisely as the other labels. Furthermore, the construction type may also affect the results. To address these issues, a graphic analysis (Figure 13) and a meta-regression (Section 4.1.5) are proposed; they reveal the multicollinearity existing between continent, construction type, and energy label type.
Figure 13. Distribution of data by continent, construction type, number of registers (study identifier according to Table 1), and type of energy label. Representation made with the RAWGraphs web tool [113].

Meta-Regression
The objective of this subsection is to determine whether the heterogeneity between the registers is related to the specific characteristics of the documents [112]. Of the variables described in Table 2, those with multicollinearity problems with other variables were discarded, as is the case of continent with construction type and energy labeling, or of the coefficient of determination with statistical power. Variables with no variability in their classification were also eliminated, as was the case for the housing characteristics (C_Dwelling), as were those with a low number of observations (Sample_size). In all, 13 variables were discarded: SE, America, Asia, Europe, Single_Family, Multifamily, Multiple, Obligatory, C_Dwelling, R²_fin, f², and t-test.
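A common screen for the multicollinearity described above is the variance inflation factor (VIF). The source does not specify which diagnostic was used, so the sketch below is only an illustration, assuming the moderators are coded as numeric columns (0/1 dummies or years) and that every register belongs to exactly one continent. The column names mirror Table 2, but the `vif` helper and the data are hypothetical. It shows why the continent dummies cannot enter the model together with an intercept: they always sum to one, so each is an exact linear combination of the others.

```python
import math

import numpy as np


def vif(X, names):
    """Variance inflation factor of each column of X.

    Each column is regressed (OLS) on all other columns plus an intercept;
    VIF = 1 / (1 - R^2). Exactly collinear or constant columns get math.inf.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    result = {}
    for j in range(k):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        ss_tot = float(((y - y.mean()) ** 2).sum())
        # A constant column (no variability) is collinear with the intercept.
        r2 = 1.0 if ss_tot == 0.0 else 1.0 - float(resid @ resid) / ss_tot
        result[names[j]] = math.inf if r2 > 1.0 - 1e-10 else 1.0 / (1.0 - r2)
    return result


if __name__ == "__main__":
    # Hypothetical registers: ABCDEFG labels occur only in Europe here.
    names = ["America", "Asia", "Europe", "Qualif_ABCDEFG"]
    X = [
        [1, 0, 0, 0],
        [1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 1, 1],
        [0, 0, 1, 1],
    ]
    print(vif(X, names))  # every entry is inf: the columns are redundant
```

In this toy design, Qualif_ABCDEFG duplicates Europe exactly, mirroring the confounding between continent and label type described in the text; dropping the continent dummies removes the redundancy.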
Since heterogeneity was found in the sample, a fixed-effects meta-regression was ruled out. A random-effects model was fitted using several estimators (Table 5): DL, HO, HS, ML, REML, and Bayes. Because the sample is positively biased, the HS method over-estimates the variance, resulting in almost all of the characteristics being significant at p < 0.001 and a very high coefficient of determination (94.75%). Of the other methods, HO, REML, and Bayes offered similar results, with 17 of the 18 variables obtaining the same value of B and very little variation in the coefficient of determination (2.42%); therefore, any of these would be appropriate. For this document, the REML method was selected, in line with [102–114], since, according to these authors, it offers the best results in terms of bias and efficiency compared with the DL, ML, HS, and HO methods.
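To make the comparison of estimators more concrete, the sketch below computes the between-register variance τ² with two of the methods named above: DL (closed form) and REML (fixed-point iteration, truncated at zero). It covers only the simplest random-effects model with no moderators, which is a simplifying assumption; it is not the procedure behind Table 5, where the moderators enter through the full design matrix and established meta-analysis software should be preferred. The function names and data are hypothetical.

```python
def tau2_dl(y, v):
    """DerSimonian-Laird (method-of-moments) estimate of tau^2."""
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y))
    c = sw - sum(wi * wi for wi in w) / sw
    return max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0


def tau2_reml(y, v, tol=1e-12, max_iter=1000):
    """REML estimate of tau^2 by fixed-point iteration, truncated at zero."""
    tau2 = tau2_dl(y, v)  # start from the DL value
    for _ in range(max_iter):
        w = [1.0 / (vi + tau2) for vi in v]
        sw = sum(w)
        mu = sum(wi * yi for wi, yi in zip(w, y)) / sw
        num = sum(wi * wi * ((yi - mu) ** 2 - vi) for wi, yi, vi in zip(w, y, v))
        # The 1/sum(w) term is what distinguishes REML from ML.
        new = max(0.0, num / sum(wi * wi for wi in w) + 1.0 / sw)
        if abs(new - tau2) < tol:
            return new
        tau2 = new
    return tau2


if __name__ == "__main__":
    # Hypothetical premiums (fractions) and sampling variances.
    y = [0.021, 0.054, 0.048, 0.010, 0.090]
    v = [1e-4, 4e-4, 2e-4, 1e-4, 9e-4]
    print("DL  :", tau2_dl(y, v))
    print("REML:", tau2_reml(y, v))
```

When the registers have equal sampling variances, the two estimators coincide; they diverge as the precision of the registers becomes more unequal, which is one reason estimator choice can change the significance of moderators.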
The results of the REML model reveal that, with respect to the publication characteristics (category II), the housing price premium is greater in more recently published documents (Date_publication = 0.005, not significant), in documents with more authors (Num_Autor = 0.010, not significant), and in journal articles (Journal_Article = 0.023, not significant). Conversely, the premium decreases when the data in the documents predate the 2008 crisis (Date_Before_Crisis = −0.014, not significant) and when the published document has a JCR quality index (−0.019, not significant) and/or an SJR index (−0.037, not significant). When the data are post-crisis, the premium remains unchanged with respect to the crisis period (Date_After_Crisis = 0.000, not significant).
Regarding the characteristics of the energy label (category V), the premium is lower when the ABCDEFG label is used (Qualif_ABCDEFG = −0.054, significant) than with other label types (LEED, Energy Star, etc.). Conversely, the premium increases when the start date of the qualification is more recent (Date_Label = 0.004, significant).
Within category VI of the model predictor variables, the premium decreases when the studies include characteristics defining the building (C_Building = −0.000, not significant), such as elevator, swimming pool, garden, or garage; the property location (C_Location = −0.023, not significant); or financing (C_Financing = −0.048, not significant). Conversely, it increases when the studies include neighborhood characteristics (C_Neighborhood = 0.023, not significant), such as delinquency rate, neighborhood income, or percentage of elderly residents; characteristics of the area (C_Zone = 0.022, not significant), such as construction density or land use; and market characteristics (C_Market = 0.033, significant), such as commercialization type or time of sale.
Finally, within the statistical data category (VII), the premium is positively associated with the number of observations per register (Sample_size = 0.000, not significant), but it decreases when the economic data are obtained from web pages (Data_web = −0.008, not significant) or when the sales price enters the model as a monetary unit per unit of surface area (Price_area = −0.018, not significant).

Analysis-2
In this second analysis, housing with an EPC under the ABCDEFG qualification is examined, and the premium resulting from moving from one value to another within the qualification scale is quantified. This type of analysis has an advantage over the previous one, since it makes it possible to identify whether housing with high qualifications has higher price premiums than housing with low qualifications. One problem found in this research is that the reference base varies across authors; consequently, few cases can be compared to quantify these values.
Of the registers included in Table 1, the sales price premiums of the EU's EPC with the "ABCDEFG qualification" were analyzed for three reasons: (1) this qualification is mandatory in EU member countries; (2) it permits the quantification of the premium associated with moving from one value to another within the qualification scale; and (3) there is a large number of registers with this qualification (73 registers).
The registers analyzed are classified according to their reference base: (1) letter groupings, where DEFG is compared to ABC or EFG to ABCD (Figure 14); and (2) independent letters, using the letter D, F, G, or NT (no label) as the reference and analyzing the price premium of housing with a different qualification, whether individual letters (A, B, C, D, E, F, or G) or grouped letters (AB, ABC, EFG, or FG) (Figure 15).
In Figure 14, when letter groupings are used as the reference, the most efficient groupings, ABC and ABCD, have price premiums of 5.92% and 5.40%, respectively, compared with the less efficient groupings (DEFG and EFG).
When the letter D is used as the reference (Figure 15a), the results are as expected: the price premium increases as the energy qualification improves. That is, of two otherwise identical homes, one with a D qualification and the other with an A qualification, the latter will have a 9.90% higher sales price.
By contrast, when the sales premiums are analyzed graphically using the letter F as the reference (Figure 15b), certain inconsistencies appear. Housing with a lower qualification (G) shows an increase in sales price of 4.05%, similar to that of properties with an A qualification and greater than that of properties with an E or F qualification. This pattern persists when the reference is the letter G (Figure 15c): housing qualified with the letters C and E has similar premiums, at 3.26% and 3.03%, respectively, whereas housing with a grouped ABC qualification has a negative premium of 6.3%, suggesting that houses with the lowest qualification (G) were valued more highly. Finally, when the reference is non-qualified properties (Figure 15d), only housing with very high qualifications (A, B, or AB) has positive premiums, with the rest having negative premiums.
Qualification letter A, with 5 cases, obtains a mean premium of 15.70% when the reference letter is G.
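The comparability problem can be seen with simple arithmetic. If all premiums come from one study and are measured against a single reference letter, they can be re-expressed against any other letter through price ratios: the premium of X relative to F is (1 + p_X)/(1 + p_F) − 1, where p is each letter's premium over the common reference. The sketch below assumes premiums are expressed as fractions of price and share one baseline; the `rebase` helper and the letter premiums are hypothetical. Because the transformation is monotone, rebasing within a single study cannot reverse the ranking of letters, so one possible explanation for reversals such as those in Figures 15b and 15c is the mixing of studies with different reference bases.

```python
def rebase(premiums, new_reference):
    """Re-express letter premiums against a different reference letter.

    premiums: letter -> premium as a fraction of price (0.05 = 5%), all
    measured against the same baseline. Returns premiums relative to
    `new_reference`, using ratios of (1 + premium).
    """
    base = 1.0 + premiums[new_reference]
    return {letter: (1.0 + p) / base - 1.0 for letter, p in premiums.items()}


if __name__ == "__main__":
    # Hypothetical premiums of one study, measured against the letter D.
    vs_d = {"A": 0.08, "B": 0.05, "C": 0.02, "D": 0.0,
            "E": -0.01, "F": -0.03, "G": -0.05}
    vs_f = rebase(vs_d, "F")
    for letter, p in vs_f.items():
        print(f"{letter}: {100 * p:+.2f}% relative to F")
```

Rebasing is only valid when the premiums share a baseline; premiums from different studies, each with its own reference letter and sample, cannot be chained this way, which is why a common reference letter across studies matters.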

Analysis-1
Comparing the results of the model for housing with an EPC against housing that is not qualified shows that Europe (where energy qualification is mandatory) has more homogeneous price premiums, in line with other studies [25,26,34–36,75]. In America, however, the greater number of labels brings greater variability in the results, with premiums ranging from −2.49% to 14.3%. The lowest values are those of [32,67,87], while the highest are those of [38,55,62,69]. To reduce this variability, it is recommended that mandatory labels be used in both America and Asia, as suggested by Fizaine et al. [22]. Ankamah-Yeboah and Rehdanz [19] argue that voluntary labels obtain greater premiums, since they consider these to be more highly valued than mandatory ones; nevertheless, they advocate policies that implement mandatory labeling, since voluntary labeling tends to lose value over time.
The overall combined effect obtained in this document (4.20%) can be compared with similar studies: the first meta-analysis, in 2014, obtained a premium of 7.6% [19]; the second, in 2016, obtained 4.3% [20]; and the third, in 2018, reported a range of 3.5-4.5% [22]. The premium obtained in this study is therefore coherent with other studies, and the mean effect of the premium appears to have stabilized.
For the meta-regression, the REML estimator is considered valid (in accordance with the cited literature), although its explanatory power is low, at 27.51%. As noted by Nelson and Kennedy [115], the proposed model cannot explain all of the variation in the data.
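The coefficient of determination reported for a random-effects meta-regression (27.51% here) is usually a pseudo-R² that measures how much of the between-register variance the moderators absorb. Assuming the common definition (the source does not state its formula), it compares the between-register variance of the intercept-only model, τ²₀, with the residual variance τ²_res of the model that includes the moderators:

```latex
R^2 = \max\!\left(0,\ \frac{\hat\tau^2_{0} - \hat\tau^2_{\mathrm{res}}}{\hat\tau^2_{0}}\right) \times 100\%
```

Under this definition, R² is a ratio of two estimated variances, so it can shift noticeably with the τ² estimator, which is consistent with the different coefficients of determination obtained with the estimators in Table 5.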
Another problem found in this research is the lack of data in existing studies, which complicates and restricts the use of meta-analysis, as indicated by authors such as [22] and [115].

Analysis-2
As for quantifying the premium associated with moving from one value to another within the ABCDEFG qualification scale, a single value cannot be given, since the available data are based on different scenarios and different reference bases have been used to measure the impact of the EPC on price. The results are heterogeneous and do not include sufficient information to be conclusive. Although a specific value cannot be given, there is a clear trend for the high qualifications (A, B, or C) to have greater price premiums. Even with a reduced sample, the graphs appear to indicate that the absence of information on the energy label favors the sale of housing with poorer qualifications, as suggested by Marmolejo Duarte [78] in Spain.
Therefore, it is recommended that future studies avoid qualification letter groupings and always use the same reference qualification (preferably the letter D), as suggested by Fizaine et al. [22].

Conclusions
This study analyzed two issues: (1) Analysis-1 (A1): quantifying the price premium of housing with an EPC compared with housing without this qualification; and (2) Analysis-2 (A2): for housing with an EPC, quantifying the premium resulting from moving from one qualification to another within the analyzed scale.
After a thorough review of the literature from recent years, covering 96 documents, admission criteria were applied, the data were classified by the type of analysis to be carried out, and abnormal data were eliminated, yielding a final sample of 66 documents with 58 and 73 registers for Analysis-1 and Analysis-2, respectively.
For Analysis-1, descriptive statistics were used, and the normality and homoscedasticity of the registers were checked. To reduce publication bias, documents were collected from diverse sources. To evaluate whether the final sample was heterogeneous and whether there was publication bias, an improved funnel plot, a Baujat plot, and a sensitivity analysis were produced; these showed that the heterogeneity stems from the data themselves, a finding corroborated by three sub-group meta-analyses (by publication type, data period, and continent) and a meta-regression.
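The sensitivity analysis mentioned above is commonly run as a leave-one-out analysis: the pooled effect is re-estimated k times, each time omitting one register, and registers whose omission shifts the estimate markedly are flagged as influential. The sketch below assumes a DerSimonian-Laird random-effects model (the source does not state the exact procedure); the function names and the registers labeled R1 to R4 are hypothetical.

```python
def dl_estimate(y, v):
    """DerSimonian-Laird random-effects pooled estimate."""
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    mu_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - mu_fixed) ** 2 for wi, yi in zip(w, y))
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (vi + tau2) for vi in v]
    return sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)


def leave_one_out(y, v, labels):
    """Pooled estimate obtained when each register is omitted in turn."""
    return {
        labels[i]: dl_estimate(y[:i] + y[i + 1:], v[:i] + v[i + 1:])
        for i in range(len(y))
    }


if __name__ == "__main__":
    # Hypothetical registers; R4 is deliberately an outlier.
    y = [0.030, 0.040, 0.035, 0.200]
    v = [1e-4, 1e-4, 1e-4, 1e-4]
    full = dl_estimate(y, v)
    for label, est in leave_one_out(y, v, ["R1", "R2", "R3", "R4"]).items():
        print(f"without {label}: {est:.4f} (shift {est - full:+.4f})")
```

A Baujat plot summarizes similar information graphically, plotting each register's contribution to the overall heterogeneity against its influence on the pooled result.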
In Analysis-2, bar graphs were made with the mean of the values registered based on the analyzed classification and the reference classification. The heterogeneity in the reference letter used by distinct studies hinders their comparison.
Based on all of the analyses carried out, the following conclusions may be reached:

1. Housing with an EPC has an overall combined effect on the sales price premium of 4.20% compared with housing of similar characteristics that lacks this qualification.
2. The housing location and the type of EPC condition the value of the premium, with significant differences between the continents analyzed, mainly America and Asia compared with Europe. The highest premiums are estimated in America (5.36%) and Asia (4.81%), while in Europe the premium is 2.32%.
3. A meta-regression was conducted on the data obtained from the analyzed documents using various estimators, with REML considered the most appropriate, in line with the literature. The variable with the greatest influence on the price premium is the type of energy qualification (Qualif_ABCDEFG): EPCs with ABCDEFG qualifications show a decrease of 5.4% (B = −0.054) compared with other qualification types, consistent with the previous conclusion.
4. For housing with ABCDEFG qualifications, where the premium is analyzed upon moving from one value to another within the scale, the results are not conclusive, but they suggest a trend, with the highest qualifications having higher premiums.
This document is useful in order to understand the current behavior on a global level. However, it has certain limitations due to combining data from distinct studies that are influenced by geographic area, type of qualification used, etc. Therefore, the results should be considered within the context of the analyzed documents and not as evidence of causality.
Furthering this line of research is necessary and essential so that the market, through prices, distinguishes between more and less efficient housing. The price differential found in these studies suggests a major incentive for investment in energy efficiency, which, together with suitable policies, may eventually help these countries meet the commitments they have made.
This review identified specific problems in the existing literature. Hopefully, these results will encourage researchers to agree on the letter to be used as a reference and to include all the data necessary to replicate their studies.