Relationship between Project Space Types, Optimize Energy Performance Credit, and Project Size in LEED-NC Version 4 (v4) Projects: A Case Study

Abstract: A previous study (Pushkar 2021) showed a significant non-parametric correlation between the Optimize Energy Performance credit from the energy and atmosphere category (EAc7) and project size in Leadership in Energy and Environmental Design for New Construction and Major Renovations version 4 (LEED-NC v4) office space projects. However, such an empirical analysis raises at least two problems: the choice between parametric and non-parametric statistics, and the choice between the Wilcoxon–Mann–Whitney (WMW) and analysis of covariance (ANCOVA) non-parametric tests. This study aims to evaluate EAc7 credit achievement for different types of spaces in LEED-NC v4 projects. We show that, in order to evaluate the differences between two LEED data groups, (1) a non-parametric procedure is preferable to a parametric one, and (2) if there is a Spearman's correlation between EAc7 credit achievement and project size for the same LEED-NC v4 project, then Quade's ANCOVA is preferable to WMW. The results of these tests show that, in two out of three cases, office space projects had better EAc7 credit achievement than higher-education spaces at the gold certification level and public assembly spaces at the silver certification level.


Introduction
As Leadership in Energy and Environmental Design (LEED) certification has grown in popularity, numerous empirical analyses of LEED-certified projects have been published. This is because LEED certification cannot be unified for different countries, certification levels, project types, or project sizes. Therefore, it is important to explore these factors. Such research can help LEED managers to determine the most appropriate certification strategy, taking into account their usually limited project-related budget and schedule [1]. However, these analyses require the use of the correct statistical tests; otherwise, reaching the wrong conclusion seems inevitable.
It was previously shown [5] that, in LEED-NC v3 2009 gold projects, the EAc1 credit was significantly higher in Finland and Sweden than in Turkey and Spain. A possible reason for the different values of this credit is that the types of spaces in LEED-NC v3 2009 gold projects differed significantly between Finland and Sweden, on the one hand, and Turkey and Spain, on the other [5].
Wu et al. [6] studied LEED for Neighborhood Development (LEED-ND) projects. They showed a significant non-parametric Spearman's correlation between the achievement of 11 individual credits and total credits when transitioning between levels of certification. However, when they used the Kruskal-Wallis test to evaluate the statistical differences among the certified, silver, gold, and platinum groups, only 5 of the 11 credits had p-values smaller than 0.05. In this context, we assumed that if there is a significant relationship between the achievement of individual LEED-ND credits and the overall LEED-ND score (i.e., the problem of the part-to-whole ratio), then ignoring this relationship could skew the results of the Kruskal-Wallis test. We also assumed that one-way non-parametric analysis of covariance (ANCOVA) can be used to keep the correlation between individual credit achievements and the total score in LEED-ND projects from influencing the statistical differences among the certified, silver, gold, and platinum groups. Fisher [7] reported on applying ANCOVA to address the "problems concerned with the relation of a part to the whole".
A significant relationship between EAc7 credit achievement and project size in LEED-NC v4 office-type projects at both the silver and gold certification levels was shown previously [8]. Therefore, when it is necessary to statistically determine, for example, the cross-certification performance between LEED-NC v4 certification levels, or between two types of spaces in LEED-NC v4 projects, the impact of project size on EAc7 credit achievement should be considered using ANCOVA. However, to date, the use of ANCOVA to determine the differences between two or more groups of LEED projects, when the studied credit achievement correlates with the project size of the same LEED project in each group, has not been studied. The purpose of this study was to evaluate EAc7 credit achievement for different types of spaces in LEED-NC v4 projects.

Sample Size Assumption
To compare the differences between two independent groups (i.e., two types of spaces in LEED-NC v4 projects), we used the following parametric and non-parametric procedures: Student's t-test versus the exact Wilcoxon-Mann-Whitney (WMW) test [9], and Fisher's versus Quade's ANCOVA [10,11]. Pearson's and Spearman's correlation tests were used to decide whether to move from the Student's t/WMW tests to the Fisher's/Quade's ANCOVA tests when two independent groups were compared. The minimum sample size (n) for the current study was therefore determined based on the minimum sample size for conducting non-parametric tests [12]. According to Mundry and Fischer [13], for the exact WMW test, n ≥ 16, while for Spearman's correlation n ≥ 20. Regarding non-parametric ANCOVA, Vickers [14] noted that "there is no simple relationship between relative power and sample size". Therefore, based on the minimal sample for Spearman's correlation, n = 20, we assumed that a sample size of n ≥ 21 in each group is acceptable for non-parametric ANCOVA.

Design of the Study
For a dose-response study, it is necessary to minimize the influence of uncontrolled factors on the object of study [15]. To accomplish this, we collected LEED-NC v4 projects in only one region (the US), sorted by type of space and level of certification. Large LEED-NC v4 projects in one group that did not appear in the other group were excluded from the analysis. Achievements of the LEED-NC v4 Optimize Energy Performance credit and project size were analyzed.

Data Collection
The US Green Building Council (USGBC) website [16] and the Green Building Information Gateway (GBIG) website [17] were used to collect US-LEED-NC v4 projects with different types of spaces. Table 1 shows that US-LEED-NC v4 includes 18 types of spaces and 4 levels of certification. In the present study, we focused on EAc7 credit achievement. Based on [8], LEED-NC v4 projects with a sample size of n = 21 or more were included in the statistical analysis, as shown in Table 1. Three LEED-NC v4 cases were collected. Case 1: office space group 1.1 (n = 32) versus public assembly space group 1.2 (n = 21) at the certified level. Case 2: office space group 2.1 (n = 44) versus higher-education space group 2.2 (n = 26) at the gold level. Case 3: office space group 3.1 (n = 35) versus public assembly space group 3.2 (n = 25) at the silver level.

Corrected Sample Size
To avoid the influence of extremely large project sizes on EA credit achievement, the largest office space projects were excluded from the analysis: in Case 1, 1 of the 32 LEED-NC v4 certified office space projects (≥40,521 m²); in Case 2, 4 of the 44 gold office space projects (≥40,818 m²); and in Case 3, 1 of the 35 silver office space projects (≥65,497 m²). Thus, for Cases 1, 2, and 3, the maximum project sizes were 23,226, 23,783, and 23,094 m², respectively. A summary of the US-LEED-NC v4 projects is given in Table 2.
In the present study, the following parametric and non-parametric statistical procedures were used: descriptive statistics, inferential statistics and assumptions, effect size, and p-value interpretations. The p-values are interpreted in Section 2.7 of this article. LEED-NC v4 EAc7 credit achievement was denoted as response variable Y, while project size was denoted as covariate X. Variable Y, measured on an ordinal scale, and covariate X, measured on an interval scale, refer to the same LEED-NC v4 project. We used the mean ± standard deviation (SD) and the median with interquartile range (IQR, 25th-75th percentile) of response variable Y as parametric and non-parametric descriptive statistics, respectively.

Inferential Statistics
An unpaired two-tailed parametric Student's t-test (t-test) and exact WMW test were performed to evaluate the differences between two unpaired groups. According to Bergmann et al. [9], if data contain ordinal or discrete interval variables with relatively few values, an exact WMW procedure, not an approximate one, should be used.
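The exact WMW procedure can be sketched as follows. This is a minimal illustration and not the software used in the study: the function names and the EAc7 point values are hypothetical, and the two-sided p-value is taken as the share of possible group assignments whose rank sum lies at least as far from its expected value as the observed one (one common convention, assumed here). Tied values receive midranks, and the permutation distribution is counted exactly by dynamic programming rather than approximated.

```python
from math import comb


def doubled_midranks(values):
    """Return 2 * midrank for every value, so tied ranks stay integers."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    doubled = [0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # 1-based positions start+1 .. end+1 share the midrank (start + end + 2) / 2
        for pos in range(start, end + 1):
            doubled[order[pos]] = start + end + 2
        start = end + 1
    return doubled


def exact_wmw(group1, group2):
    """Return (U statistic of group1, exact two-sided p-value)."""
    n1, n = len(group1), len(group1) + len(group2)
    ranks = doubled_midranks(list(group1) + list(group2))
    observed = sum(ranks[:n1])
    expected = n1 * (n + 1)  # expected doubled rank sum under the null hypothesis
    # counts[k][s] = number of ways to pick k observations with doubled rank sum s
    counts = [{} for _ in range(n1 + 1)]
    counts[0][0] = 1
    for r in ranks:
        for k in range(n1 - 1, -1, -1):
            for s, c in counts[k].items():
                counts[k + 1][s + r] = counts[k + 1].get(s + r, 0) + c
    extreme = sum(c for s, c in counts[n1].items()
                  if abs(s - expected) >= abs(observed - expected))
    u1 = (observed - n1 * (n1 + 1)) / 2
    return u1, extreme / comb(n, n1)


# Hypothetical EAc7 points for two space types (not data from this study)
office = [10, 12, 12, 14, 16, 18]
assembly = [6, 8, 10, 12, 13]
u_stat, p_value = exact_wmw(office, assembly)
print(f"U = {u_stat}, exact two-sided p = {p_value:.4f}")
```

Because the counting is done on integers, the p-value is exact even when many projects share the same number of EAc7 points, which is the situation described by Bergmann et al. [9].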
Parametric Pearson's correlation and non-parametric Spearman's correlation were used to evaluate the association between EAc7 credit achievement and project size for the same LEED-NC v4 projects.
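The difference between the two coefficients can be illustrated with a short sketch. The function names and the size/points values below are hypothetical and chosen only to show a monotonic but non-linear relationship; the code computes the coefficients themselves and does not compute p-values. Spearman's coefficient is obtained here as Pearson's coefficient applied to midranks.

```python
from math import sqrt


def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end + 2) / 2
        start = end + 1
    return ranks


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on midranks."""
    return pearson_r(midranks(x), midranks(y))


# Hypothetical project sizes (m2) and EAc7 points (not data from this study)
size_m2 = [500, 1200, 3000, 9000, 23000]
eac7_points = [18, 12, 9, 8, 7]
print("Pearson r    =", round(pearson_r(size_m2, eac7_points), 3))
print("Spearman rho =", round(spearman_rho(size_m2, eac7_points), 3))
```

In this made-up example the points fall strictly as size grows, so the rank-based coefficient reaches its extreme value while the linear coefficient does not.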
Parametric Fisher's ANCOVA [10] and non-parametric Quade's ANCOVA [11] were used to evaluate the differences between two groups with respect to response variable Y when covariate X was taken into account [18].
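The rank-based procedure can be sketched in a few steps, following one common description of Quade's method: rank Y and X over the pooled sample, center the ranks, regress the Y ranks on the X ranks while ignoring group membership, and then compare the residuals between groups with a one-way ANOVA F statistic. The function names and data below are hypothetical, and the sketch stops at the F statistic and its degrees of freedom; the p-value would then be read from the F distribution with (k − 1, N − k) degrees of freedom using a statistics package.

```python
def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end + 2) / 2
        start = end + 1
    return ranks


def quade_ancova_f(groups):
    """groups: list of (y_values, x_values) pairs, one pair per group.

    Returns (F statistic, df_between, df_within).
    """
    y_all = [v for y, _ in groups for v in y]
    x_all = [v for _, x in groups for v in x]
    n, k = len(y_all), len(groups)
    centre = (n + 1) / 2  # mean of the pooled ranks
    ry = [r - centre for r in midranks(y_all)]
    rx = [r - centre for r in midranks(x_all)]
    # Pooled regression of centered Y ranks on centered X ranks (no intercept needed)
    slope = sum(a * b for a, b in zip(rx, ry)) / sum(a * a for a in rx)
    resid = [b - slope * a for a, b in zip(rx, ry)]
    ss_between = ss_within = 0.0
    start = 0
    for y, _ in groups:
        part = resid[start:start + len(y)]
        start += len(y)
        group_mean = sum(part) / len(part)
        ss_between += len(part) * group_mean ** 2  # grand mean of residuals is 0
        ss_within += sum((e - group_mean) ** 2 for e in part)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f_stat, k - 1, n - k


# Hypothetical (EAc7 points, project size in m2) per group, not data from this study
offices = ([18, 16, 15, 12, 10, 9], [400, 900, 1500, 4000, 9000, 20000])
higher_ed = ([12, 11, 9, 8, 6], [600, 1200, 3000, 8000, 15000])
f_stat, df1, df2 = quade_ancova_f([offices, higher_ed])
print(f"Quade F({df1}, {df2}) = {f_stat:.3f}")
```

Working on ranks is what lets the covariate adjustment be applied to an ordinal response such as credit points.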

Checking Assumptions When Using Parametric Tests
A prerequisite for using parametric statistics is that certain assumptions must be met. In general, if these assumptions do not hold, non-parametric statistics should be used [19].

Statistical Tests for Unpaired Groups and Assumptions
Before performing a t-test, the assumption of normality of the variables was checked with the Shapiro-Wilk test and the assumption of equal variances with the Fisher-Snedecor F-test; Satterthwaite's approximate t-test was used if the variances could not be assumed to be equal.

Correlation Procedures and Gauss-Markov Assumptions
According to Boldina and Beninger [20], in order to determine Pearson's correlation, five Gauss-Markov assumptions should be met: (1) linearity in form, as determined by visual inspection; (2) correlation between residuals and independent variables, evaluated by ordinary least squares; (3) autocorrelation, evaluated by the Durbin-Watson test; (4) homoscedasticity, evaluated by the Breusch-Pagan test; and (5) normality in residuals, evaluated by the Shapiro-Wilk test.

Assumptions for Using Fisher's ANCOVA
Five Gauss-Markov assumptions were used [20]. In addition, the non-parametric triples test was used to evaluate the asymmetry assumption [21], and the Fisher-Snedecor F-test was used to evaluate the homogeneity assumption of variance [22].

Effect Size Interpretation
Parametric Cohen's d with bias correction was used to measure the effect size of the difference between two means [23]. Cohen's d can take on any number between 0 and infinity. The effect size is considered to be negligible if d < 0.20, small if 0.20 ≤ d < 0.50, medium if 0.50 ≤ d < 0.80, and large if d ≥ 0.80 [24].
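As an illustration, the thresholds above can be applied with a short sketch. The source states only that a bias correction was used [23]; the specific factor below, 1 − 3/(4(n1 + n2) − 9), is the widely used small-sample approximation attributed to Hedges and is an assumption here, as are the function names and the sample values.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d_corrected(a, b):
    """Cohen's d with pooled SD, multiplied by an assumed small-sample correction."""
    n1, n2 = len(a), len(b)
    pooled_sd = sqrt(((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2))
    d = (mean(a) - mean(b)) / pooled_sd
    correction = 1 - 3 / (4 * (n1 + n2) - 9)  # assumed Hedges-type factor
    return d * correction


def effect_size_label(d):
    """Classify |d| using the thresholds given in the text [24]."""
    size = abs(d)
    if size < 0.20:
        return "negligible"
    if size < 0.50:
        return "small"
    if size < 0.80:
        return "medium"
    return "large"


# Hypothetical EAc7 points for two groups (not data from this study)
group_1 = [2, 4, 6]
group_2 = [1, 3, 5]
d = cohens_d_corrected(group_1, group_2)
print(f"corrected d = {d:.2f} ({effect_size_label(d)})")
```

The label is taken from the absolute value of d, so the classification does not depend on which group is listed first.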

p-Value Interpretation
As recommended by Hurlbert and Lombardi [27], the paleo-Fisherian and Neyman-Pearsonian paradigms (null hypothesis significance tests (NHSTs)) were replaced by neo-Fisherian significance assessments (NFSAs). According to Hurlbert and Lombardi [27], the NFSA (1) does not fix α, (2) does not describe p-values as significant or non-significant, (3) does not accept the null hypothesis based on high p-values, but only suspends judgment, (4) interprets significance tests according to 3-valued logic, and (5) presents effect size information if necessary.
Hurlbert and Lombardi [27] cite a recommendation by Gotelli and Ellison [28], noting that "in many cases, it may be more important to report the exact p-value and let the readers decide for themselves how important the results are". According to Beninger et al. [29], the logic of Occam's razor should not be used for universal interpretation of p-values.
Hurlbert and Lombardi [27] clearly showed that fixing α (the level of significance; e.g., α = 0.05) and dichotomizing the scale of p-values (i.e., p < α or p > α) are superfluous. They cited Fisher's philosophical proposal that "no scientific worker has a fixed level of significance at which from year to year, and in all circumstances, he rejects (null) hypotheses; he rather gives his mind to each particular case in light of his evidence and ideas" [30]. In addition, Altman [19] noted that "It is ridiculous to interpret the results of a study differently according to whether the p-value obtained was, say, 0.055 or 0.045. These p-values should lead to very similar conclusions, not diametrically opposed ones".
According to Hurlbert and Lombardi [27], for NFSAs, precise p-values are evaluated and shown according to a three-valued logic: seems to be positive (i.e., there seems to be a difference between group 1 and group 2), seems to be negative (i.e., there does not seem to be a difference between the groups), or judgment is suspended regarding the difference between groups 1 and 2.

Visual Analysis
Visual analysis of all three cases in Figure 1 shows that a larger LEED-NC v4 project size is associated with a monotonic decrease in the EAc7 credit.
Thus, project size might be considered as an influential factor when comparing different space types with regard to this credit. Section 3.2 presents a detailed statistical analysis of this issue for the three studied cases.
This correlation between EAc7 credit achievement and project size in LEED-NC v4 projects was first revealed in [8]. In that study, it was noted that the types of envelope construction and heating, ventilation, and air-conditioning (HVAC) equipment installed are different for small and large office spaces. In particular, it is recommended that small offices be built with mass walls and an average window-to-wall ratio of 19%, while large offices should be built with a steel frame or mass walls and an average window-to-wall ratio of 54% [31]. In small offices, flexible refrigerant-based packaged air-conditioning units for air cooling and heating are usually installed, while in large offices, centralized water-based air-cooling chillers and air-heating boilers are implemented [32]. Thus, as was concluded in [8], smaller offices are more likely to have more insulation and to be more effectively equipped with HVAC systems to save cooling and heating energy and obtain more points for the EAc7. In the present study, this correlation between EAc7 credit achievement and size in LEED-NC v4 projects was confirmed for office spaces and expanded for higher-education and public assembly spaces. This highlights the importance of taking this correlation into account when distinguishing between LEED-NC v4 projects based on the type of space.

Case 1
In Case 1, we analyzed the differences in EAc7 credit achievement between office space group 1.1 and public assembly space group 1.2 at the LEED-NC certified level. The comparisons of the EAc7 performance of groups 1.1 and 1.2 without and with considering project size are presented in Tables 3 and 4, respectively. In these tables, p-values were evaluated according to three-valued logic: Roman font indicates that the value seems to be negative, and italic font indicates that judgment is suspended. Table 3 shows that the assumption of normality was met in both groups (p = 0.5449 and p = 0.8551, respectively). In this context, t-tests and exact WMW tests showed similar results: the difference between groups 1.1 and 1.2 seemed to be negative (p = 0.6807 and p = 0.7561, respectively), and the effect size in both tests was negligible (Cohen's d = 0.11 and Cliff's δ = 0.05).
It was shown that four Gauss-Markov assumptions and two Fisher's ANCOVA assumptions (0.0795 ≤ p ≤ 1.0000) were met in groups 1.1 and 1.2 (Appendix A, Table A1). As can be seen in Table 4, with these assumptions met, Pearson's and Spearman's correlations showed similar results in group 1.1, namely suspended judgment regarding the correlation between EAc7 credit achievement and project size (p = 0.0518 and p = 0.0465, respectively), and similar results in group 1.2, where the correlation seemed to be negative (p = 0.0784 and p = 0.4244, respectively). Fisher's and Quade's ANCOVA tests also showed similar results: the difference between groups 1.1 and 1.2 seemed to be negative (p = 0.7958 and p = 0.5520, respectively).

Case 2
In Case 2, we analyzed the differences in EAc7 credit achievement between office space group 2.1 and higher-education space group 2.2 at the LEED-NC gold level. Comparisons of the EAc7 performance of the two groups without and with considering project size are presented in Tables 5 and 6, respectively. In these tables, p-values were evaluated according to three-valued logic: bold font indicates that the value seems to be positive, and Roman font indicates that the value seems to be negative. Table 5 shows that the assumption of normality was not met for group 2.1 (p = 0.00003) but was met for group 2.2 (p = 0.0929). Although the t-test was not recommended in this context, t-tests and exact WMW tests showed similar results: the difference between groups 2.1 and 2.2 seemed to be positive (p = 0.0013 and p = 0.0011, respectively). In this case, the effect size was large in both tests (Cohen's d = 0.83 and Cliff's δ = 0.47, respectively).
As can be seen in Table 6, although Pearson's correlation and Fisher's ANCOVA tests were not recommended in this context, Pearson's and Spearman's tests showed similar results: in group 2.1, the correlation between EAc7 credit achievement and project size seemed to be positive (p = 0.000001 and p = 0.00001, respectively), while in group 2.2 it seemed to be negative (p = 0.7249 and p = 0.4236, respectively), and Fisher's and Quade's ANCOVA tests seemed to show positive results: p = 0.0122 and p = 0.0005, respectively.

Case 3
In Case 3, we analyzed the differences in EAc7 achievement between office space group 3.1 and public assembly space group 3.2 at the LEED-NC silver level. The comparisons of the EAc7 performance of the two groups without and with considering project size are presented in Tables 7 and 8, respectively. In these tables, p-values were evaluated according to three-valued logic: bold font indicates that the value seems to be positive, and Roman font indicates that the value seems to be negative.
As can be seen in Table 7, in one of the groups, the assumption of normality was not met (p = 0.0384). The t-test gave p = 0.141 for the difference between the two groups, with a small effect size (Cohen's d = 0.38), while the exact WMW test gave p = 0.0863, with a small effect size (Cliff's δ = 0.26); that is, in both tests, the difference between the two groups seemed to be negative.
It was shown that in group 3.1, the assumption of homoscedasticity was not met (p = 0.0184), while in group 3.2, four Gauss-Markov and two Fisher ANCOVA assumptions were met (0.0685 ≤ p ≤ 1.000) (Appendix A, Table A3). Table 8 shows the results of parametric and non-parametric tests-Pearson's correlation and Fisher's ANCOVA results versus Spearman's correlation and Quade's ANCOVA results. The Pearson's test results show that the correlation between EAc7 credit achievement and project size in both groups 3.1 and 3.2 seems to be negative (p = 0.1974 and p = 0.1696, respectively), and the Fisher's ANCOVA results show that the difference between the two groups seems to be negative (p = 0.7306). In contrast, the Spearman's test results show that the correlation between EAc7 credit achievement and project size in both groups 3.1 and 3.2 seems to be positive (p = 0.0254 and p = 0.0293, respectively), and the Quade's ANCOVA results show that the difference between the two groups seems to be positive (p = 0.0162).
In this study, we compared the results of parametric and non-parametric statistics. As a result, Cases 2 and 3 showed that ignoring assumptions when using parametric statistics can result in erroneous statistical conclusions.
We showed that using the non-parametric ANCOVA test instead of the non-parametric WMW test may be more efficient for detecting differences between two independent groups with LEED-NC v4 data if there is a significant Spearman's correlation between the EAc7 credit achievement and LEED-NC v4 project size in each group. Vickers [14] showed similar results for the treatment of non-normally distributed data.
In addition, three examples from the literature suggest that the use of ANCOVA could lead to new knowledge in green building rating systems:

• In a study by Pushkar and Verbitsky [37], LEED projects from eight US states were relocated from environmental categories to the building layer (BL) and service layer (SL). Spearman's test results showed a significant and negative correlation between the BL and SL. Three design strategies were identified: BL-emphasized, SL-emphasized, and random [37]. However, this study did not assess the differences between US states. In this context, non-parametric Quade's ANCOVA can be useful for the statistical measurement of differences between US states.

• Pearson's test results showed a significant and large positive correlation between the total scores of LEED-NC v4 and BREEAM [38]. It should be noted that in a number of European countries, the LEED and BREEAM green rating systems are used in parallel for building certification. In this context, the differences between countries can be evaluated using non-parametric Quade's ANCOVA.

• ElBatran and Ismaeel [39], among others, studied the relationship between spatial daylight autonomy (sDA) and annual sunlight exposure (ASE) in office buildings with a double-skin façade. Using a parametric Pearson's test, they showed that the relationship between sDA and ASE had a significant positive correlation (p < 0.001). In this context, the differences between double-skin façade alternatives can be evaluated using Fisher's ANCOVA.

Conclusions
• To evaluate LEED data, the use of non-parametric significance tests is preferred over parametric significance tests.

• To compare LEED data from two or more groups, it is necessary to test for logically possible correlations between variables in each group, and if a non-parametric correlation exists, then Quade's ANCOVA is preferable to the exact WMW test.

• In two out of three cases, the EAc7 (Optimize Energy Performance) credit achievement in LEED-NC v4 projects was better in office-type projects than in higher-education-type projects at the gold level of certification, and better in office-type projects than in public assembly-type projects at the silver level.
Funding: This research received no external funding.

Data Availability Statement:
Publicly available datasets were analyzed in this study. The data can be found here: https://www.usgbc.org/projects (USGBC Projects Site) (accessed on 10 April 2022) and http://www.gbig.org (GBIG Green Building Data) (accessed on 10 April 2022).

Conflicts of Interest:
The authors declare no conflict of interest.
Appendix A
p-values were evaluated according to three-valued logic; bold font indicates that the value seems to be positive; Roman font indicates that the value seems to be negative.