Key Performance Indicators for Evaluation of Commercial Building Retrofits: Shortlisting via an Industry Survey

Abstract: Key performance indicators (KPIs) are highly useful for performance evaluation, but a set of pragmatic KPIs for holistic evaluation of retrofits for commercial buildings has hitherto been unavailable. This study was conducted to address this issue. Built upon the findings of a systematic literature review and a focus group meeting in the earlier stages of the study, a questionnaire survey covering 19 KPIs for environmental (embracing energy), economic, health and safety, and users' perspective evaluations of building retrofits was developed. The survey data, collected from facility management (FM) practitioners in Hong Kong, underwent a series of statistical analyses, including the Kruskal–Wallis H test, the Mann–Whitney U test, and Spearman rank correlation. The analysis results revealed the levels of importance of KPIs perceived by different groups of FM practitioners and the rankings of KPIs. Based upon these results, eight KPIs were shortlisted: energy savings, payback period, investment cost, actual-to-target ratio of the number of statutory orders removed, actual-to-target ratio of the number of accidents reduced, target indoor air temperature, target indoor air quality (IAQ) class, and target workplane illuminance. These KPIs serve as keystones for further development of an analytic evaluation scheme for commercial building retrofit performance assessment. The methodology of this study can also serve as a reference for similar KPI studies in other research domains.


Introduction
Buildings account for 39% of all carbon emissions in the world [1]. In Hong Kong, a city famous for its dense population and buildings, the volume of aged buildings is large and keeps increasing. As retrofitting existing buildings is a sustainability goal that the international community endeavours to meet, the building industry and the government of Hong Kong have introduced various incentives that motivate building owners or operators to implement retrofits for the premises they own or manage. However, the retrofit rate of existing buildings remains low [2]. The building sector in the city faces challenges in retrofitting existing buildings, especially aged ones. One of the key challenges is estimating the benefits brought by building retrofits, which relies on scientific mechanisms for evaluating retrofit performance against the economic input [3,4]. In the evaluation process, human decisions or judgements from owners, operators, occupants, etc. are critical elements.
Facility management (FM) practitioners are building professionals who are involved in multiple disciplines of practice to ensure the functionality, comfort, safety, and efficiency of facilities in the built environment. Their knowledge and work experience are gained through intensive interactions with the operations of the existing buildings they manage. Thus, their opinions on the change in the buildings' conditions and the best option for

Performance Measurement
In the past decade, a considerable number of studies have been conducted to evaluate the performance of building retrofits. Decision making is a prevailing stream of studies in the field of building retrofits. Economic viability is an indicator frequently used for building retrofit performance measurement. Net present value (NPV), internal rate of return (IRR), overall rate of return (ORR), benefit-cost ratio (BCR), discounted payback period (DPP), and simple payback period (SPP) are often used to assess the economic feasibility of a single retrofit measure [15,17–20]. Surveys on users' satisfaction or feedback from stakeholders in the post-occupancy phase were also used to measure building retrofit performance [21,22].
On top of that, the KPI approach is regarded as one of the most popular and valuable tools for measuring the process or outcome of construction projects. KPIs are a collection of indicators that can comprehensively reflect a project's goals. They help to define the nature, scope, expected quality, and unique characteristics of the projects and can also provide means for measuring the 'progress towards those goals for further learning and improvement' [23]. Energy performance and energy saving are two common KPIs for measuring the financial and environmental benefits of building retrofits [24–28]. As sustainability is one of the key project goals, an increasing volume of studies in the literature have examined KPIs for measuring the level of sustainability in construction and building renovation/retrofit projects. Kylili et al. [23] provided a state-of-the-art review on the KPIs identified for measuring the sustainability of projects in the built environment, in which they categorised the building performance KPIs into eight groups, namely, economic, environmental, social, technical, time, quality, disputes, and project administration. Al Dakheel et al. [29] conducted a review on features of smart buildings (SBs) and identified 10 KPIs for SBs. The KPIs they identified help to quantify the 'smart features' of SBs and reflect the 'smart capability' of the building. The validity of KPIs affects the overall measurement results; thus, the selection process of KPIs should engage scientific methodologies to ensure their representativeness of the measurement goals [30]. Industry experts' involvement is regarded as one of the reliable approaches for identifying representative and valid KPIs.
This approach usually entails three steps: (1) interviews with experts to define the measurement goals and identify KPIs that fit those goals; (2) a survey to collect a wider scale of data from various groups of experts; (3) statistical analyses to confirm and verify the identified KPIs from the previous steps. Xu et al. [31] followed these three steps to identify KPIs for the sustainability of building energy efficiency retrofit (BEER) in hotel buildings in China. Lai et al. [32] used the same approach to investigate KPIs for measuring the performance of hospital facilities management. This study, likewise, adopted this approach to identify and verify KPIs for the evaluation of commercial building retrofits.

The Role of FM Practitioners
A building retrofit project usually comprises five major phases: project set up and pre-retrofit survey, performance assessment, identification of retrofit options, site implementation and commission, and validation and verification [15]. Completion of a retrofitting project requires a team of building experts to assess the existing building conditions, design the retrofitting strategies, monitor the retrofitting process, and review the project outcome [33]. In this process, FM practitioners deal with daily building management activities at the operational level and are involved in developing cost-effective plans to support built asset management at the strategic level. For example, FM managers are engaged in a company's corporate social responsibility strategy development through evaluating the facility performance of the property portfolio. They are responsible for providing advice to top management on green certification decisions and participate in obtaining green certification. Building retrofit is an inevitable activity that leads to green certification for existing buildings.
Responsible for managing both buildings and the relevant stakeholders (e.g., owners, occupants, tenants), FM practitioners have to communicate with retrofitting decision makers and facility users, modify facilities, upgrade systems for energy use, and develop mechanisms for measuring energy consumption, monitoring energy use process, and assessing energy performance [34,35]. Thus, FM practitioners play critical roles in supporting decision making on building retrofits, and their opinions on KPIs for building retrofit performance are useful.

Data Collection
In the preliminary stages of this study, 52 performance indicators for building retrofit performance assessment were identified through the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) process, and these indicators were grouped into four categories ('economic', 'environmental', 'health and safety', and 'users' perspective') with their detailed meanings provided [6]. With these indicators identified, and in order to establish a method that can evaluate the holistic performance of commercial building retrofits, a focus group study was then conducted. Grounded upon the deliberations and opinions of the focus group study, 19 KPIs were selected as useful for building retrofit performance assessment (Table 1). Since using 19 KPIs entails a considerable effort to collect the associated empirical FM data, and the process for calculating this large number of KPIs is time-consuming [36–38], a questionnaire survey was designed to shortlist indicators that are useful for reflecting the performance of commercial building retrofits.
The questionnaire consists of three parts. Part 1 collects respondents' personal information, including gender, years of work experience, job level, nature of their organisation, type of employer, and their academic qualification. These pieces of information served to reflect the backgrounds of the respondents, allowing inter-group comparisons to be made when analysing the survey findings. Part 2 solicits the importance ratings of the 19 KPIs on a five-point scale (1: very low; 2: low; 3: moderate; 4: high; and 5: very high). Part 3 asks the participants to suggest any other KPIs they consider important and any other comments they have based on their experience. Pilot tests, with the participation of five FM experts, were conducted on the questionnaire. These tests helped to detect and eliminate any potential error or misunderstanding of the questions in the survey. Feedback from the tests was used to finalise the questionnaire before its official distribution.
The industry-wide online survey was officially launched in two ways: snowballing and mass email. Using a snowballing approach, FM professionals who participated in the preceding focus group study [6] and pilot tests were invited to complete the survey and also distribute it to their colleagues. As regards the second approach, mass email, a hyperlink to the survey was emailed to the members of the Building Services Operation and Maintenance Executives Society (BSOMES), the leading professional body in Hong Kong specialised in technical FM works embracing building retrofits. In order to increase the level of representativeness of the samples, FM practitioners with different organisational natures (government, non-governmental organisation (NGO), and private company) and types (e.g., owner/developer, management company, and contractor) working at different levels (strategic (e.g., director, chief engineer), tactical (e.g., manager, engineer)) were invited to participate in the survey.
The demographic details of the survey respondents are shown in Table 2. A total of 164 responses to the survey were received. To ensure data quality, the responses were screened manually, and those with incomplete information were discarded. This left 124 responses qualified for the subsequent data analysis. Among these responses, 83.9% were from males. The majority of the respondents were highly experienced; most were employed by private companies. The proportions of those working for owners/developers and management companies were comparable, while those working for contractors amounted to 14.5%. Between the strategic and tactical groups, the latter was larger. More than three-quarters of the participants have worked on office buildings; nearly half have worked on retail premises. The respondents were well educated, with most of them possessing a degree at the bachelor level or above.

Statistical Analysis
The data collected were analysed using SPSS version 26.0. To investigate any differences between groups of responses, the respondents were stratified into six groups. Each of these groups was further categorised into subgroups (with 'n' denoting the number of samples in each subgroup).
First, group analyses were conducted using the Kruskal-Wallis H test (H) and the Mann-Whitney U test (U) to analyse whether the respondents perceived the importance levels of the KPIs differently. The H test, a non-parametric test that compares more than two independent or unrelated samples [39], was applied to make comparisons among the subgroups of each of G2, G3, and G4. For each of the comparisons, a null hypothesis ($H_0$) and an alternative hypothesis ($H_a$) were set ($H_0$: there is no tendency for ranks of groups of response to rank systematically higher or lower than those of the others; $H_a$: there is a tendency for ranks of groups of response to rank systematically higher or lower for at least one of the groups). For testing these hypotheses, the Kruskal-Wallis H-test statistic was determined by Equation (1):
$$H = \frac{12}{N(N+1)} \sum_{i} \frac{R_i^2}{n_i} - 3(N+1) \qquad (1)$$
where $N$ is the number of values from all combined samples, $R_i$ is the sum of the ranks from a particular sample, and $n_i$ is the number of values from the corresponding rank sum. The α value for the H test was set as 0.05. As the H test does not identify where the differences occurred or how large they were, the U test, as a type of post hoc test, was used to analyse any significant differences between the sample pairs. Following the Bonferroni procedure, which helps to compensate for type I error, the adjusted α, $\alpha_B$, was used in the U test to determine any significant difference between samples if the H test was found to be significant (e.g., for G2, $k = 4$) [39]. The adjusted α from the Bonferroni procedure is given by Equation (2):
$$\alpha_B = \frac{\alpha}{k} \qquad (2)$$
where $\alpha_B$ is the adjusted level of risk, $\alpha$ is the original level of risk, and $k$ is the number of comparisons.
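As an illustration of Equations (1) and (2), the following minimal Python sketch computes the H statistic from pooled ranks and the Bonferroni-adjusted α. It is not the SPSS procedure used in the study: the function names are hypothetical, the ratings in the usage example are made up, and the statistic follows the uncorrected form described in the text, whereas statistical packages commonly apply a tie correction that matters for five-point rating data.

```python
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def kruskal_wallis_h(samples):
    """H statistic in the form of Equation (1), with no correction for ties."""
    pooled = list(chain.from_iterable(samples))
    ranks = average_ranks(pooled)
    n_total = len(pooled)  # N: number of values from all combined samples
    total, start = 0.0, 0
    for sample in samples:
        rank_sum = sum(ranks[start:start + len(sample)])  # R_i
        total += rank_sum ** 2 / len(sample)              # R_i^2 / n_i
        start += len(sample)
    return 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


def bonferroni_alpha(alpha, k):
    """Adjusted level of risk of Equation (2) for k comparisons."""
    return alpha / k


if __name__ == "__main__":
    # Hypothetical importance ratings (1 to 5) from three subgroups.
    subgroups = [[4, 5, 3, 4], [3, 2, 3, 4], [5, 4, 5, 5]]
    print("H =", kruskal_wallis_h(subgroups))
    print("adjusted alpha =", bonferroni_alpha(0.05, 4))
```

With α = 0.05 and k = 4, the adjusted threshold is 0.0125, which matches the p < 0.0125 criterion reported for the post hoc U tests.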
The U test is a non-parametric test that can be used to compare two unrelated or independent samples [39]. Therefore, it is suitable for inter-group comparisons between G1a and G1b, G5a and G5b, and G6a and G6b. The α value for the U test was set as 0.05. For each of the comparisons, a null hypothesis ($H_0$) and an alternative hypothesis ($H_a$) were set ($H_0$: the mean ranks of the groups are the same; $H_a$: the rank of one group of responses is systematically higher (or lower) than the other). Accordingly, the Mann-Whitney U-test statistic for each of the two samples was determined by Equation (3), and the smaller of the two U statistics was obtained:
$$U_i = n_1 n_2 + \frac{n_i(n_i+1)}{2} - \sum R_i \qquad (3)$$
where $U_i$ is the test statistic for the sample of interest, $n_i$ is the number of values from the sample of interest, $n_1$ is the number of values from the first sample, $n_2$ is the number of values from the second sample, and $\sum R_i$ is the sum of the ranks from the sample of interest. The mean and z-score for the Mann-Whitney U test for large samples were found by Equations (4) and (5):
$$\bar{x}_U = \frac{n_1 n_2}{2} \qquad (4)$$
$$z = \frac{U_i - \bar{x}_U}{s_U} \qquad (5)$$
where $\bar{x}_U$ is the mean, $s_U$ is the standard deviation, and $z$ is the z-score for a normal approximation of the data.
Second, Spearman's rank correlation was applied to examine any significant difference in the KPI rankings between pairs of the respondent subgroups. Spearman's rank correlation coefficient ($r_s$) was calculated by Equation (6) [34]:
$$r_s = 1 - \frac{6 \sum D_i^2}{n(n^2-1)} \qquad (6)$$
where $n$ is the number of observations, and $D_i$ is the difference between ranks obtained from each pair of responses. For the value of $r_s$, '+1' represents perfect agreement between the rankings; '0' represents no association between the rankings; '−1' represents perfect disagreement between the rankings.
Finally, a mean score was calculated for each of the rated KPIs, based on which the overall ranking of the KPIs was determined to facilitate shortlisting the most essential KPIs.
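The U test and the Spearman coefficient described above can be sketched in a few lines of Python. This is a hedged illustration only: the function names are hypothetical, the standard deviation uses the usual large-sample expression sqrt(n1·n2·(n1+n2+1)/12), which is an assumption because the text does not state it, and neither function applies the tie corrections that a package such as SPSS may use.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def mann_whitney_u(sample_1, sample_2):
    """Return (U, z): the smaller U statistic and its normal-approximation z-score."""
    n1, n2 = len(sample_1), len(sample_2)
    ranks = average_ranks(list(sample_1) + list(sample_2))
    r1, r2 = sum(ranks[:n1]), sum(ranks[n1:])   # rank sums of each sample
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1       # Equation (3) for sample 1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2       # Equation (3) for sample 2
    u = min(u1, u2)
    mean_u = n1 * n2 / 2                        # mean of U
    # Assumed large-sample standard deviation (not given in the text).
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, (u - mean_u) / sd_u               # z-score


def spearman_rs(ranks_a, ranks_b):
    """Spearman coefficient from two rankings of the same items (no tied ranks)."""
    n = len(ranks_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


if __name__ == "__main__":
    # Hypothetical ratings from two subgroups, and two hypothetical KPI rankings.
    print(mann_whitney_u([4, 5, 3, 4, 5], [3, 2, 3, 4, 2]))
    print(spearman_rs([1, 2, 3, 4, 5], [2, 1, 3, 5, 4]))
```

Identical rankings give a coefficient of +1 and exactly reversed rankings give −1, in line with the interpretation of $r_s$ stated in the text.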

Perceived Importance Levels of KPIs
Referring to the results of the H and U tests shown in Figures A1-A12 (see Appendix A), six significant observations are worth noting. First, a significant difference (U = 701, p < 0.05) was found between male (mean rank = 59.24) and female (mean rank = 79.45) respondents for KPI-11 (life cycle cost (USD)). This means that the female and male FM practitioners had different perceptions of the importance of life cycle cost for evaluating building retrofit performance. This finding echoes the argument of Rodríguez et al. [40] that men and women have different managerial styles.
Second, a significant difference (H = 10.538, p < 0.05) was found between respondents with different FM/O&M work experience for KPI-2 (normalised energy savings (kWh/m² year)). The results (U = 249, p < 0.0125) show that respondents with less work experience (≤5 years; mean rank = 39.18) considered KPI-2 more important than the experienced practitioners (20 to 30 years of experience; mean rank = 26.38) did, and a similar finding was obtained between freshmen (≤5 years; mean rank = 41.59) and veterans (≥30 years; mean rank = 27.74) in ranking KPI-2 (U = 262, p < 0.0125). This may be because the experienced practitioners were aware that after years of building occupation with energy retrofits already undertaken, the room for further energy saving is limited. Yet, no major disagreement was found between the various respondent groups (with different work experience) on KPI-1 'energy savings (%)'. According to Miller and Higgins [41], 'percentage better and percentage saved' was mostly referenced in environmental performance evaluation studies.
Third, a significant difference (H = 8.726, p < 0.05) was found between freshmen and non-freshmen (>5 years) for KPI-13 (ratio of actual to target no. of statutory orders removed (%)). The results (U = 125.5, p < 0.0125) show that respondents with more work experience (mean rank = 27.02) considered KPI-13 more important than the freshmen did (mean rank = 17.20). A possible reason is that the freshmen, having relatively less work experience, may not have come across any retrofit projects that required the removal of statutory orders. Therefore, the freshmen were less concerned about this KPI.
The other three significant differences were found between respondents at the tactical level and the strategic level for KPI-17 (target IAQ class; U = 1234, p < 0.05), KPI-18 (target workplane illuminance (lux); U = 1159, p < 0.05), and KPI-19 (target indoor equivalent continuous weighted sound pressure level (dBA); U = 1239, p < 0.05). Respondents at the tactical level perceived the three KPIs (mean rank of KPI-17 = 67.15; mean rank of KPI-18 = 68.02; and mean rank of KPI-19 = 67.09) as more important than those at the strategic level did (mean rank of KPI-17 = 51.97; mean rank of KPI-18 = 50.00; mean rank of KPI-19 = 52.11). The reason could be that FM practitioners at the tactical level have to handle and resolve complaints from users about IAQ, workplane illuminance, noise, etc. before such problems escalate to the strategic level. Hence, FM practitioners at the tactical level, when compared with their strategic counterparts, are more concerned about the KPIs in the users' perspective aspect.
Table 3 displays the values of the Spearman rank correlation coefficients between groups, while the detailed ranking results for different respondent groups are illustrated in Appendix B (Tables A1-A3). In general, no significant disagreement was found between the rankings of the KPIs pertinent to the various groups. Significant positive correlations (at the 0.01 level) existed in the rankings between some of the respondent groups, including female vs. male; experience (>5 to <20 years) vs. experience (20 to 30 years); experience (>5 to <20 years) vs. experience (>30 years); owner/developer vs. management company; and sub-degree or undergraduate degree vs. postgraduate degree. Significant positive correlations (at the 0.05 level) existed in the rankings between some of the respondent groups, including experience (20 to 30 years) vs. experience (>30 years); owner/developer vs. others; management company vs. contractors; contractors vs. others; and strategic level vs. tactical level.

Importance Levels and Ranks of KPIs
Based on all the valid responses, a mean rating was calculated for each of the KPIs, and the calculation results in Table 4 show that the ratings ranged from 3.14 to 3.76. To shortlist the most important KPIs for pragmatic use in building retrofit performance evaluation, 3.45 was taken as the cut-off mean rating. This rating, being the mean between 3.14 and 3.76, represents a moderate-to-high importance level. Thus, a total of 13 KPIs, covering all the four performance aspects, were shortlisted (Table 5).
Table 5. Numbers of KPIs and performance aspects covered at different cut-off mean ratings.
Cut-off mean rating | No. of KPIs | No. of performance aspects covered
≥3.40 | 15 | 4
≥3.45 (Moderate-to-high importance) | 13 | 4
≥3.50 | 6 | 3
≥4 (High importance) | 0 | 0
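The cut-off rule described above (the midpoint of the lowest and highest mean ratings) can be expressed as a short Python sketch. The KPI labels and the two intermediate ratings below are hypothetical placeholders, since the per-KPI values of Table 4 are not reproduced here; only the extremes 3.14 and 3.76 come from the text, and the function names are assumptions for illustration.

```python
def midpoint_cutoff(mean_ratings):
    """Cut-off taken as the mean of the lowest and highest mean ratings."""
    lowest = min(mean_ratings.values())
    highest = max(mean_ratings.values())
    # Round to two decimals, the precision at which the ratings are reported.
    return round((lowest + highest) / 2, 2)


def shortlist(mean_ratings, cutoff):
    """Return the KPIs rated at or above the cut-off, highest rating first."""
    kept = [kpi for kpi, rating in mean_ratings.items() if rating >= cutoff]
    return sorted(kept, key=lambda kpi: mean_ratings[kpi], reverse=True)


if __name__ == "__main__":
    # Hypothetical mean ratings; only 3.14 and 3.76 are taken from the text.
    ratings = {"KPI-A": 3.76, "KPI-B": 3.52, "KPI-C": 3.40, "KPI-D": 3.14}
    cutoff = midpoint_cutoff(ratings)
    print(cutoff, shortlist(ratings, cutoff))
```

Changing the cut-off changes both the number of KPIs retained and the number of performance aspects they cover, which is the trade-off that Table 5 summarises.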

Finalised KPIs
The extraction of the most representative KPIs was based on two criteria: the rank of the KPI and the grouping category. Among the 13 KPIs, KPI-1, KPI-2, and KPI-3 can be used to indicate the energy-saving performance of a retrofit project. Among these three KPIs, KPI-1 was ranked the highest by the respondents, meaning that the practitioners regarded this KPI to be the most important, while KPI-2 and KPI-3 were only ranked the 12th and the 3rd, respectively. Therefore, KPI-1, which can cover the representations of KPI-2 or KPI-3, was taken for use.
KPI-9, KPI-10, and KPI-11 were related to the cost evaluation of a retrofit project. Thus, they can be grouped under one category. As KPI-9 was ranked the highest among the KPIs in this category, the other two KPIs were removed from the list. Additionally, when compared with KPI-9, KPI-11 (life cycle cost) is less feasible in practice because cost elements such as operating and maintenance costs, in the long run, could hardly be accurately determined at the time when a retrofit project is implemented [42].
KPI-16 and KPI-17 were related to IAQ. Despite their similar rankings (KPI-16: rank = 3 and KPI-17: rank = 5), KPI-17 (target IAQ class) covers 12 parameters for IAQ assessment and hence is more representative than a single parameter (∆ indoor carbon dioxide levels or harmful substances (ppm)) covered by KPI-16. The 12 parameters (with 10 chemical parameters) for IAQ assessment are carbon dioxide (CO₂) and other pollutants, namely, carbon monoxide (CO), respirable suspended particulates (PM₁₀), nitrogen dioxide (NO₂), formaldehyde (HCHO), total volatile organic compounds (TVOC), mould, radon, and airborne bacteria. The certification of IAQ class (Good Class or Excellent Class), administered by the Environmental Protection Department of the Hong Kong government [43], is also authoritative.
KPI-15 was only covered in the assessment of IAQ class for building projects completed before 2019. Thus, it was treated as an independent indicator. Participants ranked KPI-16 (∆ indoor carbon dioxide levels or harmful substances (ppm): rank = 3) over KPI-15 (target indoor air temperature (°C): rank = 6). The reason for this may be that occupants can adjust themselves (e.g., by putting on or taking off clothing) to suit the indoor thermal comfort condition, while they can hardly notice the concentration of carbon dioxide or harmful substances, not to mention removing such substances. Participants may, therefore, perceive KPI-16 as more important than KPI-15 for building retrofits.
The rest of the original KPIs, i.e., KPI-6, KPI-13, KPI-14, KPI-15, and KPI-18, are independent indicators without overlaps. Thus, they were retained on the KPIs list. The original 13 KPIs and the final eight KPIs are shown in Table 6. After the foregoing activities (literature review, focus group, and survey), the shortlisted KPIs, belonging to four different aspects (environmental, economic, health and safety, and users' perspective), were determined. Correspondingly, a hierarchy for the evaluation of building retrofit performance is depicted in Figure 1.

Discussion
Part 3 of the survey, containing an open-ended question, asked the participants to provide any comments in relation to the survey topic. From the responses collected, two types of barriers to building retrofits were identified: (1) high evaluation cost and difficulty of obtaining precise cost estimation; (2) different natures of retrofit projects.

High Evaluation Cost and Difficulty of Obtaining Precise Cost Estimation
The core information to support building retrofit performance assessment is related to safety and proper working of the built assets, health and comfort, space functionality, and energy. Such information is usually gathered by FM managers who arrange maintenance works or technical inspections, or by users who report complaints and fill in satisfaction questionnaires. The main purpose of this is to help improve performance during the operational phase of a building [44]. However, systematic collection of all necessary data to support building retrofit performance evaluation is costly, as one of the survey respondents stated: '... initial cost (for evaluating the building retrofit performance) can be high ...' Although data acquisition can be simple using modern and powerful computerised systems [45,46], data overload can be a problem when a sophisticated data mining algorithm is needed to obtain useful information [47]. Thus, whether collecting the data for performance evaluation is worthwhile is a common decision for FM managers to make [45]. If all the applicable KPIs identified from the literature are used to evaluate the performance of building retrofits, considerable effort and resources will be needed to obtain and process the data [37]. Kumar et al. [45] also considered that having many indicators was impractical and that indicators should be simple to allow performance benchmarking.
Moreover, it is difficult to accurately predict the energy consumption of commercial buildings. A survey respondent who worked for a private management company stated: 'It (building retrofit) cannot give the precise percentage as the change of the weather might cause the energy consumption to increase significantly. For the energy-saving aspect, we do keep at 2% per year depending on electricity side only.' Submeters are not commonly installed to monitor the energy consumption of different parts of building services systems [6], and the additional cost of installing submeters, for example for a centralised chiller plant being retrofitted, is usually high given the need to modify the relevant part of the existing system [48]. Therefore, measuring the actual energy saving with respect to the retrofitted portion of the centralised system could be difficult.

Different Natures of Retrofit Projects
It is common to upgrade existing equipment at the end of its life or when it fails. Traditional retrofit practice focuses on replacing particular equipment such as chillers and lighting, instead of maximising overall building performance [49]. Additionally, the initial physical condition of the substituting equipment is often emphasised. This is consistent with the following opinion collected from the survey: 'Many retrofit projects initiation was based on the order of equipment or system end of life, no spare part support, change of use/demand, justifiable energy saving whereas functional or environmental enhancement are usually in the lowest priority.' The following statements from the survey respondents further indicated that the applicability of the KPIs in performance evaluation may vary with the nature and scope of the retrofit projects.
'The ... answers (for KPIs) were generic, in fact, the scores should be depending on the nature of the retrofit project.'
'Some KPIs may not be commonly used or considered by management, and some may not be applicable when designing the retrofit project, while some may be irrelevant to the reason for carrying out retrofit.'
'Special attention should be taken in case of the retrofit project carrying out phase by phase as the newly added and the existing system may be connected and worked together at the same time. Final commissioning of the whole system is necessary at the final stage of the project.'
From the managerial perspective, building performance depends on the resources (e.g., financial, technological, and labour) that are available and the quality of service that should be achieved [50]. In any case, facilities must be assessed with an organisation's goal and mission in place, and the assessment result should inform how well the facilities help the organisation meet its goal and fulfil the mission [51,52].

Conclusions
In this study, an industry-wide online survey was conducted to solicit the opinions of FM professionals on the KPIs that are applicable to commercial building retrofit performance evaluation. The results showed that the professionals were generally positive about the importance of the 19 KPIs (with mean ratings being 3 (moderate) or above on a five-point Likert scale). The survey data were analysed through conducting U and H tests, as well as Spearman's rank correlation; the analysed results revealed the variations in the importance of the KPIs perceived by the different groups of professionals.
To enable an effective performance evaluation of building retrofits, the 19 KPIs were further shortlisted. Aside from rating the importance levels of the KPIs, the survey respondents were also invited to provide any other comments about KPIs for building retrofit performance evaluation. Built upon such comments (qualitative) and the KPIs' importance levels (quantitative), a final list of eight KPIs was determined. These KPIs are able to critically reflect the performance of building retrofits in the environmental (energy) aspect and also three other aspects (economic, health and safety, and users' perspective).
Overall, this study contributes to the identification of pragmatic KPIs for commercial building retrofit performance evaluation and serves as a keystone for further development of an analytic evaluation scheme for the assessment of building retrofit performance. Using these KPIs, case studies can be further conducted to examine their applicability in assessing retrofit performance for commercial buildings. To this end, the assessment weighting of each of the KPIs needs to be determined, for example, by an analytic hierarchy process or an analytical network process [53,54], for which the data required can be solicited through interviews with experts working on building retrofits. Furthermore, the methodology of this study can serve as a reference for similar KPI studies in other research domains.