A Contractor-Centric Construction Performance Model Using Non-Price Measures

Abstract: Selecting a better-performing contractor at the procurement stage is crucial to achieving a successful outcome for a construction project. However, the construction industry lacks a systematic, purpose-driven method to assess contractor performance using objective metrics. There are many approaches to measuring construction performance, but most are complicated and depend heavily on data that are difficult to obtain. This paper aims to create a model for evaluating construction contractors' performance based on directly attributable measures that are quantitative and easy to gather, making the model more attractive and easier to use. Initially, a detailed literature review revealed different categories of measures of performance (MoP) and corresponding critical measures of performance (CMoP). Through a series of Delphi-based expert forums, the set of measures was fine-tuned and shortlisted. Fuzzy analytic hierarchy process-based comparisons were then used to develop a contractors' performance model that quantifies their level of performance based on a limited set of organisation-specific and project-specific measures. The results indicate a shift from traditional measures and a higher preference towards non-price measures. The performance model can be further developed to systematically rank prospective contractors at the procurement stage based on seven non-price measures.


Introduction
Contributing around 13 percent of Gross Domestic Product (GDP), construction is one of the largest industry sectors, with annual spending of about USD 10 trillion worldwide [1]. However, it significantly lags other industries in performance, with nearly 70% of projects suffering time and cost overruns and flatlined productivity growth [1,2]. Performance reflects the success of a project and is judged and quantified through performance measurement [3]. A construction project is considered successful if it meets the project objectives with minimum variations [4]. Selecting a better-performing contractor is pivotal to achieving project success [5,6]. Conversely, in the traditional procurement system, the adversity and uncertainty experienced in any construction project, coupled with an unsuitable choice of contractor, are known causes of underperformance [7].
The contractor selection process attempts to assess a contractor's capability based on price, past performance and performance potential [6]. According to Holt et al. [8], a contractor selection technique is effective if the candidate contractors' capability can be evaluated against performance requirements. Consequently, predicting the performance of the contractor has become imperative [9], which requires the identification of possible measures. Performance measurement is conducted in various streams, where critical success factors (CSF), success criteria (SC) and key performance indicators (KPI) are used; cost performance, for example, is typically expressed as completion within budget, with overrun or underrun [23]. Despite being a price measure, cost was included as a measure of performance for the initial rounds of expert discussions. The majority of previous researchers considered organisational financial performance-related measures when selecting a contractor. Since the industry is project-based, each project has a considerable influence on the overall financial performance of the contracting organisation [24]. Conversely, the better financial health of a contractor could indicate the better performance of their projects. There are many financial measures used for assessing organisational performance. Since financial reports are generally analysed at the contractor selection stage, obtaining such financial measures is quite possible and practical.
According to Silva et al. [25], time refers to the agreed or approved duration for the completion of a project. Time, which is specified prior to the commencement of construction, is usually taken as the period elapsed from the commencement of site works to the completion and handover of the constructed asset to the client [15]. On-time completion is a target that constructors set out to achieve. For individual project success, as well as overall organisational success, managing project schedules is key, as it ensures the effective utilisation of the completed facility by the client as planned [26]. Ineffective time management leads to delays, loss of revenue and loss of productivity [26]. For these reasons, 'Time performance' was identified as a category of MoPs that reflects a project's performance.
Human resources are the lifeline of any organisation, especially in a labour-intensive industry like construction. Measures related to having a capable, competent and committed team, and the adequacy of labour/trained resources, were highlighted in previous research (please refer to Table 1). Therefore, categorising 'Human resources strength' is justifiable given the heavy reliance on human resources in the construction industry. Another important criterion for contractor evaluation is experience and track record [27]. Past project experience, in terms of type and scale, is often considered a predictor of the contractor's performance. Based on the prominence given, it is reasonable to identify 'Experience and track record' as a separate category of MoPs for this study.
The construction industry has a greater obligation towards environmental performance since it is a heavy consumer of natural resources as well as a large polluter. A series of environmental performance indicators can help construction organisations direct their focus and resource deployment towards better environmental performance [28]. According to The KPI Team [29], KPIs related to the environment are viewed in two aspects: product performance and construction process performance. Project planning is the process of deciding what to do and how to do it before action is required [30], and is a continuous process that spans the delivery of a project [31]. Measures such as 'planning efficiency' and 'planning effectiveness' are often used to assess the level of planning performance. It is evident that project planning is a continuous process that ultimately affects time, cost and other performance outputs.
Productivity, a fundamental value-adding function, is defined as the ratio of the output of a production process to its corresponding input [32][33][34]. It measures how well resources are leveraged to achieve desired outcomes [35]. It is most often measured and reported as labour productivity [36]. The importance of good labour productivity was highlighted by Doloi [37], who stated that an unproductive workforce is detrimental to the time management, workmanship, use of materials, safety and profitability of a project. Therefore, it is an important measure of performance.

Methods
This research used a mixed-method approach for creating a performance model to evaluate contractors' performance based on project and organisation records. Initially a traditional literature review was carried out to identify categories of measures of performance (MoP) and corresponding critical measures that can be used to represent each category.
The key steps in developing the basic performance model were carried out in four expert forums based on the Modified Delphi Method (MDM) and Fuzzy Analytic Hierarchical Process (FAHP), as illustrated in Figure 1 and explained subsequently.


Expert Forums Based on Modified Delphi Method
The Delphi method is a highly structured technique used to extract the maximum amount of unbiased information from a panel of experts and to achieve a consensus [80,81]. Ameyaw, Hu, Shan, Chan and Le [82] highlighted that Delphi methods have increasingly been applied in construction, engineering and management research during the last three decades. Although Delphi studies are traditionally considered qualitative, the past two decades have seen the emergence of more quantitative versions with carefully designed research and statistical data analysis approaches [82]. According to Biggs et al. [83], such a combination of qualitative and quantitative methods with a panel of experts is often referred to as a 'Modified Delphi Method' (MDM). The Delphi process involves several steps. After the appointment of a panel whose members have expertise in the relevant topic, an initial round of data collection is performed and analysed. The results are then circulated in the second round of data collection, where the panellists can compare their answers against others' and then revise or affirm them in the subsequent round [81]. The process is repeated until consensus is reached.

Selection of the Experts
The success of a Delphi process principally relies on the choice of the panel of experts. Generally, non-probability purposive sampling can be used for selecting the experts based on their knowledge, experience and expertise [81]. Accordingly, the researcher has to rely on his or her judgment to perform the selection that best enables answering of the research question. Saunders [81] further stated that the issue of sample size is vague in non-probability sampling, with flexible rules. Since generalisation is made to theory, rather than to a population, the logical relationship between the sample selection and the focus of the research is more important [81,84]. The 'Guidelines for the Rigorous Implementation of the Delphi Research Method', presented by Hallowell and Gambatese [85], were applied for selecting the experts. The number of experts to be included in a panel is another important consideration in Delphi-based studies. Ameyaw, Hu, Shan, Chan and Le [82] stated that an optimal Delphi panel size cannot be concluded, as the literature reports a wide range of numbers. However, more than 70% of the papers that mentioned Delphi panel sizes had between three and 20 panellists. Eight potential experts were identified based on the set of selection criteria; five of them agreed to participate, and their profiles are presented in Table 2.

Data Collection Techniques
Interviews combined with questionnaires were used as the data collection techniques for the expert forums. Since the requirement was to obtain feedback from a small group of respondents (experts), it was important to reach those particular persons and achieve very high response rates. As the Delphi method requires several rounds, the time taken to complete each round of questionnaires had to be minimised where possible. Additionally, the respondents needed guidance, at least in the initial round, to clarify any ambiguities and to extract more details about their ratings. Considering all these factors, a combination of face-to-face and online questionnaires was used for the expert forums.

Expert Forum Round 1
The online survey tool 'Qualtrics' was used to design the questionnaire, which was shared with the five experts via email. The experts were instructed to select, from each category, the one CMoP that best represents the contractor's performance and can be easily obtained from completed project records. For each MoP, the experts were given the opportunity to introduce any other CMoP as an alternative. The results were extracted and summarised for use in round 2.

Expert Forum Round 2
Individual online interviews were conducted as a follow-up to the online questionnaire survey in round 1. Each expert was informed about the spread of the answers received for the choice of CMoP in each category and requested to justify their own choices. The experts were also given the opportunity to change their answers if required. The interviews were recorded, transcribed and analysed. MoPs (and CMoPs) were assessed based on the accessibility of data, the ability to compute, and fairness in reflecting the contractor's performance. These details were presented to the experts through a second online questionnaire survey, and they were requested to indicate their level of agreement using a five-point Likert scale where 1 = Very Low, 2 = Low, 3 = Moderate, 4 = High and 5 = Very High. The results were extracted and summarised for analysis.

Expert Forum Based on Fuzzy Analytic Hierarchy Process
Developed by Saaty [86], the Analytic Hierarchy Process (AHP) uses pairwise comparisons to analyse and organise quantifiable and non-quantifiable factors in a scaled, systematic manner [87,88]. With respect to a given attribute, pairwise comparisons are made using a scale of absolute judgments representing the extent to which one element dominates another [86]. However, according to Saaty [86], AHP is more suitable for crisp-decision applications than for situations requiring both quantitative and qualitative attributes. By adding fuzzy logic to AHP, van Laarhoven and Pedrycz [89] extended it to Fuzzy AHP (FAHP) through the use of triangular fuzzy membership functions. FAHP eliminates the reliability issues of traditional AHP, in which uncertainty was not dealt with properly [90]. According to Ozdagoglu and Ozdagoglu [91], in traditional AHP the numerical values are exact crisp numbers, whereas in FAHP they are intervals between two numbers with a most likely value. They further stated that, while linguistic values can change from person to person, taking this fuzziness into account leads to less risky decisions.

Fuzzification
The first step in fuzzifying crisp numbers is to assign a fuzzy membership function. Fuzzy set theory provides the mechanism for an element to partially belong to a set through the use of membership functions [92]. Several fuzzy membership functions are in use, such as triangular, trapezoidal and interval, of which triangular membership functions are the most popular due to their applicability with linguistic terms [93]. Therefore, a triangular membership function was used in the current research, having lower (l), middle (m) and upper (u) values, where a triangular fuzzy number Ã is denoted as (l, m, u) and its reciprocal Ã⁻¹ as (1/u, 1/m, 1/l). The corresponding fuzzy scale used in the research is provided in Table 3.
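To make the fuzzification step concrete, the triangular fuzzy numbers and their reciprocals described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the study's implementation; the `tfn`, `reciprocal` and `FUZZY_SCALE` names are assumptions, and the intermediate scale values merely mirror the (x−1, x, x+1) pattern visible in Table 3, which may differ from the paper's exact scale.

```python
# A triangular fuzzy number (TFN) is a triple (l, m, u) with l <= m <= u;
# its reciprocal is (1/u, 1/m, 1/l), as stated in the text.

def tfn(l, m, u):
    assert l <= m <= u, "a triangular fuzzy number requires l <= m <= u"
    return (l, m, u)

def reciprocal(a):
    l, m, u = a
    return (1.0 / u, 1.0 / m, 1.0 / l)

# Illustrative fragment of a nine-level fuzzy scale (hypothetical values).
FUZZY_SCALE = {
    1: tfn(1, 1, 1),   # equal importance
    3: tfn(2, 3, 4),   # moderate importance
    5: tfn(4, 5, 6),   # strong importance
    7: tfn(6, 7, 8),   # very strong importance
    9: tfn(9, 9, 9),   # extreme importance
}

print(reciprocal(FUZZY_SCALE[3]))  # (0.25, 0.333..., 0.5)
```

The reciprocal relation is what lets the lower triangle of a pairwise comparison matrix be derived automatically from the upper triangle.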

Pairwise Comparisons
AHP is conducted by comparing criteria in pairs, one criterion against another, until all such comparisons are completed. This is typically done with a template that is easy for the participants of the exercise to understand. Irrespective of the template chosen, the final output of the pairwise comparisons has to be transferred to a pairwise comparison matrix. After exploring several options, the Microsoft Excel template version of the online AHP tool created by Goepel [94] was selected, as it was free to use and accommodated comparisons of seven or more criteria. The template was slightly modified to enable the fuzzy characteristics required for FAHP. Accordingly, instead of the crisp numeric scales originally present, fuzzy linguistic scales were displayed. Through an online video call, the template screen was shared with each expert individually, and each was asked to perform the pairwise comparisons.
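The transfer of pairwise judgments into a matrix can be sketched as follows. This is a minimal illustration, assuming judgments are recorded only for the upper triangle as (l, m, u) triples; the `build_fuzzy_matrix` name and the toy criteria are hypothetical, not part of the study's tooling.

```python
def build_fuzzy_matrix(n, judgments):
    """judgments maps (i, j) with i < j to the TFN for 'criterion i
    versus criterion j'; the diagonal is (1, 1, 1) and the lower
    triangle holds the reciprocals (1/u, 1/m, 1/l)."""
    M = [[(1.0, 1.0, 1.0) for _ in range(n)] for _ in range(n)]
    for (i, j), (l, m, u) in judgments.items():
        M[i][j] = (l, m, u)
        M[j][i] = (1.0 / u, 1.0 / m, 1.0 / l)
    return M

# Three hypothetical criteria, e.g. safety, quality, experience.
judgments = {(0, 1): (2.0, 3.0, 4.0),   # safety moderately over quality
             (0, 2): (4.0, 5.0, 6.0),   # safety strongly over experience
             (1, 2): (2.0, 3.0, 4.0)}   # quality moderately over experience
matrix = build_fuzzy_matrix(3, judgments)
print(matrix[1][0])  # reciprocal of (2, 3, 4)
```

For n criteria, n(n−1)/2 judgments fully determine the matrix, which is why the template only needs one row per pair.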

Aggregation and Defuzzification
Aggregation is the process of combining the decisions of multiple decision makers, and it varies depending on the decision context [95]. For a homogeneous group structure, where decision makers' individual judgments are treated as group judgments, the geometric mean method of aggregation has been considered the preferred option [95][96][97][98]. Since the experts were selected using similar criteria, the group decision aggregation was therefore done using the geometric mean method. A combined matrix for all five individual matrices was prepared by calculating the geometric mean of each lower, middle and upper value of the fuzzy numbers. The explanation given by Liu et al. [99] was used in this research. The next step was to derive the fuzzy weights from the aggregated pairwise comparison matrix. This step was also performed using the geometric mean method [99].
Defuzzification converts aggregated fuzzy results into crisp values and can be performed using several generic types of methods, such as methods related to the mean, methods associated with the minimum, methods associated with the maximum, and others [99,100]. Among the methods related to the mean, the centroid method (centre of area) has been suggested as the better choice for defuzzification in terms of simplicity and wider usage [99]. Therefore, the centroid method was used for defuzzification in this research.
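The centroid (centre-of-area) rule for a triangular fuzzy number reduces to a simple average of its three components. The sketch below illustrates this under the assumption of hypothetical fuzzy weights; the subsequent normalisation step is a common convention, not necessarily the paper's exact procedure.

```python
# Centre-of-area defuzzification of a TFN (l, m, u): crisp = (l + m + u) / 3.

def centroid(a):
    l, m, u = a
    return (l + m + u) / 3.0

# Hypothetical fuzzy weights for three measures of performance.
fuzzy_weights = [(0.10, 0.15, 0.22), (0.05, 0.08, 0.12), (0.20, 0.30, 0.45)]
crisp = [centroid(w) for w in fuzzy_weights]
# Defuzzified weights are often normalised so that they sum to one.
normalised = [c / sum(crisp) for c in crisp]
print([round(w, 3) for w in normalised])
```

The centroid of a triangular shape coincides with the mean of its vertices, which is why this method is favoured for its simplicity.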

Checking for Consistency of Individual Comparisons
Consistency is a crucial property of AHP that needs to be checked and maintained in order to ensure that the pairwise comparisons result in a consistent judgment with limited contradictions [99]. Basaran [101] asserted that the most accepted method of calculating the Consistency Ratio (CR) for fuzzy pairwise comparison matrices is to convert the fuzzy numbers into crisp numbers and then proceed with the ordinary CR calculations of AHP. Taking this approach, the consistency of the pairwise comparisons was checked in real time using the in-built CR check functionality of the AHP tool.
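A hedged sketch of this crisp consistency check is shown below: once fuzzy judgments are defuzzified into a crisp matrix, Saaty's CR can be computed. The λmax estimate here uses the common row-geometric-mean approximation of the priority vector, which is one of several accepted variants; the random index values are the commonly cited ones, with RI = 1.32 for n = 7 as stated later in the text.

```python
import math

# Commonly cited random indices by matrix size (1.32 for n = 7, as cited).
RI = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32}

def consistency_ratio(A):
    n = len(A)
    gm = [math.prod(row) ** (1.0 / n) for row in A]           # row geometric means
    w = [g / sum(gm) for g in gm]                             # priority weights
    Aw = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
    lam_max = sum(Aw[i] / w[i] for i in range(n)) / n         # principal eigenvalue estimate
    ci = (lam_max - n) / (n - 1)                              # Consistency Index
    return ci / RI[n]                                         # Consistency Ratio

# A perfectly consistent 3 x 3 matrix (each a_ik = a_ij * a_jk) gives CR = 0.
A = [[1.0, 2.0, 4.0],
     [0.5, 1.0, 2.0],
     [0.25, 0.5, 1.0]]
print(consistency_ratio(A))
```

A CR below 0.10 is the conventional acceptance threshold in AHP practice.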

Expert Forum Round 4
The experts were given a pairwise comparison chart to compare the measures of performance in order to calculate the weights. Based on the top seven MoPs identified by the end of expert forum round 3, all pairwise comparison combinations were set out in the questionnaire. In one-to-one online sessions, the experts were asked to compare one MoP against another and mark 'X' in the appropriate box to indicate whether it is 'Equally important', 'Equal to moderately important', 'Moderately important', and so on, as per the nine levels indicated in Table 3. The resulting individual matrices were checked for consistency, aggregated, checked for consistency again as an aggregated matrix, defuzzified, and the weightings were calculated.

Results of Expert Forum Round 1
Table 4 summarises the results of the first round of expert forums. 'Labour productivity' was the only unanimously chosen critical measure among all CMoPs. 'Number of similar type and size projects completed' was the next most agreed upon, chosen by four experts. 'Worker turnover rate', 'debt ratio' and 'number of non-conformance reports' achieved a simple majority, with three out of five experts in agreement. In contrast, the choice of CMoPs in the other five categories of MoPs was split between two or more proposed CMoPs. The experts also commented on individual CMoPs; for example, defects at the point of handover or at the end of the liability period could be measured.

Number of non-conformance reports
• Non-conformances can be issued to the builder for incorrect construction.
• For tier 1 builders, internal audits would report on the non-conformances, while lower-tier builders would obtain external auditors' help to get those records.
• Non-conformances are captured in a register and reported in site meetings.
• Document management software used by builders can track non-conformances.
• The number of non-conformance reports could range from tens to hundreds or more.
• The willingness and ethics of the contractor would dictate how data on non-conformances are disclosed to clients.
• It may mislead based on the type of work (e.g., a high quantity of minor non-conformances vs. a low quantity of major non-conformances may skew the data).

Time taken to rectify all defects
• It can be misleading (e.g., rectifying a large quantity of defects quickly vs. taking longer to rectify fewer defects).
• Some of the work could be hard to classify as defects or incomplete work.
• Average time to rectify defects could be a comparable measure.

Other measure
• Cost to rectify defective work is a suitable measure, as any defect identified would be rectified at a cost.

Results of Expert Forum Round 2
The follow-up discussions with the experts, while presenting the results of round 1, enabled further clarification and refinement of the CMoPs. Table 4 summarises the finalised choices of CMoPs by the respective experts. This discussion round resulted in a shift in the experts' choices in the majority of the categories of MoPs. Seven of the ten categories achieved majority consensus on the choice of the respective CMoPs. Of these seven categories, five achieved 80% or more agreement on the choice of the respective CMoPs. Therefore, it is evident that a reasonable consensus was achieved by the end of expert forum round 2. However, consensus on cost performance, time performance and planning performance was not achieved.

Results of Expert Forum Round 3
Based on the findings of expert forum rounds 1 and 2, and domain knowledge, the list of CMoPs was further refined. To provide more context on the capabilities/issues of each CMoP, they were assessed based on three criteria: (1) accessibility of data, (2) ability to compute and measure, and (3) fairness in reflecting the contractor's performance. Categories of MoPs whose CMoPs did not fulfil all three assessment criteria were marked as dropouts from the final list of MoPs.
Although data are available to a certain extent, cost is a factor that is not within the full control of the contractor. Since construction projects are usually subject to design changes and other variations, there could be cost implications due to reasons beyond the contractor's control. Hence, it would be unfair to consider cost as a measure of performance when assessing a contractor's performance. Furthermore, data related to cost are often heavily contentious and may not be finalised until long after the completion of the project. For all these reasons, 'Cost performance' was deemed unsuitable as a measure for developing the performance index. 'Time performance' has similar issues regarding the ability to compute and fairness in assessing the contractor's performance. Although directly related to the contractor, 'Planning performance' measures are not readily available from project records. Furthermore, it is hard to compute such measures from the available records. Therefore, it was reasonable to drop this measure moving forward.
When presented with the results of the previous rounds, along with the justifications for shortlisting to the top seven CMoPs, the experts replied expressing their levels of agreement, along with comments regarding the shortlisting process. A summary of the feedback from the experts is presented in Table 5. Based on the comments received, it can be stated that a high level of consensus was achieved at the end of expert forum round 3 with regard to the choice of the top seven categories of MoPs and the corresponding CMoPs.
Notes to Table 5: 1 - Hard to interpret cost data to find actual/initial/final costs of a project, and it is highly contentious due to variations and claims; 2 - Depends on the procurement route, consultant's errors in estimation, scope changes, etc.; 3 - Time is a highly contentious matter due to variations and claims; 4 - Hard to access details of the construction programme; 5 - Not practical to identify details related to construction tasks due to scope changes, variations, etc.; 6 - Can consider another measure: tender baseline vs. actual baseline cost; 7 - Past experience may impede newcomers to the industry; however, it is reasonable to consider from a client's perspective; 8 - Often the clients do not consider environmental performance as a key requirement; it usually gets superseded by cost factors; 9 - Total man hours onsite will not work for modular construction projects, and also depends on the project value.

Results of Expert Forum Round 4
Using the seven shortlisted MoPs, a pairwise comparison chart was prepared and shared with the experts via online video calls. Each expert was asked to compare the MoPs in pairs based on the relevant linguistic expressions (as listed in Table 3). The corresponding CMoPs were also displayed alongside. Figure 2 shows an extract from the pairwise comparisons template used.


Pairwise Comparison Matrices
Pairwise comparison matrices were generated based on the pairwise comparisons performed by the experts. A sample pairwise comparison matrix (by expert E2) is presented in Table 6.


Aggregated Pairwise Comparison Matrix
Using the method explained by Liu, Eckert and Earl [99], geometric means of all five experts' pairwise comparisons were calculated, as explained below. Let DM_1, DM_2, . . . , DM_q be the q decision makers (experts); let C_1, C_2, . . . , C_n be the n criteria used for comparison; let ã_ij^t = (l_ij^t, m_ij^t, u_ij^t) be a triangular fuzzy number representing the relative importance of C_i over C_j as judged by DM_t; and let w̃_i be the fuzzy weight of C_i.
According to the geometric mean method, the judgments are aggregated component-wise:

ã_ij = ( (∏_{t=1..q} l_ij^t)^(1/q), (∏_{t=1..q} m_ij^t)^(1/q), (∏_{t=1..q} u_ij^t)^(1/q) ) (1)

The resulting aggregated pairwise comparison matrix is presented in Table 7. To derive fuzzy weights from the aggregated pairwise comparison matrix, the geometric mean method was again used, as explained by Liu, Eckert and Earl [99]:

r̃_i = (ã_i1 ⊗ ã_i2 ⊗ . . . ⊗ ã_in)^(1/n) (2)

w̃_i = r̃_i ⊗ (r̃_1 ⊕ r̃_2 ⊕ . . . ⊕ r̃_n)^(−1) (3)
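The two geometric-mean steps, aggregation across experts and derivation of fuzzy weights from the aggregated matrix, can be sketched as below. The function names and the toy matrices are illustrative assumptions; the fuzzy-division convention (l, m, u)/(L, M, U) = (l/U, m/M, u/L) follows common FAHP practice and may differ in detail from the cited source.

```python
import math

def aggregate(matrices):
    """Component-wise geometric mean of q experts' fuzzy matrices."""
    q, n = len(matrices), len(matrices[0])
    return [[tuple(math.prod(c) ** (1.0 / q)
                   for c in zip(*(M[i][j] for M in matrices)))
             for j in range(n)] for i in range(n)]

def fuzzy_weights(agg):
    """r_i = geometric mean of row i; w_i = r_i / (r_1 + ... + r_n),
    using fuzzy division (l, m, u)/(L, M, U) = (l/U, m/M, u/L)."""
    n = len(agg)
    r = [tuple(math.prod(c) ** (1.0 / n) for c in zip(*agg[i]))
         for i in range(n)]
    L, M, U = (sum(x[k] for x in r) for k in range(3))
    return [(x[0] / U, x[1] / M, x[2] / L) for x in r]

# Two experts judging two criteria with identical (2, 3, 4) preferences.
E = [[(1.0, 1.0, 1.0), (2.0, 3.0, 4.0)],
     [(0.25, 1/3, 0.5), (1.0, 1.0, 1.0)]]
w = fuzzy_weights(aggregate([E, E]))
print(w[0])  # fuzzy weight of the first criterion
```

Because the two expert matrices are identical, the aggregated matrix equals either input, and the middle value of the first weight works out to exactly 0.75.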

Defuzzifying the Weights
Based on the given equations, fuzzy geometric mean values were calculated across each row to obtain the values for the MoPs, followed by the fuzzy weights, as shown in Table 7. As the final step, defuzzified crisp numeric weights were calculated using the centroid method presented in Equation (4) [99] (refer to Section 3.2.1 for definitions):

Crisp weight: w_i = (l_i + m_i + u_i) / 3 (4)

The consistency of the comparisons was confirmed using the Consistency Index (CI) and Consistency Ratio (CR):

Consistency Index: CI = (λmax − n) / (n − 1) (5)

Consistency Ratio: CR = CI / RI (6)

where λmax = largest eigenvalue of the matrix, n = number of criteria, and Random Index RI = 1.32 for a matrix of 7 criteria [102]. Accordingly, measures of performance with data that are readily accessible, easily computable and fair in reflecting the contractor's performance have been identified, along with their respective levels of importance (weights). Further discussion is provided in Section 5.

Discussion
By the end of the expert forum rounds, it was evident that some of the most commonly quoted measures had limitations, based on the comments received from the experts. One of the most significant outcomes was the dropping of time and cost performance from the top measures of performance. Of what is traditionally referred to as the 'iron triangle', only 'quality' remained after the comprehensive series of expert forum rounds. The experts did not agree that time and cost performance should be included as measures. Cost performance, as a measure, failed the test of easily accessible data because it cannot be defined clearly. To define cost performance, data such as the original cost, the completed cost, the reasons for any difference and whether the difference was attributable to contractor performance need to be identified. These are not easy to prove, and it is fairly difficult to say who is responsible. The differences often manifest as variations. A portion of the variations could be pure variations originating from the client or the design team, while some may be due to contractor-attributable factors. Sifting through these and determining how much is affected solely by contractor performance is difficult. As such, cost performance failed the tests of ease of access and measurability, as agreed by the expert forum. Time performance has similar issues. For example, compared to the original schedule, the final schedule could be affected by many factors beyond the control of the contractor (e.g., weather), as well as by factors attributable to the contractor's own failures. The complexity of such differentiation fails the test of the data being easy to access. The shortfalls of the time and cost performance measures similarly affect project planning performance, leading to its removal as a suitable category of measure of performance based on the experts' agreement.
When subjected to FAHP-based pairwise comparisons, the remaining seven categories of MoPs received weights indicating their levels of importance. With an aggregate weight of over 50%, priority was given to health and safety and to quality of construction. Health and safety performance, achieving a weight close to one-third of the performance index, indicates the importance of making the industry safer and focuses on the performance of the 'process'. While its limitations were highlighted, the majority of the experts chose the lost time injury frequency rate (LTIFR) due to the higher availability of data across the industry. Although the reported incident rate has wider coverage than LTIFR and is technically easier to compare across projects, it was less preferred, mainly because its data are harder to obtain than LTIFR data. This is one example that substantiates the need to revamp some of the commonly used performance metrics still in use throughout the construction industry. Focusing on the performance of the 'product', quality gained a weight of one fifth of the overall performance index. Based on the experts' choices, it was evident that measures related to construction defects or rework would not be suitable for gauging quality performance. On the other hand, non-conformance reports were identified as a good alternative, provided the records are properly kept and maintained.
For the overall organisation's financial performance, the experts' clear preference was the debt ratio. It is common for some contracting organisations (including developers with in-house construction arms) to operate under several business entities. This increases the risk for construction clients, as some of these businesses may declare bankruptcy and re-emerge as a different entity. The debt ratio is therefore an ideal indicator of financial performance. With an importance level close to that of financial performance, past experience is another important category when assessing a contracting organisation. Counting the number of completed projects of similar type and size is a traditional measure of experience. The expert forum results affirmed that it remains a valid measure from the client's perspective, although it can disadvantage newcomers to the industry.
Despite being one of the least cited categories of measures of performance in the literature, environmental performance ended up with close to one tenth of the weight of the performance index. The chosen critical measure, total waste removed from a site, achieved the largest jump in experts' preference by the end of round 2. This also reflects the push towards more sustainable construction practices, leading to waste minimisation and recycling. Human resources strength was considered an important MoP (with a weight of 8.3%), for which worker turnover rate was unanimously chosen by the experts as an indicator that can be tracked and compared easily. It was further revealed that key staff turnover severely affects the performance of a project. Productivity achievement, with the lowest weight of 6.3%, was proposed to be measured using labour productivity, which can be a good comparator between predominantly on-site and predominantly off-site construction projects. The unanimous choice of labour productivity indicates that simple and straightforward measures are preferred.
Since the CMoPs are measured in different units, they need to be converted to a unified scale. Furthermore, the CMoPs related to health and safety, quality, financial performance, human resources strength and environmental performance indicate better performance when their values are lower, whereas the CMoPs for experience and track record and for productivity achievement indicate better performance when their values are higher. The CMoPs therefore have to be made unidirectional. Ultimately, the result can be represented as a linear additive model from which an index score is computed.
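A minimal sketch of such a linear additive index is given below. Only the seven category weights and the direction of each measure (lower-is-better vs. higher-is-better) come from this study; the min-max bounds, raw values and normalisation scheme are hypothetical assumptions for illustration.

```python
# Linear additive performance index: each CMoP is min-max normalised onto
# [0, 1], flipped where a lower raw value means better performance, and the
# normalised scores are combined using the FAHP-derived category weights.

# (weight, lower_is_better) per category; weights from the study's results,
# directions as described in the text.
CATEGORIES = {
    "health_safety":   (0.309, True),   # LTIFR: lower is better
    "quality":         (0.192, True),   # non-conformance reports
    "experience":      (0.133, False),  # similar projects completed
    "financial":       (0.129, True),   # debt ratio
    "environmental":   (0.091, True),   # total waste removed from site
    "human_resources": (0.083, True),   # worker turnover rate
    "productivity":    (0.063, False),  # labour productivity
}

def normalise(value, lo, hi, lower_is_better):
    """Min-max normalise onto [0, 1], oriented so that higher = better."""
    score = (value - lo) / (hi - lo)
    return 1.0 - score if lower_is_better else score

def performance_index(raw, bounds):
    """raw: {category: value}; bounds: {category: (lo, hi)} (hypothetical)."""
    return sum(
        weight * normalise(raw[c], *bounds[c], lower_is_better)
        for c, (weight, lower_is_better) in CATEGORIES.items()
    )

# Hypothetical check: a contractor at the best end of every scale scores 1.
bounds = {c: (0.0, 10.0) for c in CATEGORIES}
best = {c: (0.0 if lib else 10.0) for c, (_, lib) in CATEGORIES.items()}
print(performance_index(best, bounds))  # 1.0, since the weights sum to 1
```

Because the seven weights sum to 1, the resulting index always lies in [0, 1], which makes scores directly comparable across contractors and projects.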
Even though price-related measures were originally included in the discussions, they did not make it to the final list of critical measures. This affirms the need for more non-price measures when assessing performance.

Conclusions
The construction industry suffers from poor performance, which is largely attributable to contractors. Assessing a contractor's performance is therefore crucial, both for their own improvement and for selecting the better performing contractor when procuring construction projects.
Comparing and calculating weights for different performance criteria is not new to the construction industry, and over the years this has been done using various methods. Often, these criteria are compared at a high level, with the actual way of measuring them decided later. In contrast, this research approached the problem with both the categories of measures of performance and the respective critical measures identified through a comprehensive literature review. Through a series of systematically driven Delphi-based expert forum rounds, the measures of performance and their corresponding critical measures were then shortlisted. These measures were subjected to fuzzy analytic hierarchy process-based pairwise comparisons, which resulted in a basic performance index with weights for seven categories of measures of performance: health and safety (30.9%), quality (19.2%), experience (13.3%), financial (12.9%), environmental (9.1%), human resources (8.3%) and productivity (6.3%).
The main contribution of this research was the identification of key areas of performance (along with the respective weights) that can be gauged using non-price metrics which are objective, tangible and readily available when evaluating contractors' performance. Further research will be carried out to convert the identified CMoPs and the corresponding weights into a performance index with a unified and unidirectional scale. The resulting performance model can be used to quantify individual project performance and be aggregated as a score for ranking contractors. The simplicity of the identified critical measures of performance makes the model more usable without the need for complex analytics.
Since the metrics relate to data that is generally recorded on a day-to-day basis due to administrative and regulatory requirements, high availability of data is anticipated. The simplicity and availability of required data increase the possibility of using archived information to gauge performance of past projects in retrospect as well. The developed performance index allows the contractors to self-evaluate their level of performance. On the other hand, clients and consultants are able to review contractors' performance easily based on readily available data. Ultimately, the outcome of this research can lead to a rating mechanism which encourages a culture of measured improvement of performance of contractors.