Reliability-Based Decision Support Framework for Major Changes to Social Infrastructure PPP Contracts

Abstract: In the operational phase of public-private partnership (PPP) contracts, undue delay in addressing real needs may lead to poor service outcomes; conversely, commencing variations to a PPP agreement on the whim of the end-user risks eroding the value created by the detailed structuring and considerations undertaken in establishing the agreement. This difficulty is exacerbated because end-users generally lack an understanding of the specifics of the contracted service delivery performance requirements. To address this problem, this study develops, for the first time, a reliability-based decision support framework (RDSF) that compares the end-user's perceived service quality (i.e., how satisfied the end-user is with the space, operation and maintenance activities) with the service levels specified in the PPP agreement, and identifies when the gap between the end-user's expectations and contractual obligations warrants reconsideration. The framework is then tested on data gathered from three PPP schools in Australia, using both a current snapshot of performance data (abatements gathered from contract documents and end-user perceptions gathered through in-depth interviews) and a projected future scenario. The reliability analysis compares time-dependent risk profiles of current and expected performance and thereby identifies major changes to a PPP contract that would sensibly require reconsideration. The results indicate no current mismatch between the end-user's perception and the contract. However, the projected long-term scenario demonstrates how the decision framework can identify areas for review and change if end-users become more dissatisfied with the service being achieved. The RDSF is capable of quantifying current service performance while taking the engagement of the end-user into account.
Thus, it enriches theories in the field of performance management systems (PMS) and contributes to knowledge by providing an evidence-based test for justifying possible agreement modifications or additional works in social PPP operations. In addition, guidance on performance improvement strategies for the areas of dissatisfaction is provided. Application of this approach would assist in maintaining the long-term value for money of social infrastructure PPP agreements.
validation, L.G.; formal analysis, investigation, resources, preparation, editing, visualization, supervision, C.D., L.Z., F.K.P.H.;


Introduction
Public infrastructure and services procured via public-private partnerships (PPPs) have generally been shown to deliver quality, innovation and value for money, with the added advantage of accessing financial markets to even out the cash flow requirements of a government's investments [1][2][3]. The long-term service obligations (20 to 30 years) of PPPs [4] encourage whole-of-life thinking and a high level of professionalism in facilities management and maintenance activities. Demand for new infrastructure and maintenance continues to exceed the capital available from a government's traditional budgetary processes, owing to the many requests to upgrade aging facilities and to the increased demand from population growth, and these pressures contribute to the growing emergence of PPPs [5]. For instance, 175 infrastructure projects were initiated as PPPs in 2019 at a total value of US$49.8 billion [6]. This was an increase of 14% over the 2018 figures. Despite the maturity and sophistication in implementing PPPs in Australia [7], some projects still face issues during operations, e.g., patron issues with Sydney's Cross City Tunnel when traffic was funneled onto the toll road to enhance traffic volume, and the troubled Northern Beaches Hospital where there was inadequate planning for operational activities [8]. On the other hand, many social infrastructure PPP projects did meet all contractual obligations and satisfy the expectations of service providers [9]. The absence of effective ex-post service performance measurement in addressing the requests from end-users is one potential reason for these differences in observed service outcomes [10].
Substantial research has measured, evaluated and explored PPP performance in terms of cost, time, quality, safety and technology innovation [11][12][13]. The perspective of end-users and their voice for further service performance improvements in PPP operations have largely gone unrecognized [5], although many studies have shown that PPP success depends on the end-user's satisfaction with the quality of their experience [10,14,15]. In essence, PPPs provide physical infrastructure to meet the end-user's needs more efficiently and comprehensively. In turn, feedback from the end-user plays an important role in building and rebuilding the infrastructure. This paper, thus, focuses on the end-user's perceived service quality and contractual obligations through the lens of marketing theory, to quantify service performance and identify when the gap between them warrants reconsideration, consequently addressing necessary major changes of project scope in social PPP operations. The service delivery process for PPPs is described in Figure 1, where marketing theory is adapted because public infrastructure or service can typically be viewed as a production process [16].
Figure 1. Service delivery process diagram within PPP practice. (Gap 1: difference between the service quality perceived by end-user and the assessment of delivered service; Gap 2: difference between end-user's service expectation and output specifications in a contract; and Gap 3: difference between the assessment of delivered service and contractual obligations in terms of output specification).
In the service delivery process, the end-user is considered a rational consumer with individual and diverse needs who generates their own service expectations. Their perception of service quality is generally based on preconceived views and local experience rather than on any understanding of the contractual obligations of the PPP provider. Thus, if the end-user's perceived service meets or exceeds their expectation, they will be satisfied. If their experience does not meet their expected level of service, they will be dissatisfied, regardless of any contractual obligation detailed in the output specifications (refer to Gap 2 in Figure 1). In a perfect world, there would be no gap between end-user expectations and the contracted services when the agreement was struck (i.e., Gap 2 = 0). The contractual obligations of PPP Co., as detailed in the output specifications of the agreement, are monitored and administered by the government's contract manager through a key performance indicator (KPI) mechanism, which is referred to as a performance management system (PMS) [15]. Should the delivered service fall below that specified in the contract (refer to Gap 3 in Figure 1), the government can apply an abatement regime that is linked to the KPIs. Applying the abatement regime adjusts the periodic payment to PPP Co., usually on a quarterly basis. Thus, should the end-user have concerns about the quality of delivered contracted services, the mechanisms in the contract, once applied, should close the satisfaction gap. Over the term of the contract, changes invariably occur. These may be linked to changes in demand or to changes in service expectations due to unforeseen events or community expectations that shift with time. It is also possible that the expectation of the end-user is more a wish list of desired outcomes than a reflection of defined policy objectives or government standards of the day.
It is often difficult to understand when a gap in end-user expectation relates to something that should be redressed through major change to the PPP contract, the undertaking of additional minor works, a major modification to the agreement or if this end-user gap relates to an ambient request for an enhancement or simply a lack of understanding of terms of the agreement. Delays and frustration often result when requests for enhancements go unanswered, yet current practice lacks mechanisms to transparently consider the end-user's perception of service quality. Coincidentally, Majamaa, et al. [17] argued that there is a need to explore the end-user's perception and foresee the diverse service production from the perspective of the end-user, particularly for social infrastructure PPPs.
To summarize, the research question in this study originates from practice: social PPP infrastructure, such as hospitals and schools, is commonly confronted with major modification works driven by the end-user's needs and requirements, which can lead to major changes in the contracts. Although a KPI mechanism serves as the benchmarking tool by which governments measure and monitor the performance or quality of service delivered by the private sector against the agreed standards, a review of the normative literature shows that the satisfaction of the end-user, which plays a significant role in PPP operations, is largely overlooked in mainstream research. Therefore, the purpose of this study is to shed light on how the end-user can be better engaged to improve PPP performance, with a focus on social PPP operations. An innovative reliability-based decision support framework (RDSF) is developed for dealing with potential major changes during operations, based on an ex-post performance evaluation through the lens of the end-user; it accounts for the uncertainty in end-user sentiments and contractual measurements at particular points in time. It compares data gathered on end-user perceived service quality with the assessment of delivered contractual services as measured by abatement adjustments to periodic payments. The RDSF proposed in this study is a managerial decision-making tool that public managers can use within a performance management system (PMS). On the one hand, it enriches theories in the field of social PPP performance evaluation and contributes new knowledge on dealing with potential major changes during PPP operations by extending the previous use of reliability analysis [18] in the appraisal of a desalination plant [19], residential buildings [20] and port infrastructure [21] to the uncertainties in social PPP performance.
On the other hand, it provides government with a practical tool of social PPP operations that can transparently incorporate the end-user's views into decisions as to whether enhanced services are warranted or not. Application of this framework would assist in maintaining the long-term value for money of social infrastructure PPP agreements.
The RDSF is tested using qualitative survey data from end-users and quantitative cost data on three PPP schools in Australia. In addition, the practical application of the framework over the long term is considered using a synthetic forward projection example. The following sections detail the establishment of the conceptual framework, the research methodology, the testing of the model using three PPP schools, and a discussion of the broader applicability of the model.

Conceptual Reliability-Based Decision Support Framework
Social PPPs refer to public infrastructure or services procured via PPP concession agreements. More broadly, social infrastructure is defined as "community facilities, properties, services that help individuals, families, groups and communities meet their social needs and maximize their potential for development and enhance community wellbeing" [22]. Unlike economic PPPs, social PPPs, such as schools, hospitals and prisons, are procured using more performance-based service specifications [23], while requiring less capital expenditure. They tend to be complex and have dynamic service delivery requirements throughout their life cycle, particularly in terms of ongoing involvement with the community [23,24]. Social PPPs generally take the form of availability-based payments, whereby government retains demand risk and pays for the service delivered based on the facility being available to the specified standard [25]. Research has started to explore approaches to evaluating PPP performance. Liu et al. [26] sought to establish a life cycle and stakeholder-oriented performance framework based on a series of relevant indicators. Wijayasundara et al. [27] explored the relationship between building space and the performance outcomes of a public school. Similarly targeting PPP schools, Mohammed et al. [28] proposed an enhanced framework for better assessing operational performance in consideration of international PPP audits and practices. Yuan et al. [29] developed a building information modeling-based performance management system for PPPs. Nevertheless, there remains a paucity of research that gives guidance on how to comprehensively measure the performance of social PPPs, even though such measurement is pivotal for achieving value for money for government throughout the life cycle [30] and for the specific value added during the operation and maintenance phase [14].
The proposed RDSF has three core concepts: the assessment of service delivered by KPIs, the measurement of end-user perceived service quality, and reliability analysis that accounts for the uncertain project context. The framework is represented in Figure 2. This model explicitly considers an assessment of the delivered performance against the contracted services (detailed in agreement output specifications) and the perceived standard of services delivered through the eyes of the end-user. The model considers aspects of key services, such as general service specification, building maintenance and utility/waste/security/cleaning/… management services for government, and space, operation and maintenance for the end-user.

Assessment of Contracted Service Based on Key Performance Indicators
Use of KPIs to measure service performance is common practice in PPP contracts. Similar to all construction projects, the procuring of social PPPs can be seen as a production process [31,32]. When the private sector signs the contract, they not only provide the construction product, but also the service outlined in the output specifications to meet the government's requirements. Maloney [16] illustrated the construction process as a service product, service delivery and service environment. A service product is the service as specified to be delivered in the contract, which usually includes general services, building, maintenance, management service and utility, waste, security service and sometimes furniture, fixture, and equipment; refer to Figure 2. A service environment means the internal environment such as the culture of Project Co (private sector), and the external environment such as the availability of plant, materials, and equipment. If Project Co fails to provide the service quality, the government is eligible to abate or reduce a pre-agreed amount of money from the service payment. The quantum of any abatement forms part of the contract and is linked to KPIs, often on an escalating scale for repeated failures. It is considered that the use of abatements is a reasonable indicator for the acceptability of service performance in line with contractual obligations.
It is worth noting that, because the amount of abatement cost varies widely across PPP projects, it is not meaningful to express service delivery performance in monetary form alone. The European PPP Expertise Center [33] states that the amount of performance deductions for a particular service in any payment period could be limited to a level that is more closely aligned with the forecast cost to the private partner of providing the relevant service, for example, by limiting the private partner's exposure to performance deductions relating to cleaning services. Hence, an advance assessment of delivered service (AoDS, denoted by the variable 'A') is presented to normalize the monetary abatement cost as the ratio of the abatement cost to the performance-related service payment (e.g., the operation, cleaning and maintenance payment). The performance-related service payment is calculated as a percentage of the total quarterly service payment, i.e., the sum of the actual quarterly service payment made through the PPP payment mechanism and the abatement cost.
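The normalization described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `abatement_ratio`, its parameter names and the figures in the usage example are hypothetical and are not taken from the contract data. The 6% performance-related share is the value reported later for the case-study schools.

```python
def abatement_ratio(abatement, actual_payment, performance_share):
    """Ratio of abatement cost to the performance-related service payment.

    abatement         -- abatement cost deducted in the quarter (e.g., A$k)
    actual_payment    -- service payment actually made in the quarter (e.g., A$k)
    performance_share -- assumed fraction of the total quarterly payment that
                         is performance-related (hypothetical parameter name)
    """
    # Total quarterly service payment = actual payment + abatement cost.
    total_payment = actual_payment + abatement
    performance_payment = performance_share * total_payment
    if performance_payment <= 0:
        raise ValueError("performance-related payment must be positive")
    return abatement / performance_payment


# Hypothetical quarter: A$3k abated, A$997k paid, 6% performance-related.
ratio = abatement_ratio(abatement=3.0, actual_payment=997.0, performance_share=0.06)
print(f"abatement ratio = {ratio:.3f}")
```

Expressing the abatement as a ratio rather than a dollar amount is what allows quarters and projects with very different payment sizes to be compared on the same footing.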

Service Quality Perceived by End-User
One of the benefits of social PPPs is the enhancement of people's well-being through the early provision of new facilities [34]. To better achieve the potential benefits of social PPPs, improved communication among stakeholders is essential [35,36]. Because the contracts are managed remotely from the end-user, integrated communication becomes more difficult, particularly when the contracts focus on technical and economic concerns [37][38][39], risk analysis [40][41][42] and benefits as compared to traditional procurement [43,44].
In traditionally procured social infrastructure, long-term operational issues are managed by government, often by those intimately involved in operations. In PPPs, the focus on value for money (VFM) naturally draws on the capability and innovation of the private sector in managing risk, design, construction and operations (as a third party), maintenance and creating cash flows [2]. Some practitioners in PPPs are even unclear about whose expectations they need to satisfy [17]. Majamaa et al. [17] considered this and demonstrated, through several case studies, the lack of attention to the end-user's perception in PPP evaluation. Mohammed et al. [28] evaluated the impact of innovations on long-term outcomes and verified that the end-user's involvement positively influences PPP project performance.
The end-user's perception of service quality (PSQ, denoted by the variable 'P') is a function of the relationship between expected service, i.e., the expectation the end-user has for the service to be provided, and perceived service, i.e., the end-user's perception of the actual service that has been provided [16,45]. It is viewed as a process of 'expectation disconfirmation', where satisfaction depends mostly on whether the service meets or exceeds expectation. The perceived service quality informs how service quality can be improved because it captures the information that influences the end-user's overall perception of service [45]. The end-user's perceived service quality can reflect their expected service to a degree [46]. Specific social infrastructure assets require tailored measures to reflect the end-user's perceived service quality, but three generic categories exist: the space quality, operation and maintenance of the service. These categories are used to represent the end-user's level of satisfaction P through multicriteria weight analysis, i.e., as the sum of the weighted satisfaction of each category.

Decision-Making Process
The 'failure' during social PPPs operations in this framework is defined as the situation where the service quality perceived by the end-user is lower than the assessment of delivered service by the government, with probability $P_f = P(P - A < 0)$. Depending on the size of this probability, the result of the RDSF can be expressed as the three cases in Equation (1).
$$\text{Service delivery performance} = \begin{cases} \text{good}, & P_f \le l_1 \\ \text{acceptable}, & l_1 < P_f \le l_2 \\ \text{poor}, & P_f > l_2 \end{cases} \quad (1)$$

First, if the probability of failure $P_f \le l_1$, the service delivery performance is defined as good. In this case, the probability that the actual service delivery cannot meet the end-user's expectation is no greater than $l_1$. As a result, the government may be satisfied that there is enough resilience between the end-user's satisfaction and the assessment of actual service delivered. Second, if $l_1 < P_f \le l_2$, the service delivery performance is defined as acceptable. In this case, the probability that the actual service delivery cannot meet the end-user's expectation falls in the zone $(l_1, l_2]$, within which the shortfall of actual service delivery relative to the end-user's expectation is tolerable. The government may be neither very satisfied nor unsatisfied in this situation. Last, if $P_f > l_2$, the service delivery performance is defined as poor. In this case, the probability that actual service delivery cannot meet the end-user's expectation is greater than $l_2$. The government may be unsatisfied with the social PPPs service performance, as there is a material risk that the service delivered falls below the end-user's expectation.
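The three cases of Equation (1) amount to a simple threshold rule, which can be sketched as follows. The function name `classify_performance` is hypothetical, and the default thresholds of 0.10 and 0.20 are illustrative placeholders for $l_1$ and $l_2$; any pair with $0 \le l_1 < l_2 \le 1$ could be supplied.

```python
def classify_performance(p_f, l1=0.10, l2=0.20):
    """Map a probability of failure onto the three cases of Equation (1).

    p_f -- probability that perceived quality P is below the assessment A
    l1  -- upper bound of the 'good' zone (illustrative default)
    l2  -- upper bound of the 'acceptable' zone (illustrative default)
    """
    if not 0.0 <= p_f <= 1.0:
        raise ValueError("p_f must be a probability in [0, 1]")
    if not 0.0 <= l1 < l2 <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= l1 < l2 <= 1")
    if p_f <= l1:
        return "good"
    if p_f <= l2:
        return "acceptable"
    return "poor"


for p in (0.05, 0.15, 0.25):
    print(p, classify_performance(p))
```

Note that the boundaries follow Equation (1): a value exactly equal to $l_1$ is classed as good, and a value exactly equal to $l_2$ is classed as acceptable.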
This reliability-based decision support framework defines a new, probability-based perspective for measuring social PPPs service performance by analyzing the assessment of delivered service and the end-user's perception of service quality. From this viewpoint, only when the gap between $P$ and $A$ is larger than $(1 - l_2)$ will government be satisfied, and also support resilience management in the project to deal with unforeseen major changes. This conceptual framework can assist governments in paying attention to the end-user's perception and in developing more end-user-oriented public services. In addition, the end-user would achieve value for money by engaging with a public service that adequately meets their needs and expectations.

Research Methodology
Qualitative and quantitative methods are both necessary to inform the specific values of P and A adopted in the RDSF. Specifically, the data for A are derived from the contract documents and payment records for abatement costs, and the data on P are collected through in-depth interviews with end-users. Reliability analysis is used to measure the probability defined in Equation (1), and of particular interest is the intersection of the two variables, i.e., P and A. Reliability analysis is a risk-based design concept for incorporating uncertainties into the framework, and it has been widely used in systems engineering life cycle management to deal with the estimation, prevention and management of lifetime engineering uncertainty and risks of failure [47]. In this context, the uncertainties can be captured by analyzing the input variables under different scenarios. Hence, this approach presents government with an early indicator of when genuine change may be warranted and thus helps to better satisfy the end-user's expectation through the whole lifetime of social PPP operations. Taking the failure as defined in Equation (1), $P_f$, i.e., the key indicator for social PPPs service performance improvement, can be written as the following equation adapted from [48].

$$P_f = P(P - A < 0) = \int_{-\infty}^{\infty} F_P(a)\, f_A(a)\, da \quad (2)$$
where $F_P(a)$ is the cumulative distribution function of perceived service quality evaluated at $a$, and $f_A(a)$ is the probability density function of the assessment of delivered service. The application of reliability analysis to social PPPs service performance measurement is shown in Figure 3, in which $\mu_A$ and $\mu_P$ are the mean values of $A$ and $P$, $\sigma_A$ and $\sigma_P$ are the standard deviations due to uncertainty, and $A_n$ and $P_n$ represent the nominal values of the assessment of delivered service and the perceived service quality, respectively. Thus, $P_f$ depends on these factors.
Appl. Sci. 2020, 10, 7659 8 of 18
The basic principle in reliability analysis is to make the shaded area as small as possible, which represents favorable social PPPs performance. $A$ and $P$ are assumed here to be normally distributed variables under uncertainty; thus $A$ corresponds to $N(\mu_A, \sigma_A^2)$, $P$ corresponds to $N(\mu_P, \sigma_P^2)$, and the resulting variable $(P - A)$ corresponds to $N(\mu_P - \mu_A, \sigma_P^2 + \sigma_A^2)$. The probability of failure ($P_f$) can be expressed as Equation (3) from [48].

$$P_f = \int \cdots \int_{P - A < 0} f_X(X_1, X_2, \ldots, X_n)\, dX_1\, dX_2 \cdots dX_n \quad (3)$$
where $f_X(X_1, X_2, \ldots, X_n)$ is the joint probability density function of the random variables $X_1, X_2, \ldots, X_n$. However, the joint probability density function and the data on these random variables are rarely available. The first-order second-moment reliability method is therefore used to approximate the result using the means and standard deviations. For this study, the probability of failure ($P_f$) can be expressed as Equation (4),

$$P_f = P(P - A < 0) = \Phi(-\beta) \quad (4)$$

where $\Phi$ is the cumulative distribution function of the standard normal variable. $P_f$ depends on the ratio between the mean and the standard deviation of the variable $P - A$, which is called the reliability index and is expressed as

$$\beta = \frac{\mu_{P-A}}{\sigma_{P-A}} = \frac{\mu_P - \mu_A}{\sqrt{\sigma_P^2 + \sigma_A^2}} \quad (5)$$

Hence, the probability of failure ($P_f$) for social PPPs service performance can be written as Equation (6).

$$P_f = \Phi\left(-\frac{\mu_P - \mu_A}{\sqrt{\sigma_P^2 + \sigma_A^2}}\right) \quad (6)$$
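As a concrete illustration of the first-order second-moment calculation, the sketch below evaluates the reliability index and the probability of failure for normally distributed, independent P and A, which is the assumption under which the variance of P − A equals the sum of the two variances. The function names and the numerical inputs are hypothetical and are not values from the case study; `statistics.NormalDist` from the Python standard library supplies the standard normal cumulative distribution function.

```python
from math import sqrt
from statistics import NormalDist


def reliability_index(mu_p, sigma_p, mu_a, sigma_a):
    """Mean of (P - A) divided by its standard deviation."""
    sigma_diff = sqrt(sigma_p ** 2 + sigma_a ** 2)
    if sigma_diff == 0:
        raise ValueError("at least one standard deviation must be non-zero")
    return (mu_p - mu_a) / sigma_diff


def probability_of_failure(mu_p, sigma_p, mu_a, sigma_a):
    """Probability that P - A < 0 for independent normal P and A."""
    beta = reliability_index(mu_p, sigma_p, mu_a, sigma_a)
    return NormalDist().cdf(-beta)  # standard normal CDF evaluated at -beta


# Hypothetical inputs: perceived quality around 5.5, assessment around 4.98.
p_f = probability_of_failure(mu_p=5.5, sigma_p=0.5, mu_a=4.98, sigma_a=0.02)
print(f"beta = {reliability_index(5.5, 0.5, 4.98, 0.02):.3f}, P_f = {p_f:.4f}")
```

When the two means coincide, the reliability index is zero and the probability of failure is 0.5; a larger positive index corresponds to a smaller probability of failure.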
Additionally, in the case of multicriteria weight analysis, the mean value of perceived service quality ($P$) equals the weighted mean of the category means, i.e., those of space ($s$), operation ($o$) and maintenance ($m$) activities. The standard deviation of $P$ equals the square root of the sum of the squared weighted standard deviations of the categories. Thus, the probability of failure ($P_f$) can be written as

$$P_f = \Phi\left(-\frac{(w_s\mu_s + w_o\mu_o + w_m\mu_m) - \mu_A}{\sqrt{w_s^2\sigma_s^2 + w_o^2\sigma_o^2 + w_m^2\sigma_m^2 + \sigma_A^2}}\right) \quad (7)$$

where $w_i$, $\mu_i$ and $\sigma_i$ denote the weight, mean and standard deviation of category $i \in \{s, o, m\}$, and $\Phi$ is the cumulative distribution function of the standard normal variable. Equation (2), which is commonly used in structural reliability analysis and prediction [18], has already been used in the appraisal of a desalination plant [19], residential buildings [20] and port infrastructure [21]. In this study, the probability of failure is estimated using Equation (2) based on the uncertainties of A and P. Substituting the mean and standard deviation of P obtained from the multicriteria weight analysis into Equation (6) leads to Equation (7) for predicting the probability of failure.
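The aggregation of the three categories can be illustrated with a short sketch. It assumes, as the text does, that the weighted category scores combine as a weighted mean and a root-sum-of-squares standard deviation; the function names, the dictionary layout and the category statistics in the example are hypothetical, with equal weights of one third used purely for illustration.

```python
from math import sqrt
from statistics import NormalDist


def aggregate_perceived_quality(categories, weights):
    """Combine category (mean, std) pairs into the mean and std of P.

    categories -- dict mapping category name to (mean, std)
    weights    -- dict mapping the same names to weights
    """
    mu_p = sum(weights[k] * categories[k][0] for k in categories)
    sigma_p = sqrt(sum((weights[k] * categories[k][1]) ** 2 for k in categories))
    return mu_p, sigma_p


def weighted_probability_of_failure(categories, weights, mu_a, sigma_a):
    """Probability that the aggregated P falls below A (normal, independent)."""
    mu_p, sigma_p = aggregate_perceived_quality(categories, weights)
    beta = (mu_p - mu_a) / sqrt(sigma_p ** 2 + sigma_a ** 2)
    return NormalDist().cdf(-beta)


# Hypothetical category statistics on the interview scale.
categories = {"space": (6.0, 0.3), "operation": (5.0, 0.3), "maintenance": (4.0, 0.3)}
weights = {name: 1 / 3 for name in categories}

mu_p, sigma_p = aggregate_perceived_quality(categories, weights)
p_f = weighted_probability_of_failure(categories, weights, mu_a=4.98, sigma_a=0.02)
print(f"mu_P = {mu_p:.3f}, sigma_P = {sigma_p:.3f}, P_f = {p_f:.4f}")
```

Keeping the weights as an explicit input makes it straightforward to test how sensitive the probability of failure is to the relative importance assigned to space, operation and maintenance.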
Furthermore, from a whole-of-life-cycle perspective, the time-based performance curve can be shown as in Figure 4. $A(t)$ and $P(t)$ vary with time owing to KPI assessment, limited space, increasing need, gradually outdated technology, etc. $A(t)$ looks like a straight line because the abatement costs are small every year, so its fluctuation cannot be seen clearly. A reasonable hypothesis is that, at first, the gap between $A(t)$ and $P(t)$ is safe enough that $P(P(t) - A(t) < 0) \le l_1$, and the service delivery performance is good. Given that $A(t)$ does not vary significantly with time, the gap narrows over time as the end-user's perceived service quality declines. When $P(P(t) - A(t) < 0)$ falls into the zone $(l_1, l_2]$, the service delivery performance is acceptable. However, when $P(P(t) - A(t) < 0) > l_2$, the service delivery performance is considered poor.
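The time-dependent reasoning above can be sketched by evaluating the probability of failure year by year and reporting the first year in which each threshold is exceeded. Everything numerical in this example is synthetic: the linear decline in perceived quality, the near-constant assessment, the 25-year horizon and the thresholds of 0.10 and 0.20 are assumptions chosen for illustration, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def failure_profile(mu_p, sigma_p, mu_a, sigma_a):
    """Probability that P(t) < A(t) for each period, given per-period moments."""
    standard_normal = NormalDist()
    profile = []
    for mp, sp, ma, sa in zip(mu_p, sigma_p, mu_a, sigma_a):
        beta = (mp - ma) / sqrt(sp ** 2 + sa ** 2)
        profile.append(standard_normal.cdf(-beta))
    return profile


def first_exceedance(profile, threshold):
    """First period (1-based) in which the profile exceeds the threshold."""
    for t, p_f in enumerate(profile, start=1):
        if p_f > threshold:
            return t
    return None  # threshold never exceeded within the horizon


# Synthetic scenario: perceived quality declines slowly, assessment stays flat.
years = range(1, 26)
mu_p = [5.8 - 0.04 * (t - 1) for t in years]
sigma_p = [0.5 for _ in years]
mu_a = [4.98 for _ in years]
sigma_a = [0.02 for _ in years]

profile = failure_profile(mu_p, sigma_p, mu_a, sigma_a)
print("leaves 'good' zone in year:", first_exceedance(profile, 0.10))
print("enters 'poor' zone in year:", first_exceedance(profile, 0.20))
```

A profile of this kind gives an early indication of when a review might be triggered, since the year in which the acceptable zone is entered precedes the year in which performance would be classed as poor.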
Reliability analysis is used to measure social PPPs service delivery performance through A(t) in terms of the abatement cost ratio and P(t) in terms of the end-user's satisfaction. In addition, it aims to consider project uncertainty and to explore how the service delivered by the private sector could be improved to meet the end-user's expectation. The approach is applied in the next section to a case study of three Australian PPP schools.

Case Example: Australian Public-Private Partnership (PPP) Schools
The theoretical mathematical methodology involves two variables, i.e., the perceived service quality by the end-user P(t) and assessment of delivered service in contract A(t), where probability of failure is defined as the probability of P(t) being lower than A(t). The mathematical method in this reliability-based decision support framework could evaluate service performance and identify when the gap between P(t) and A(t) warrants reconsideration. Using the case study, the data of P(t) and A(t) of three PPP schools are statistically analyzed for performance assessment.
The scope of services provided in these 25-year-long PPP schools included: the original design and construction of the buildings and associated amenities such as playgrounds and sporting grounds, provision of finance, ongoing facilities management, upgrades, any required and approved changes and a selection of services such as cleaning and maintaining. At the time of the study the schools had been operational for nine years. The actual payments (and any abatements) made by government over this period were collected along with details of minor works, variations and any modifications undertaken during this period. The end-user's perceived service quality is derived based on a series of semi-structured face-to-face interviews with the key end-users (the principal and senior teachers) within the schools, through a common list of questions. The interviews were conducted in accordance with The University of Melbourne ethics approval #1750880. This method provides an interactive dialogue with the respondents to tap into their front-line experience under a specially designed questionnaire.

Data Description
Multiple-project analysis was conducted to simulate the uncertain environment and to test the framework's feasibility. Many cultural and jurisdictional differences were avoided in the sample by selecting three schools that form part of one PPP contract in one Australian jurisdiction; thus, the differences in their data can be treated as realizations of random variables under uncertainty. Abatement costs and the end-user's satisfaction levels are considered vital variables because, on the one hand, they show how the project service is delivered by the private sector and, on the other hand, how the service is perceived by the end-user. In this study, the abatement data and actual service payments from 2010 to 2018 for schools A, B and C are presented as abatement cost ratios in Table 1.

Table 1. Cost information for PPP schools (A$k).
Year Notably, the performance-related service payment for these PPP schools was 6% of the contract sum, based on the historical data collected in this project. The service payment is much smaller in t = 1, i.e., the year 2010, due to only one quarter service payment in the first open year. Also, the criterion for good and acceptable is set as 10% (l 2 ) and 20% (l 2 ) respectively in this study, i.e., when the probability of failure P f ,t ≤ 10%, it means the service performance is good so as to meet the principals and teachers' need very well. When the P f ,t > 20%, it indicates the government should consider major changes in the original project scope to satisfy the principal's and teachers' expectation. When the P f ,t falls on [10%, 20%], it is considered as acceptable. These criteria could be referred to from MMG Education [49], which has the lowest satisfaction score from 4545 benchmarked schools of 79%, and thus the acceptable bottom line is 21%. The criterion could be changed to adapt to different cases.

The perceived service quality by the end-user was inferred from data collected through in-depth interviews with school principals and senior teachers. The interviewee profile is provided in Table 2. The average working length for all interviewees is 17.3 years, with a minimum of 7.5 years, and the average number of schools in which they have worked is about 7, from which it can be concluded that the interviewees are sufficiently experienced to give objective opinions on the service delivered in PPP schools. The in-depth interviews were supported by a questionnaire with a seven-point Likert scale, where scores 1 to 7 represent extremely dissatisfied, very dissatisfied, dissatisfied, unsure, satisfied, very satisfied and extremely satisfied, respectively. To be consistent with the interview scale, a linear relationship between abatement cost and performance score is established (assuming no repeated service failure, which would incur further deductions), where no abatement cost means performance of 95% and above [15]. Accordingly, the zone 4.95-5 represents satisfied, where payments are made in full according to the payment regime; 4 represents full abatement, which corresponds to a deduction and thus a service that might not be fit for purpose; and 3 represents no availability. The scale description can be seen in Table 3. The questionnaire items followed the categories of space quality, operation, and maintenance. Subdivisions of these categories and the identification of appropriate interviewees are detailed in the satisfaction analysis, which covers satisfaction with teaching space and non-teaching space, operation activities and maintenance activities in terms of delivering the teaching service and attracting and retaining students. The interview results for t = 9 are shown in Table 4.
The mean and standard deviation of the perceived service quality (P) score are computed, assuming for simplicity that each category has the same weight.
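The linear relationship between abatement and performance score can be illustrated as a straight-line interpolation between the two anchor points named above. This is a sketch under stated assumptions: the text does not say which point of the 4.95-5 zone corresponds to zero abatement, so the lower edge (4.95) is assumed here and exposed as a parameter, and `abatement_ratio` is a hypothetical name for abatement divided by the maximum possible abatement.

```python
def abatement_to_score(abatement_ratio,
                       score_no_abatement=4.95,
                       score_full_abatement=4.0):
    """Linearly interpolate a performance score from an abatement ratio.

    abatement_ratio: 0.0 means no abatement, 1.0 means full abatement.
    The anchor of 4.95 for zero abatement is an assumption; the text only
    states that the zone 4.95-5 corresponds to payment in full and that a
    score of 4 corresponds to full abatement.
    """
    if not 0.0 <= abatement_ratio <= 1.0:
        raise ValueError("abatement_ratio must lie in [0, 1]")
    slope = score_full_abatement - score_no_abatement
    return score_no_abatement + slope * abatement_ratio


# Hypothetical ratios, for illustration only.
for ratio in (0.0, 0.05, 0.5, 1.0):
    print(ratio, round(abatement_to_score(ratio), 4))
```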

Data Analysis in t = 9 and Discussion
Tables 1, 3 and 4 detail the data sources for the reliability analysis at t = 9. Based on the established framework, A(t = 9) and P(t = 9) were computed, giving $\mu_{A,9} = 4.896$ and $\sigma_{A,9} = 0.014$, and correspondingly $\mu_{P,9} = 5.75$ and $\sigma_{P,9} = 0.28$. The reliability index $\theta_t$ is an indicator of $P_{f,t}$: a higher $\theta_t$ corresponds to a lower $P_{f,t}$, implying better service performance of the social PPP project. For the case study schools, the reliability index at t = 9 years was $\theta_9 = 3.058$, thus $P_{f,9} = 1 - \Phi(3.058) = 0.1\%$, i.e., the probability of failure is 0.1% when the PPP schools had been operating for 9 years. Given that $l_1$ is 10%, this result indicates excellent performance for the PPP school project. In other words, there is enough resilience between the end-user's satisfaction and the performance actually delivered by the service provider to protect the government from major unexpected needs or changes during operations. Figure 5 shows the distributions of A(t) and P(t) at 9 years (t = 9), i.e., $F_{AoDS,9}$ and $F_{PSQ,9}$. This result is supported by the latest industry report [9], which found that 82% of end-users expressed a strong appreciation of the quality of services provided by the facility management operator in a PPP facility, and 95% stated that their PPP project delivered on the service promised by the relevant state government and delivery agency.
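The step from the four moments to the probability of failure can be checked numerically. The sketch below assumes that A(t) and P(t) are independent and normally distributed and that failure means P(t) falling below A(t), which gives the standard reliability index $\theta = (\mu_P - \mu_A)/\sqrt{\sigma_P^2 + \sigma_A^2}$. These assumptions and the function names are illustrative; they are not restated in this section.

```python
from math import sqrt
from statistics import NormalDist


def reliability_index(mu_p, sigma_p, mu_a, sigma_a):
    """Reliability index for the margin P - A, assuming independent normals."""
    return (mu_p - mu_a) / sqrt(sigma_p ** 2 + sigma_a ** 2)


def probability_of_failure(theta):
    """P_f = 1 - Phi(theta), with Phi the standard normal CDF."""
    return 1.0 - NormalDist().cdf(theta)


# Rounded moments reported for t = 9.
theta_9 = reliability_index(mu_p=5.75, sigma_p=0.28, mu_a=4.896, sigma_a=0.014)
pf_9 = probability_of_failure(theta_9)
print(f"theta_9 = {theta_9:.3f}, P_f,9 = {pf_9:.2%}")
```

With the rounded inputs this gives an index of roughly 3.05 and a probability of failure of about 0.1%. The small difference from the reported 3.058 is presumably due to rounding of the published moments.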

Satisfaction Analysis of End-User's Perceived Service Quality
The questionnaire conducted at t = 9 is composed of three parts: the end-user's satisfaction with teaching and non-teaching space, with operation activities, and with maintenance activities, totalling 23 items. A qualitative analysis is conducted to identify the factors with which end-users in the three schools are most and least satisfied, on the basis of interview scores and comments. The interview results are shown in Figure 6.

In Figure 6, the real body (filled stick) spans the lowest and highest scores observed in the interviews, and the upper and lower shadows (solid lines) show the range of satisfaction scores that each item can reach based on statistical theory (covering the 95.44% probability zone). Storage and the multi-purpose hall are the factors with which end-users are most dissatisfied. According to the comments, insufficient space for archiving and for use for other purposes is the likely reason. Better design of the IT/communications space and a larger car park would also be welcome. In Herzberg's motivation theory [50], hygiene factors do not encourage employees to work harder, but cause them to become unmotivated if absent. In this case, storage space, the multi-purpose hall, IT/communications space and a larger car park appear to be hygiene factors: however well they are provided they will not enhance the end-user's satisfaction, while their absence will lead to dissatisfaction.
On the other hand, the end-user is happy with the canteen space, teaching preparation space, and the resources center. Teachers spend a lot of time in the teaching space provided, which may be a motivating factor that gives them a sense of achievement, but increasing student numbers are a major concern for future satisfaction. Motivating factors are those that really influence customer satisfaction, so putting more effort into solving problems relating to them might be more useful in enhancing end-user satisfaction. Furthermore, the good relationship with the facility manager on operation and maintenance is also an important reason for the principals' and teachers' satisfaction. The satisfaction analysis shows that insufficient space is consistently raised as an issue for PPP schools, especially if student numbers increase as envisaged. Many of these areas of dissatisfaction relate to scoping rather than performance.
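The 95.44% zone used for the shadows in Figure 6 matches the coverage of the mean plus or minus two standard deviations under a normal distribution. The sketch below shows how the observed range and such a two-sigma range could be computed for one item. The scores are hypothetical, and the use of the sample standard deviation and the clipping to the 1-7 scale are assumptions rather than details taken from the study.

```python
from statistics import mean, stdev


def observed_range(scores):
    """Lowest and highest interview score for one item (the 'real body')."""
    return min(scores), max(scores)


def two_sigma_range(scores, scale_min=1.0, scale_max=7.0):
    """Mean +/- 2 sample standard deviations, clipped to the Likert scale."""
    centre = mean(scores)
    spread = 2.0 * stdev(scores)
    return max(scale_min, centre - spread), min(scale_max, centre + spread)


# Hypothetical interview scores for a single questionnaire item.
item_scores = [4, 5, 5, 6, 3, 5]
print("observed:", observed_range(item_scores))
print("two-sigma:", tuple(round(v, 3) for v in two_sigma_range(item_scores)))
```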

Forecasting Analysis in t = 25 and Discussion
To validate a time-based performance curve, the long-term performance of the PPP schools is predicted. An autoregressive integrated moving average (ARIMA) model is used to forecast the actual quarterly service payments in the future, based on historical data. ARIMA is a well-known model in time-series analysis in statistics and econometrics; it is fitted to time series data either to better understand the data or to predict future points in the series, and it has been applied in cases where the data show evidence of non-stationarity. In this model the evolving variable of interest is regressed on its own lagged values, and the regression error is a linear combination of error terms whose values occurred contemporaneously and at various times in the past. It has the following general form:

$$X_t^{*} = (1 - L)^{d} X_t, \qquad \left(1 - \sum_{i=1}^{p} \phi_i L^{i}\right) X_t^{*} = \left(1 + \sum_{i=1}^{q} \theta_i L^{i}\right) \varepsilon_t \tag{8}$$

where $X_t$ is the actual quarterly service payment in past years in this study; $L$ is the backward (lag) operator applied to the quarterly service payments, i.e., $L \cdot X_t = X_{t-1}$, and $L^{i}$ denotes $L$ raised to the power $i$; $\phi_i$ and $\theta_i$ are fixed parameters to be estimated; and $\varepsilon_t$ is the measurement error with zero mean. When the time series $X_t$ is non-stationary, a differencing step (corresponding to the 'integrated' part of the model) is used to obtain a stationary process $X_t^{*}$ by the first equation in (8).
The parameter $d$ is determined by a unit root test as the lowest differencing order whose p-value is less than 0.05, which rejects the null hypothesis that the process is non-stationary. The second equation of the general form then handles the stationary process $X_t^{*}$. The parameter $p$ is determined as the highest lag order whose coefficient has a p-value of less than 0.05. This threshold allows a 5% probability of a Type I error, i.e., of rejecting the null hypothesis when it is true. The weighting and strength between lagged and current periods are inferred from $\phi_1, \phi_2, \ldots, \phi_p$, and similarly from $\theta_1, \theta_2, \ldots, \theta_q$. The procedure for deducing the parameters $d$, $p$ and $q$ of the autoregressive integrated moving average model in this study can be briefly described in the following steps: determining the value of $d$ by the unit root test; determining the values of $p$ and $q$ by the time-lag test; determining the values of $\phi_1, \ldots, \phi_p$ and $\theta_1, \ldots, \theta_q$ by regression and the maximum log-likelihood estimation method; and identifying inconsistencies between the regressed values and the actual data.
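The differencing and regression steps can be illustrated with a deliberately simplified model. The sketch below fits an ARIMA(1,1,0), i.e., one difference and one autoregressive term with no moving-average part, by ordinary least squares in pure Python. It is not the estimation procedure used in the study (which also selects $d$, $p$ and $q$ by testing and estimates moving-average terms by maximum likelihood), and the payment series is hypothetical.

```python
def difference(series, d=1):
    """Apply the differencing operator (1 - L)^d to a list of numbers."""
    out = list(series)
    for _ in range(d):
        out = [later - earlier for earlier, later in zip(out, out[1:])]
    return out


def fit_ar1(y):
    """Least-squares fit of y_t = c + phi * y_{t-1} + e_t; returns (c, phi)."""
    lagged, current = y[:-1], y[1:]
    n = len(lagged)
    mean_lag = sum(lagged) / n
    mean_cur = sum(current) / n
    sxx = sum((a - mean_lag) ** 2 for a in lagged)
    if sxx == 0:
        raise ValueError("lagged values are constant; AR(1) slope is undefined")
    sxy = sum((a - mean_lag) * (b - mean_cur) for a, b in zip(lagged, current))
    phi = sxy / sxx
    return mean_cur - phi * mean_lag, phi


def forecast_arima110(series, steps):
    """Forecast `steps` future levels from a first-differenced AR(1) model."""
    diffs = difference(series, d=1)
    c, phi = fit_ar1(diffs)
    level, last_diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = c + phi * last_diff   # predict the next difference
        level += last_diff                # integrate back to the level
        forecasts.append(level)
    return forecasts


# Hypothetical quarterly service payments, for illustration only.
payments = [10.0, 11.0, 12.5, 14.25, 16.125, 18.0625]
print(forecast_arima110(payments, steps=4))
```

In practice a dedicated time-series library would normally be used for order selection and maximum-likelihood estimation; the sketch only makes the "difference, regress on lags, integrate back" logic explicit.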
The mean value of A(t) is derived from the historical data of the first 9 years and forecast using this model; see Figure 7a.
To predict the end-user's perceived service quality P(t) in the future, a power-law curve is fitted to the current data (t = 9) under the following hypotheses: when t = 0, the end-user's satisfaction is 7, i.e., in the ideal situation the end-user is satisfied with the new PPP school project at the very beginning of operations; and as t goes to infinity, the satisfaction score tends to 3, i.e., the bottom line for the end-user is 3, meaning that eventually none of the services is available; see Figure 7b.
Based on this synthetic projection, the end-user's perceived service quality in year 25 is forecast as $\mu_{P,25} = 5.203$, $\sigma_{P,25} = 0.356$. Meanwhile, $\mu_{A,25} = 4.871$ according to the ARIMA method, and $\sigma_{A,25}$ is assumed to be the same as at t = 9, i.e., $\sigma_{A,25} = 0.014$. The distributions of A(t = 25) and P(t = 25), i.e., $F_{AoDS,25}$ and $F_{PSQ,25}$, are shown in Figure 8. The reliability index at t = 25 years is $\theta_{25} = 0.9305$, thus $P_{f,25} = 1 - \Phi(0.9305) \approx 17\%$. This implies that the overlap area expands as the years of operation increase. Given that $l_2$ is 20%, the service performance for the PPP schools in Australia is counted as acceptable. In fact, the overlap area is affected by the position, mean value and dispersion of the A(t) and P(t) curves. Therefore, with A(t) almost unchanged, increasing P(t) and shrinking the fluctuations of A(t) and P(t) due to uncertainty can effectively reduce the probability of failure $P_{f,t}$ and improve the service performance of social PPPs.
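The effect of raising the mean of P(t) or reducing its dispersion can be explored numerically. The sketch below again assumes independent normal A(t) and P(t) with failure defined as P(t) falling below A(t); the reduced standard deviation of 0.25 is a hypothetical value chosen only to show the direction of the effect, and the function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def failure_probability(mu_p, sigma_p, mu_a, sigma_a):
    """P_f = 1 - Phi(theta), theta = (mu_p - mu_a) / sqrt(sigma_p^2 + sigma_a^2)."""
    theta = (mu_p - mu_a) / sqrt(sigma_p ** 2 + sigma_a ** 2)
    return 1.0 - STD_NORMAL.cdf(theta)


def required_mu_p(target_pf, sigma_p, mu_a, sigma_a):
    """Mean perceived score at which P_f equals target_pf (other moments fixed)."""
    theta_needed = STD_NORMAL.inv_cdf(1.0 - target_pf)
    return mu_a + theta_needed * sqrt(sigma_p ** 2 + sigma_a ** 2)


# Rounded moments forecast for t = 25.
baseline = failure_probability(5.203, 0.356, 4.871, 0.014)
# Hypothetical scenario: same means, smaller spread in perceived quality.
tighter = failure_probability(5.203, 0.250, 4.871, 0.014)
# Mean perceived score needed to reach the 10% 'good' criterion.
mu_needed = required_mu_p(0.10, 0.356, 4.871, 0.014)

print(f"baseline P_f = {baseline:.1%}")
print(f"P_f with sigma_P = 0.25: {tighter:.1%}")
print(f"mu_P needed for P_f = 10%: {mu_needed:.3f}")
```

Under these assumptions, either a modest rise in the mean perceived score or a reduction in its variability would move the projected year-25 performance from the acceptable band towards the good band.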
This result indicates that decision-makers in both the private sector and government should give a significant voice to end-users and their communities in order to meet not only current but also future social infrastructure needs, and should consider evolving the PPP contractual terms to focus further on end-user-oriented service output specifications, providing long-term flexibility to serve the needs of the end-user and the community.
A key variable that influences the outcome detailed in Figure 8 is the forecast of the end-user's satisfaction, which is mainly affected by student numbers. Student projections and end-users' perceived service quality appear closely linked. It is therefore a necessary step to forecast the end-user's needs by forecasting student numbers during the preparation of PPP school contracts. Otherwise, once the probability of failure exceeds 20%, government will need to initiate major changes or modifications to the project, most of which would cost considerably more than if they had been provided for in the contract.

Conclusions
Social infrastructure PPPs continue to play an important role in the early provision of facilities and enhanced services, yet end-users still have little direct feedback on the contract. There remains a gap in how the feedback and expectations of end-users are considered by PPP contract managers, and a lack of tools and processes to support decision-makers when evaluating requests to vary or modify PPP agreements. Current performance evaluations of social PPPs (e.g., schools and hospitals) are dominated by the use of KPIs. Benchmark comparisons have focused on time, quality and cost performance during construction, and weaknesses remain in how to measure service performance outcomes appropriately during operations.
In the present study, a reliability-based decision support framework (RDSF) is developed by integrating the specific performance of a PPP contract, A(t), with the performance perceived from the end-user's perspective, P(t), over a project's life cycle. It was tested on observed service outcomes from three PPP schools in Australia. The major findings are as follows:
• The developed RDSF could potentially provide guidance for developing performance and improvement strategies in the context of uncertainty.
• For the schools analysed, the results accord with anecdotal evidence provided in the survey and interviews that, nine years into the contract, the PPP agreement was performing well. The probability of failure $P_{f,t}$ of 0.1% at t = 9 supports good service performance in the case example schools. For long-term performance prediction, a probability of failure of 17% was obtained when projecting to 25 years, which indicates acceptable service performance.
• $P_{f,t}$ may increase with longer operation, mainly owing to concerns about limited space if enrolment numbers increase. In this case, building additional rooms to satisfy the end-user's need for space could improve service performance, thus achieving long-term value for money in PPP operations. Therefore, the government should acknowledge the need for flexibility over long-term contracts to deal with potential major changes.
• Performance assessment of social PPPs needs to incorporate the end-user's perception into decisions to improve performance and to assess the uncertainties of key project parameters.
The test showed that the RDSF has merit and can be applied using data that are reasonably obtainable from social infrastructure PPP contracts. The application of this analytic approach first incorporated the end-user's perception of social PPPs into decisions to improve performance, and then assessed uncertainty using reliability analysis. The aforementioned findings enable direct consideration of the probability that a social PPP does not meet the end-user's expected outcomes, termed the 'probability of failure'.
Contributions to theory and practice are offered in the context of social PPPs. In theory, the RDSF, which could potentially enable the engagement of end-users in social PPP operations, evaluates service performance in the form of a 'probability of failure'. This addresses the research gap identified in Figure 1, enriches theories in the field of performance management systems (PMS), and contributes to knowledge regarding an evidence-based test for justifying possible agreement modifications or additional works in social PPP operations. In practice, the RDSF serves as a managerial decision-making tool that incorporates the end-user's perceived service quality of the space, operation and maintenance activities of PPP social infrastructure projects in order to address the end-user's needs and improve performance. Improvements in the areas with which end-users are dissatisfied will close the gap between the end-user's expectations and contractual obligations (i.e., Gap 1 in Figure 1). Advances in this field could allow the long-term operational standard required by the end-user to be maintained in PPP social projects and ultimately deliver better value for money (VfM).
This study also has some limitations. First, the RDSF is tested on only three PPP schools; the next step in developing this approach is therefore to gain greater confidence in the results by widening the sample of project types and years of service. Second, the indicators for measuring the end-user's satisfaction and for assessing the delivered service differ across types of social PPPs. This study therefore aims to develop the conceptual RDSF model rather than to identify the indicators used in the measurements. Consequently, in further studies, the RDSF has the potential to incorporate full-scale end-user perceived service quality into performance improvements for different types of social PPPs. Sensitivity analysis could also be conducted once uncertainty factors are identified and modeled. Furthermore, this framework is valuable for understanding the importance of end-user engagement in public service performance, which could encourage more value for money in social PPP infrastructure projects.