The Development of a Measurement Instrument for the Organizational Performance of Social Enterprises

There is a growing consensus that the adoption of performance measurement tools is of particular interest for social enterprises, both to support internal decision-making and to answer the demands of accountability toward their stakeholders. As a result, academics and practitioners have developed different methodologies to assess the non-financial performance of social enterprises. Many of these methodologies are either discussions of general guidelines or very case-specific. As such, they do not offer a functional tool for a broad range of social enterprises. The goal of this article is to fill this gap by developing an instrument suitable for the internal assessment and the external reporting of the non-financial performance of a diverse group of social enterprises. To reach this goal, we used qualitative (focus groups and a Delphi panel) and quantitative research methods (exploratory and confirmatory factor analysis), involving multiple actors in the field of social entrepreneurship. Focusing on five dimensions of organizational performance (economic, environmental, community, human and governance performance), we offer a set of indicators and an assessment tool for social enterprises.
All fit indices show an acceptable fit: χ²/df = 1.73; CFI = 0.968; TLI = 0.955; RMSEA = 0.055 and SRMR = 0.036. All factor loadings are significant; however, the factor loading of ENV12 is still low (0.304). Finally, we checked the reliability of the scales used to measure the three remaining indicators by calculating Cronbach's alpha. The results indicate a satisfactory scale reliability (>0.60): Transportation (α = 0.696), Ecological materials (α = 0.877), Environmental Performance Management (α = 0.829). The results indicate that the originally selected seven indicators can be reduced to three indicators, relevant for measuring the environmental performance of social enterprises.


Introduction
Due to the growing interest in sustainability and the organizational responsibilities to society, organizations face the challenge of assessing and reporting their non-financial performance. This is especially the case for social enterprises [1][2][3]. Social enterprises are social mission-driven organizations that develop an entrepreneurial activity (make products and/or deliver goods and services) in order to fulfill unsolved social needs in society [4,5]. They are considered a distinct category of organizations, positioned between profit and nonprofit organizations [6,7]. Social enterprises differ from profit organizations in that profit is not a goal as such, but a means to create social value [8]. Compared to nonprofit organizations, social enterprises establish entrepreneurial activities to ensure their financial sustainability and do not rely (exclusively) on subsidies and donations [9].
Because of the dual mission of creating social value and being financially sustainable, financial as well as non-financial performance are core to the functioning of social enterprises [10]. Social enterprises are described as typical hybrid organizations [11][12][13] and face some specific internal and external tensions and challenges [10]. This challenging environment has made the assessment and the reporting of organizational performance within social enterprises particularly important. Firstly, different authors warn of internal tensions because of the difficulty of balancing the financial and social goals in decision-making, and refer to mission drift, which is the erosion of the social goals in favor of financial goals, as a threat [9,14,15]. In addition to the annual account, which is useful for evaluating financial performance, a tool that supports social enterprises in assessing and discussing their non-financial performance internally might help to balance the social and the financial goals in decision-making and to avoid mission drift [2,8]. Secondly, social enterprises face external tensions, related to the need to establish legitimacy and to obtain support from different stakeholder groups [16,17]. As social enterprises lack a dominant external stakeholder, they are exposed to multiple and conflicting expectations and demands of different principal stakeholder groups [14]. The legitimacy perceived by stakeholders is crucial for resource acquisition, such as financial resources and human resources [9,13]. Important stakeholder groups of social enterprises are the beneficiaries of the social mission and the customers, paying for the products and services delivered by the social enterprise [14]. Furthermore, policy makers, funders and volunteers can also have a legitimate stake in the organization. These stakeholders expect assessments and reporting to be transparent and comparable [18,19]. This has brought social enterprises under significant pressure to seek ways to more actively manage and report their non-financial performance to answer the demands of accountability to multiple stakeholders [17]. As such, social enterprises feel a need to develop, implement and use a systematic approach to assess and report their non-financial organizational performance [8].
Although there is a consensus that the development and adoption of performance measurement instruments is of particular interest for social enterprises, there is a lack of methodologies with a practical usefulness for a broad range of social enterprises [3,17]. This paper describes the development of a set of indicators and an assessment tool, useful for evaluating and reporting the non-financial organizational performance of social enterprises. The financial performance of social enterprises can be evaluated based on the information available in the annual account of the organization; the tool developed in this paper therefore targets non-financial performance only. When we refer, in the following sections, to the organizational performance of social enterprises, we mean their non-financial organizational performance.
The structure of the paper is as follows. In the following section, we briefly review relevant literature on performance measurement in social enterprises. Next, we describe the different steps carried out to identify relevant indicators and to develop an assessment tool to assess organizational performance in social enterprises. The article concludes with a discussion of the development of this assessment tool, challenges involved in its use and suggestions for future research.

Performance Measurement in Social Enterprises
The idea that organizations should measure and manage their performance is a key issue in management literature and is strongly encouraged by international bodies such as the OECD and the World Bank [20]. There is a growing consensus that social enterprises should assess their performance to support internal decision-making and to respond to the increasing demands of accountability towards different stakeholders [3,21]. As a result, researchers and practitioners have developed different methodologies and tools to measure the performance of social enterprises [22]. These methodologies and tools are, however, diverse and make a comparison of the organizational performance of social enterprises very difficult [8]. Grieco, Michelini and Iasevoli [2] (p. 1) state that "the overall picture remains fragmentary if not confusing". The reason why methodologies and tools are falling short of expectations is twofold. On the one hand, some studies are "general" in their design and do not offer specific indicators or measurement tools. The developed methodologies and tools often discuss frameworks providing general guidelines for social enterprises considering designing and implementing a performance measurement system, e.g. Manetti [1]. These papers discuss, for instance, how diverging stakeholder expectations can be taken into consideration, or they present different steps that social enterprises can follow to implement a performance measurement system [8]. As such, they offer no insight into the dimensions or indicators that can or should be evaluated [2]. Other papers discuss relevant dimensions of organizational performance (e.g. environmental performance, social performance, etc.), but do not propose relevant performance indicators [23]. On the other hand, other studies are too specific: they examine performance measurement in specific cases, which makes it difficult to replicate and generalize their findings to other social enterprises. Bellucci et al. [24], for instance, study the performance of fair trade shops in Italy. The performance indicators studied are specifically related to the fair trade value chain and cannot be replicated without adaptations to other organizational contexts.
Differences in approach and methods related to performance measurement in social enterprises can be attributed to two antecedents. Firstly, social enterprises differ in size, activities, objectives and, accordingly, relevant stakeholders. Consequently, it is not easy to develop a model that is suitable for all kinds of social enterprises [8]. Secondly, performance measurement can serve different purposes. Generally speaking, performance measurement can have an internal or an external purpose. A performance measurement tool can be used as an internal management instrument, enabling organizations to assess their performance and support internal decision-making. On the other hand, performance management tools with an external purpose are used for external reporting and have the main purpose of accountability to stakeholders. A different purpose implies a different design of the performance measurement system [2]. Notwithstanding this diversity in organizations and performance measurement systems, we also notice a consensus on some aspects. Firstly, there is a consensus that organizational performance is multi-dimensional. Not only is there, as mentioned earlier, a difference between financial and non-financial performance; non-financial performance is itself multi-dimensional, covering the impact on the local community, the environment, society in general, and the people working in the organization [2,8]. Secondly, there is a consensus that performance is not only related to immediate results. Many frameworks use a "results chain" or "logic model" [22], also taking into consideration inputs (i.e. resources used) and activities within the organization [8,25]. These models, stressing the alignment of an organization's input, throughput and output components, can be used to assure program alignment and to evaluate the results of an organization [26]. Concerning the achieved results, a difference is made between immediate results (outputs) and medium- and long-term results, often referred to as outcomes or impacts [2]. Although there is growing interest in "impact measurement", the terms "outcomes" and "impact" are not consistently used [22]. Ebrahim and Rangan [22] distinguish between outputs (immediate results), outcomes (medium- and long-term impact on individuals) and impact (medium- and long-term impact on communities or populations). There is a consensus that organizations should at least measure and report on inputs, activities and outputs [20]. However, Ebrahim and Rangan [22] doubt whether social enterprises should go further and also measure outcomes and impact. Their main argument is that the causal link between outputs and outcomes is not clear and that outcomes and impact often go beyond the control of the social enterprise. Some scholars argue that, under such conditions, organizations, or the management of these organizations, could be demotivated, withdraw discretionary effort and sit back and see if they win or lose a performance indicator game that resembles a lottery [20]. For instance, a work integration social enterprise offering a job to disadvantaged people can measure the number of people hired by the organization (output). However, whether this will result in an improved quality of life at the individual level (outcome) or a decrease in poverty at the level of society (impact) is not straightforward, as other external factors beyond the control of the social enterprise will also influence this outcome and impact. Furthermore, Ebrahim and Rangan [22] argue, mainly based on practitioner-oriented literature, that focusing on the measurement of impacts and outcomes might be counterproductive, because it places heavy demands on (often small) organizations without necessarily resulting in better outcomes. Instead, they suggest that outcomes and impacts be measured at an aggregated level, for instance by governments, foundations or impact investors.
Relying on these insights, the aim of this paper is to develop a performance measurement tool for social enterprises. More specifically, we want to develop a tool suitable for a broad range of social enterprises. Taking into consideration the internal (assessing organizational performance and supporting decision-making) and external (reporting) purposes of performance measurement, we aim to provide social enterprises with a performance measurement instrument that is based on the reliable, valid, and standardized assessment of organizational performance. In developing this tool, we build on the insights that performance is multi-dimensional and that, when evaluating performance, inputs, activities and outputs should be considered. Based on the arguments of Ebrahim and Rangan [22], we will not focus on outcomes and impacts. Moreover, we are convinced that taking outcomes and impacts into consideration would prevent us from developing a tool suitable for social enterprises with diverse activities.
In what follows, we discuss in detail how we developed the performance measurement tool, using qualitative and quantitative research methods and building on the expertise and points of view of a broad range of practitioners in the field of social entrepreneurship.

Methodology
The aim of the paper is twofold. On the one hand, we want to identify relevant indicators for assessing the non-financial performance of social enterprises. This set of indicators can serve the external purpose of performance measurement: in the external reporting to stakeholders, social enterprises can elaborate on their non-financial performance related to these indicators. This is in line with existing standards developed to assess non-financial performance (e.g. Global Reporting Initiative, GRI) [27]. These standards offer a broad range of possible performance indicators, and organizations choose, given their activity, the performance indicators they consider relevant. While different efforts have been made to develop specific sets of relevant performance indicators for different kinds of organizations, such as NGOs and public sector organizations, this is not the case for social enterprises [28,29].
On the other hand, based on the selected indicators, we want to develop a measurement instrument that social enterprises can use as a self-assessment tool to evaluate their non-financial organizational performance internally. In line with existing quality assessment frameworks, such as the EFQM Excellence Model of the European Foundation for Quality Management, different members of the organization can complete the measurement instrument, enabling the assessment of the non-financial performance and supporting decision-making [30].
In order to realize these two aims, we followed generally accepted guidelines and phases outlined in the scale development literature [31,32]. Table 1 gives an overview of the five phases of the research process, combining deductive and inductive methodologies to generate relevant indicators and items [33]. We performed our research in a Belgian region, namely Flanders. As is the case in Europe in general, Flemish social enterprises mainly emerged because of the persistence of structural unemployment and the need for more active policies to tackle the increasing exclusion of specific groups. These "work integration social enterprises" offer a job to disadvantaged people, but in addition they focus actively on the job training necessary to make reintegration in the labor market possible, and provide social support to solve personal problems which are often obstacles to employment [11,12,34]. While some of these organizations are specifically set up to hire disadvantaged people, others are organizations or local authorities that hire some disadvantaged workers, next to a majority of regular employees [34]. Next to the work integration social economy, there is a growing interest in the entrepreneurial, innovative approach of social enterprises. In Flanders, these social enterprises often are member-based democratic organizations, mainly adopting the organizational form of "cooperatives" [35,36]. In the phases of the research process, we took into consideration and involved (representatives of) the different groups of social enterprises. This will be discussed in more detail when commenting on Phases 2, 3 and 4 of the research process.

Phase 1: Literature Review
To obtain an overview of relevant performance indicators, we started with an extensive literature review. This deductive approach is appropriate as measuring the non-financial performance of organizations has gained increasing attention in the literature [33]. While screening scientific journals, we noticed that relying on internationally accepted standards is a common practice when studying non-financial performance of organizations [37]. Digging deeper into these standards, we selected four standards often referred to in the literature and which have proven their worth in practice as well as in scientific research: (1) Kinder, Lydenberg, Domini (KLD) social responsibility rating [37,38], (2) Dow Jones Sustainability Index (DJSI) [39,40], (3) Global Reporting Initiative (GRI) [41,42] and (4) ISO 26000 [43,44]. Table A1 provides an overview of the performance domains considered by each of the four selected standards. Based on this overview, we selected five performance domains that (1) are taken into account by several of these standards and (2) are relevant in the context of social enterprises: economic, human, environmental, community and governance performance. Figure 1 gives an overview of the five selected performance domains. Economic performance is related to the economic conditions supporting a strong financial position, important for the viability of organizations. As such, the focus is not on financial indicators reported in the annual financial accounts of organizations, but on economic indicators influencing these financial indicators [27,45]. Human performance refers to the relationship of the organization with its workforce [46]. Environmental performance focuses on the efforts organizations make to protect nature [47]. Community performance refers to how organizations deal with their responsibilities in society [48], including the relationships with dominant stakeholders: beneficiaries of the social mission and customers, paying for the delivered products and services. Governance performance refers to "systems and processes concerned with ensuring the overall direction, control and accountability of an organization" [49]. Important issues related to organizational governance are board composition and board behavior [50,51], as well as dealing with stakeholder expectations [52]. Governance performance is a particular performance domain as it is expected that good governance practices have a positive impact on organizational decision-making, in turn positively influencing the other performance domains of the organization [53,54].
In a next step, we detected relevant indicators for measuring the five performance domains. As the performance of social enterprises has only rather recently been examined, the literature and research on the performance of social enterprises is still limited. We therefore decided to screen the literature on research regarding the non-financial performance of organizations in general, which is more extensively studied in the context of Corporate Social Responsibility (CSR), where it is referred to as "social performance" [37]. We started with the examination of 10 high impact management journals (included in the ISI Web of Science), looking for articles with "social" and "performance" in the title in the period 1990-2013. Table A2 gives an overview of the screened management journals. As a result, we found 68 articles. Thirty-three articles were not relevant because they did not focus on the non-financial performance of organizations. Analyzing the remaining 35 articles, we concluded that these articles only provide a limited number of possible indicators. The main reason is that many articles do not refer to relevant indicators because they are conceptual or because they use existing ratings provided by, for instance, financial institutions to assess the non-financial performance of organizations.
In a next step, we screened two additional journals: Journal of Business Ethics and Social Enterprise Journal. We selected these journals because they have a focus on CSR and social enterprises. Once again, we screened the journals for relevant articles with "social" and "performance" in the title in the period 1990-2013 (as Social Enterprise Journal has only existed since 2005, we screened this journal for the period 2005-2013). As a result of this additional screening, we found 60 additional articles, which provided us with more relevant indicators.
Based on the literature review, we retained 41 indicators.

Phase 2: Focus Groups
Based on the examined management literature alone, it is difficult to conclude that the retained indicators are the most relevant ones for the context of social enterprises, for two reasons. Firstly, the screened literature is not exclusively related to the performance of social enterprises. Secondly, it is possible that some relevant indicators were not detected in the literature. We therefore decided to combine the deductive approach of the literature review with additional inductive approaches [33]. To check the relevance and completeness of the selected indicators, we organized two focus group sessions. The use of focus groups is a common qualitative research method in social sciences, often as a part of the development of measurement instruments [55]. We decided to organize focus groups because they enable in-depth discussions with experts on emerging and unexplored topics [56].
We selected key informants with different backgrounds to participate in the focus groups for two reasons. Firstly, we wanted insights on the performance of different kinds of social enterprises (the dominant form of work integration, as well as other social enterprises such as cooperatives). Secondly, we considered it useful to ask the opinion of both employees involved in the management of a social enterprise and informants with a broader focus. The latter group consists mainly of researchers and civil servants supporting social enterprises. We therefore decided to organize two focus groups. In the first focus group session, eight managers of social enterprises were involved. Because of the prevalence of work integration social enterprises in Flanders [34], we invited managers of different types of work integration social enterprises. In the second focus group, we aimed at a broader perspective: three representatives of sectorial federations of work integration social enterprises (the federation of social workshops SST and the federation of sheltered workshops VLAB), two researchers with a broad perspective on social entrepreneurship and two civil servants of the Flemish government were involved.
Organizing two focus groups involving 15 key informants with different backgrounds gave us the opportunity to gain insight into different perspectives regarding measuring the performance of social enterprises. Specifically, we asked the participants whether the five performance domains and the 41 indicators selected in Phase 1 are suitable for assessing the performance of social enterprises and whether there were indicators missing. As a result of the focus groups, the 41 indicators selected in Phase 1 were approved and 12 indicators were added, resulting in 53 indicators. The indicators in each performance domain are presented in Table 2. A distinction is made between indicators selected based on the literature review (Phase 1) and approved by the focus groups (Phase 2), on the one hand, and indicators that are provided by the focus groups (Phase 2), on the other hand.

Phase 3: Delphi Panel
In focus groups, group dynamics and more particularly the dominance of some participants may substantially influence the results. Moreover, focus groups are not anonymous, potentially making people less outspoken [55,56]. To overcome these potential disadvantages of focus groups, we used the Delphi technique to reach a consensus on the indicators. The Delphi technique encompasses a structured, iterative process in which subject matter experts share their anonymous opinion during subsequent rounds [57][58][59][60][61][62][63]. By synthesizing these opinions after each round, the researcher pursues consensus within the panel of experts [60,62,63]. Specifically, our Delphi panel included 17 panelists with different backgrounds: (1) managers of social enterprises, (2) experts on social entrepreneurship (academics, government officials, representatives of sectorial federations) and (3) members of two networks of organizations focusing on sustainability and corporate social responsibility (Kauri and Positive Entrepreneurs) and as such having a keen interest in non-financial performance. After two rounds, the required consensus was achieved, which resulted in the removal of 13 indicators and the selection of 40 indicators. Table 2 gives an overview of the removed and accepted indicators.

Phase 4: Survey Instrument Development and Administration
As explained earlier, next to the selection of relevant performance indicators which can serve the external purpose of reporting to external stakeholders, we aim at developing a measurement instrument that social enterprises can use as an internal self-assessment tool. Therefore, the 40 selected indicators were operationalized in a survey instrument. Questionnaires are the most commonly used method of data collection in field research [33] and, over the past several decades, scales have been developed that are suitable for assessing input, throughput and output of the performance of organizations. In the next section (Phase 5), the items and scales used to measure the indicators are discussed for each performance domain separately. To achieve high levels of content validity, most of the constructs and measures used in the instrument had already been validated in earlier research. In addition, we created measures by adapting existing scales.
Given the purpose of developing a measurement instrument suitable for a broad range of social enterprises, the survey was distributed to different groups of organizations, encompassing the sector of social enterprises in Flanders, Belgium. The work was carried out with the active help of the Flemish government, which provided the sample for the study.
The following organizations were selected: (1) sheltered workshops and social workshops: established with the main purpose of reintegrating job seekers who face difficulties in finding a job in the regular job market because of physical, social or psychological problems, mainly operating in packaging, assembly, gardening, recycling, and printing [12]; (2) local service economy initiatives: social enterprises closely connected to local authorities, offering jobs to long-term unemployed people in combination with offering quality services to the local community and households (e.g. cleaning services, shopping assistance for the elderly) [34]; (3) work experience enterprises and work care initiatives: organizations that offer a job to long-term unemployed people and are mainly active in health and social care or the cultural sector [34]; (4) work integration enterprises: enterprises that receive subsidies in return for employing long-term unemployed jobseekers and integrating them into their regular staff [34]; and (5) cooperatives: member-based democratic organizations [64].
The survey was distributed to the top managers of 1018 social enterprises. These top managers have an overview of the overall performance of the organization, including the different performance domains which are part of our measurement instrument. The survey was distributed using a web-based tool (Qualtrics). After a period of intensive follow-up (mail and telephone) of the responses, a total of 244 top managers completed the survey, yielding a response rate of 24%. After removing incomplete surveys, our results are based on the responses from top managers of 241 organizations. The age of the organizations in our sample varies between 2 and 93 years, with an average of 26 years. The number of employees ranges from 1 to 2023, with an average of 147 employees. A total of 84% of the organizations are SMEs with fewer than 250 employees. Table 3 gives an overview of the population of 1018 social enterprises and of the 241 social enterprises that participated.
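The response figures can be checked with a few lines of arithmetic. The sketch below only uses the counts reported in the text (1018 surveyed, 244 completed, 241 usable); the variable names are illustrative and not part of the study.

```python
# Counts taken from the text; variable names are illustrative only.
surveyed = 1018   # social enterprises that received the survey
completed = 244   # managers who completed the survey
usable = 241      # responses left after removing incomplete surveys

response_rate = completed / surveyed
usable_rate = usable / surveyed

print(f"response rate: {response_rate:.1%}")
print(f"usable share of the population: {usable_rate:.1%}")
```

The share of usable responses is slightly lower than the reported response rate, because three completed surveys were removed as incomplete.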

Phase 5: Validation of Relevant Indicators and the Assessment Tool
The aim of this phase in the research process is twofold: (1) reduction of the number of indicators by identifying overarching performance indicators, each encompassing several of the retained indicators, and (2) validation of the developed survey instrument. Building on scale development and construct validation literature [31,32], we use exploratory and confirmatory factor analysis to reach these goals and we assess the internal consistency of the remaining scales using Cronbach's alpha [65,66].
We will discuss the analyses and results for each performance domain separately (economic, environmental, community, human and governance). First, we will give an overview of the items and scales used to measure the selected indicators. When analyzing the data, we first used exploratory factor analysis (EFA). Because some indicators are measured using adapted scales, have not been validated in prior work or are measured using a single item, running a factor analysis for the items of each indicator separately would be inappropriate. Therefore, we conducted, for each performance domain, an EFA of all items. If, within a performance domain, items used to measure different performance indicators load on the same latent factor, we can reduce the number of indicators by detecting an overarching indicator. Items that load insufficiently onto one factor will be removed if the indicator is measured by several items [65].
We build on the results of the EFA to specify the factor models used in the confirmatory factor analysis (CFA). We conducted confirmatory factor analysis using the lavaan package, developed for Structural Equation Modeling (SEM) in the statistical program R [67]. Because we use categorical data (ordinal variables using Likert scales and dichotomous variables), we use the robust weighted least squares (WLSMV) estimator [68]. The chi-square statistic is commonly reported in CFA research; more specifically, we report the chi-square test statistic divided by the degrees of freedom (χ²/df) [65]. Next to the chi-square statistic, it is suggested to take different other fit indices into consideration to evaluate the model fit [66,68]. Brown [68] (p. 82) distinguishes three categories of fit indices and advises reporting at least one index from each category. Following the advice of Brown [68], we report the Standardized Root Mean Square Residual (SRMR), the Root Mean Square Error of Approximation (RMSEA), the Comparative Fit Index (CFI) and the Tucker-Lewis Index (TLI).
There is no consensus on the cutoff values that should be used to evaluate model fit. It is even argued that the use of absolute cutoff values is inadvisable because fit indices are influenced by different aspects of the research setting, e.g. sample size and type of data [66,68]. However, there are some guidelines for the fit indices we use in our study. For the χ²/df ratio, Janssens, Wijnen, De Pelsmacker and Van Kenhove [65] mention < 2 as a criterion, while Hair, Black, Babin and Anderson [66] mention < 3. For the SRMR, Hu and Bentler [69] use a cutoff value of 0.08, while Hair, Black, Babin and Anderson [66] mention that an SRMR over 0.1 suggests a problem with fit. Concerning the RMSEA, the cutoff value of 0.06 proposed by Hu and Bentler [69] is often referred to. However, Brown [68] mentions that RMSEAs in the range of 0.08-0.10 suggest a mediocre fit and that models with an RMSEA over 0.1 should be rejected. For CFI and TLI, Hu and Bentler [69] suggest values ≥ 0.95, but different authors indicate that values in the range 0.9-0.95 indicate acceptable fit [65,66,68].
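As an illustration of how these guideline values can be applied together, the following minimal sketch labels each fit index as "good", "acceptable" or "poor". The function name, the three labels and the handling of values lying exactly on a boundary are assumptions made for this example; only the threshold values come from the guidelines cited above, and the example input reuses the fit values quoted in the abstract.

```python
def assess_fit(chi2_df, cfi, tli, rmsea, srmr):
    """Label fit indices against the guideline values cited in the text.

    The labels and boundary handling are illustrative assumptions,
    not a procedure prescribed by the cited authors.
    """
    def lower_is_better(value, strict, lenient):
        if value < strict:
            return "good"
        if value < lenient:
            return "acceptable"
        return "poor"

    def higher_is_better(value, strict, lenient):
        if value >= strict:
            return "good"
        if value >= lenient:
            return "acceptable"
        return "poor"

    return {
        "chi2/df": lower_is_better(chi2_df, 2.0, 3.0),  # < 2 [65], < 3 [66]
        "CFI": higher_is_better(cfi, 0.95, 0.90),       # >= 0.95 [69], 0.90-0.95 acceptable
        "TLI": higher_is_better(tli, 0.95, 0.90),
        "RMSEA": lower_is_better(rmsea, 0.06, 0.10),    # 0.06 [69], reject above 0.10 [68]
        "SRMR": lower_is_better(srmr, 0.08, 0.10),      # 0.08 [69], problem above 0.10 [66]
    }


# Fit values quoted in the abstract of this paper.
abstract_fit = assess_fit(chi2_df=1.73, cfi=0.968, tli=0.955, rmsea=0.055, srmr=0.036)
for index, label in abstract_fit.items():
    print(f"{index}: {label}")
```

Because the literature advises against treating cutoffs as absolute, such labels are best read as a first screening rather than a verdict on the model.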
Finally, we assess the internal consistency or reliability of the scales used for measuring the different indicators by reporting Cronbach's alpha [65]. Based on Hair, Black, Babin and Anderson [66], a value above 0.7 is considered to indicate strong reliability, while a value above 0.6 indicates satisfactory reliability, allowing the use of summated scales.
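For readers less familiar with the statistic, Cronbach's alpha for a scale of k items is k/(k − 1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. The sketch below computes it in plain Python. The function names, the label for values at or below 0.6 and the example scores are hypothetical and are not taken from the survey data.

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha; items[i][r] is the score of respondent r on item i."""
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    n = len(items[0])
    if n < 2 or any(len(scores) != n for scores in items):
        raise ValueError("every item needs the same number (>= 2) of respondents")
    # Total score per respondent across all items of the scale.
    totals = [sum(row) for row in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def reliability_label(alpha: float) -> str:
    """Guideline from Hair et al. [66]; the lowest label is an assumption."""
    if alpha > 0.7:
        return "strong"
    if alpha > 0.6:
        return "satisfactory"
    return "below guideline"


# Hypothetical scores of five respondents on a three-item scale.
example_items = [
    [5, 6, 4, 7, 3],
    [4, 6, 5, 7, 2],
    [5, 7, 4, 6, 3],
]
example_alpha = cronbach_alpha(example_items)
```

The sample variance (denominator n − 1) is used for both the items and the total score, so the choice of denominator cancels out in the ratio.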

Economic Performance
Economic performance is related to conditions supporting the financial sustainability of organizations. As mentioned earlier, the focus is not on traditional financial indicators such as profit, cash flow, Return on Assets (ROA) and Return on Investment (ROI) [45], but on indicators positively influencing these indicators. Based on the focus group sessions and the Delphi panel, three indicators are selected to assess the economic performance of social enterprises. Table 4 gives an overview of the indicators, items and scales used to evaluate economic performance.
These indicators (innovation, proactiveness and risk taking) are related to the entrepreneurial orientation of organizations. Therefore, we used the measure introduced by Helm and Andersson [70], specifically developed to evaluate the entrepreneurial orientation of social enterprises and comprising three subscales to measure innovation, proactiveness and risk taking. The scale measures along a continuum: two opposite statements are formulated and respondents are asked to indicate on an 8-point Likert scale which statement best characterizes their organization. We conducted a principal component exploratory factor analysis using varimax rotation of the 10 items of the scale. The results are reported in Table 5. The results show three factors with eigenvalues greater than one, explaining 72% of the variance and corresponding to the subscales identified by Helm and Andersson [70]. Item "ECON1" has a high factor loading (>0.5) on "Innovation" as well as on "Proactiveness". For that reason, we decided to exclude "ECON1" from the confirmatory factor analysis. The other items loaded sufficiently onto one single (expected) factor. In a next step, we used CFA to test a second order model. Specifically, we checked whether the results of the EFA are confirmed by CFA and whether the three detected factors (innovation, proactiveness and risk taking) load onto the second order factor "Economic Performance". The results are reported in Figure 2.
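The cross-loading screen described above (an item with a loading above 0.5 on more than one factor is excluded before the CFA) can be sketched as follows. The function name and the loadings are hypothetical and serve only to illustrate the rule; they are not the values of Table 5.

```python
def cross_loading_items(loadings: dict[str, dict[str, float]],
                        threshold: float = 0.5) -> list[str]:
    """Return the items whose absolute loading exceeds `threshold`
    on more than one factor of a rotated loading matrix."""
    flagged = []
    for item, by_factor in loadings.items():
        high = [factor for factor, value in by_factor.items()
                if abs(value) > threshold]
        if len(high) > 1:
            flagged.append(item)
    return flagged


# Hypothetical rotated loadings for three items on three factors.
example_loadings = {
    "ITEM_A": {"F1": 0.62, "F2": 0.55, "F3": 0.10},   # loads on F1 and F2
    "ITEM_B": {"F1": 0.81, "F2": 0.12, "F3": 0.05},
    "ITEM_C": {"F1": 0.20, "F2": -0.74, "F3": 0.31},
}
to_exclude = cross_loading_items(example_loadings)
```

Absolute values are compared so that a strong negative loading is treated the same way as a strong positive one; whether that matches the authors' handling of negative loadings is an assumption.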
The results indicate that the indicators (1) innovation, (2) proactiveness and (3) risk taking are relevant to measure the economic performance of social enterprises. Finally, we checked the internal consistency of the scales used to measure these indicators by calculating Cronbach's alpha. The results reported in Figure 2 indicate strong scale reliability (>0.70): innovation (α = 0.829), proactiveness (α = 0.848), risk taking (α = 0.739).

Environmental Performance
Environmental performance refers to the efforts organizations make to protect nature [47]. Based on the focus group sessions and the Delphi panel, seven indicators are selected to assess the environmental performance of social enterprises. Table 6 gives an overview of the indicators, items and scales used to evaluate the environmental performance. We conducted a principal component exploratory factor analysis using varimax rotation of the 10 items measured using a 7-point Likert scale. The results are reported in Table 7. Based on the results of the EFA, we can distinguish three factors, explaining 74% of the variance. These three factors are related to (1) Transportation, (2) Use of ecological materials and (3) Environmental performance management. Item "ENV6" has a high factor loading (>0.5) on "Ecological Materials" as well as on "Environmental performance management". For that reason, we decided to exclude "ENV6" from the confirmatory factor analysis. Item "ENV8" is scattered across the three factors and does not load sufficiently onto one single factor. For that reason, "ENV8" is also eliminated from the confirmatory factor analysis. The other items load sufficiently onto one single factor. We checked whether the results of the EFA are confirmed by CFA and whether the three detected factors (transportation, ecological materials and environmental performance management) load onto the second order factor "Environmental Performance". We also added the two items measured as dummy variables. Based on content, we added ENV11 to the factor "environmental performance management" and ENV12 to the factor "ecological materials".
The fit indices reveal an acceptable fit, except for TLI: χ²/df = 2.79; CFI = 0.922; TLI = 0.890; RMSEA = 0.086 and SRMR = 0.047. All factor loadings are significant (p < 0.001), but the factor loading of ENV12 is low (0.283). Further investigation of the model in lavaan shows that the fit of the model improves if ENV12 is moved to the factor "environmental performance management". This is acceptable as the use of renewable energy can be considered an environmental result. This is comparable to the fact that ENV10 (waste reduction), based on the results of the EFA, also loads on this factor.
The fit of this model is better. Figure 3 gives an overview of the results. All fit indices show an acceptable fit: χ²/df = 1.73; CFI = 0.968; TLI = 0.955; RMSEA = 0.055 and SRMR = 0.036. All factor loadings are significant; however, the factor loading of ENV12 is still low (0.304). Finally, we checked the reliability of the scales used to measure the three remaining indicators by calculating Cronbach's alpha. The results indicate a satisfactory scale reliability (>0.60): Transportation (α = 0.696), Ecological materials (α = 0.877), Environmental Performance Management (α = 0.829). The results indicate that the originally selected seven indicators can be reduced to three indicators, relevant for measuring the environmental performance of social enterprises.

Community Performance
Community performance refers to how organizations deal with their responsibilities in society [48]. Following the results of the focus groups and the Delphi panel, seven indicators are selected. Table 8 gives an overview of the indicators, items and scales used to evaluate the community performance. We conducted a principal component exploratory factor analysis using varimax rotation of eight items. As in the analysis of environmental performance, we did not take COM9 into consideration at this stage because it is a dummy variable. The results are reported in Table 9. Based on the results of the EFA, we can distinguish two factors, explaining 56% of the variance. The first factor is related to the indicator "Hiring disadvantaged people". All the other indicators load on a second factor, which we call "Community responsibilities".
The results of the EFA reveal that all indicators of community performance load onto one factor, except for the items related to the indicator "hiring disadvantaged people". This indicates that hiring disadvantaged people is considered a distinctive indicator of community performance in relation to the other indicators. We use CFA to check whether this distinction is confirmed. Furthermore, using CFA, we investigate whether the two detected factors load on the second order construct "community performance". In this stage we also add COM9, measured as a dummy variable. In our model we add COM9 to the factor "Community responsibilities", as choosing local suppliers is not related to the hiring of disadvantaged people. Testing this second order model in lavaan, the fit indices show an acceptable fit (χ²/df = 1.69; CFI = 0.932; TLI = 0.905; RMSEA = 0.054 and SRMR = 0.055); however, the factor loading of COM9 (local suppliers) is low (0.173) and not significant (p < 0.1). A possible explanation is that the organizations in our sample are mainly small, locally embedded organizations. A total of 89% of the social enterprises in our sample mainly have local or regional suppliers. As such, having local suppliers is not a distinguishing factor in our sample. Because of the low, insignificant factor loading, we decided to exclude COM9.
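One way to make the argument about COM9 concrete is to look at the variance of a yes/no item. If we assume, purely for illustration, that the 89% figure can be read as the share p of "yes" answers on COM9, the variance of the dummy variable is far below the maximum a binary item can reach:

```latex
\[
\operatorname{Var}(X) = p\,(1-p) = 0.89 \times 0.11 = 0.0979,
\qquad
\max_{p}\; p\,(1-p) = 0.25 \quad \text{at } p = 0.5 .
\]
```

An item with so little variation can contribute little shared variance to a factor, which is consistent with the low loading reported above. This is an illustrative calculation and not part of the original analysis.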
Figure 4 gives an overview of the CFA results of the adapted model. The fit indices show a good fit: χ²/df = 1.78; CFI = 0.943; TLI = 0.916; RMSEA = 0.057 and SRMR = 0.053. All factor loadings are significant (p < 0.001). The Cronbach's alphas of the scales used to measure "hiring disadvantaged people" (α = 0.767) and "community responsibilities" (α = 0.718) reveal strong reliability. The results indicate that the seven indicators selected for assessing the community performance of social enterprises can be reduced to two indicators.

Human Performance
Human performance refers to the relationship of the organization with its workforce [46]. Based on the results of the focus groups and the Delphi panel, 12 indicators are selected. Table 10 gives an overview of the indicators, items and scales used to evaluate the human performance.

Source: Calantone et al. (2002) [77].
HUM1. Our organization has a strong ability to learn and this offers us a competitive advantage.
HUM2. The basic values of this organization include learning as key to improvement.
HUM3. The sense around here is that employee learning is an investment, not an expense.

The results of the EFA are shown in Table 11. Based on the results of the EFA, we can distinguish four factors, which we identify as (1) performance support (H1), (2) training & development (H2), (3) HR-policy (H3) and (4) diversity management (H4). The results are not straightforward: several items have high factor loadings on more than one factor. A second problem is that in some cases the loading on a specific factor cannot be explained based on the content of the item. This is not exceptional in EFA. Janssens, Wijnen, De Pelsmacker and Van Kenhove [65] suggest that content should be taken into consideration and that content takes precedence over factor loadings. In analyzing and evaluating the factor loadings we took different steps. In a first step, we look for items with significant factor loadings on different factors. Based on our sample size, we consider values above 0.375 as significant factor loadings [65]. If several items are used to measure the indicator, we remove the item. This is the case for HUM15. If the indicator is measured using a single item, we decide not to remove the item because this would imply removing the indicator. Instead, we evaluate based on content to which factor we will add the item in the CFA model. This is the case for HUM5, HUM6 and HUM14. For item "HUM5" (significant loading on H2 and H3), we decide to assign it to H2, as HUM5 is more related to education and training. Item "HUM6" has a significant factor loading on H1 and H2. Based on the content, we decide to add it to H2. Item "HUM14" has a significant loading on H1 and H3. As it is more related to HR-policy (H3) than to Performance Support (H1), we assign it to H3. For some items with significant factor loadings on different factors, we notice that it is difficult to assign them to one of the factors. This is the case for HUM9 and HUM20. Therefore, we decide to assess these items separately and not to assign them to a first order factor. Instead, we will add these items to the model, directly loading on the second order factor "Human Performance".
In a second step, we check if the items with only one significant factor loading can be assigned to that factor based on content. HUM12 has a significant factor loading on H3 but is not related to training and development. Neither is it related to one of the other factors. Therefore, we decide, similarly to HUM9 and HUM20, to load it directly onto the second order factor Human Performance.
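The two steps above can be summarized as a small decision rule. In the sketch below, the content-based judgment is supplied by the analyst as an input, since it cannot be derived from the loadings. The function name, the return labels and the example loadings are hypothetical; only the 0.375 threshold and the logic of the steps are taken from the text.

```python
from typing import Optional

SIGNIFICANT = 0.375  # threshold for a significant loading, following [65]


def place_item(loadings: dict[str, float],
               n_items_for_indicator: int,
               content_factor: Optional[str]) -> str:
    """Decide what happens to an item after the EFA.

    loadings: rotated loadings of the item per first order factor.
    n_items_for_indicator: number of items measuring the same indicator.
    content_factor: factor the analyst judges to fit the item's content,
        or None if no first order factor fits.
    Returns "remove", a factor name, or "second-order" (the item loads
    directly on the second order factor).
    """
    significant = [f for f, v in loadings.items() if abs(v) > SIGNIFICANT]
    # Step 1: cross-loading items of multi-item indicators are removed.
    if len(significant) > 1 and n_items_for_indicator > 1:
        return "remove"
    # Otherwise content decides: assign to the fitting factor if there is one.
    if content_factor is not None:
        return content_factor
    # No first order factor fits: load directly on the second order factor.
    return "second-order"


# Hypothetical loadings illustrating the cases discussed in the text.
single_item_cross_loader = place_item({"H1": 0.20, "H2": 0.50, "H3": 0.45}, 1, "H2")
multi_item_cross_loader = place_item({"H1": 0.60, "H2": 0.40, "H3": 0.10}, 5, "H1")
hard_to_assign = place_item({"H1": 0.50, "H3": 0.45}, 1, None)
content_mismatch = place_item({"H1": 0.10, "H3": 0.60}, 1, None)
```

The rule makes explicit that content takes precedence over factor loadings, as suggested in [65]: the loadings only determine whether an item of a multi-item indicator is dropped.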
We use CFA to test this second order model. The results of the second order CFA model show acceptable fit indices: χ²/df = 2.94; CFI = 0.916; TLI = 0.903; RMSEA = 0.09 and SRMR = 0.066. All factor loadings are significant. However, the fit of the model improves when HUM14 is moved to H2. Based on content, HUM14 is related to H2 (Training and Development) as well as to H3 (HR-policy). Because the overall fit is better when HUM14 is assigned to H2, we decide to assign HUM14 to H2.

Governance Performance
Governance performance focuses on good governance practices. On the one hand, it is related to best practices regarding board composition and board practices. On the other hand, it refers to having clear organizational goals taking into consideration stakeholder expectations [49,50].
Based on the results of the focus groups and the Delphi panel, 11 indicators are selected. Table 12 gives an overview of the indicators, items and scales used to assess governance performance.

GOV1. New board members are selected to meet the organization's changing needs. Source: adaptation based on Herman and Renz (2004) [82].

Adaptation to changes in the environment. Source: adaptation based on Jackson and Holland (1998) [83].
GOV2. The board of directors is able to cope with changes in the legal environment.
GOV3. The board of directors is able to cope with changes in the economic environment.
GOV4. The board of directors is able to cope with changes in the political environment.
GOV5. The board of directors is able to cope with changes in the needs of stakeholders.

Engagement of board members toward the mission and vision of the organization. Source: Fredette and Bradshaw (2012) [84].
GOV6. Board members share the same ambitions and vision for the organization.
GOV7. Board members enthusiastically pursue collective goals and mission.
GOV8. Board members are committed to the goals of the organization.
GOV9. Board members view themselves as partners in charting the organization direction.
GOV10. There is a commonality of purpose among board members of this organization.
GOV11. Everyone in the board of directors is in total agreement on our organization's vision.

Participative decision-making. Source: Li and Hambrick (2005) [85].
GOV12. All the board members have a voice in major decisions.
GOV13. Communications among board members can best be described as open and fluid.
GOV14. When major decisions are made, board members collectively exchange their points of view.
GOV15. Board members frequently share their experience and expertise.

Clarity of roles. Source: Gill et al. (2005) [86].
GOV16. Board members demonstrate clear understanding of the respective roles of the board and CEO.

Preparedness to learn from mistakes. Source: Jackson and Holland (1998) [83].
GOV17. In the board of directors we discuss about what we can learn from a mistake we have made.

External communication to stakeholders. Source: Jackson and Holland (1998) [83].
GOV18. This board communicates its decisions to everyone who is affected by them.

GOV19. The board is actively involved in long-term strategic decision-making.
GOV20. The board is actively involved in implementing long-term strategic decision-making.
GOV21. The board is actively involved in promoting strategic initiatives.

Goals meeting the needs of the stakeholders. Source: Rettab et al. (2009) [73].
GOV22. The goals of our organization meet the needs and requests of all our stakeholders.

Clear organizational mission and goals. Source: Wright (2007) [88].
GOV23. It is easy to explain the goals of this organization to outsiders.
GOV24. This organization's mission is clear to everyone who works here.
GOV25. This organization has clearly defined goals.

Independent board members. Source: Hillman et al. (2000) [89], Haynes and Hillman (2010) [90].
GOV26*. Does the organization have outside, independent directors?

* Item removed after CFA. All items measured on a 7-point Likert scale, except for GOV26 (Yes/No).

We conducted a principal component exploratory factor analysis using varimax rotation of the 25 items measured on a 7-point Likert scale. The results are reported in Table 13 and reveal five factors, explaining 70% of the variance. However, the results are not straightforward: several items have significant factor loadings on different factors and sometimes the loading on a factor is not in line with the content of the item, making a thorough evaluation necessary. First, we look for items with significant factor loadings on different factors. If several items are used to measure the indicator, we consider excluding the item. This is the case for GOV12, GOV13 and GOV15, items used to measure the indicator "Participative decision-making", which are apparently closely related to the indicator "Shared vision". Eliminating these three items implies that "Participative decision-making" would be measured using only one item. We do not consider this a suitable solution. An alternative solution is to combine "Shared vision" and "Participative decision-making" and exclude GOV14. However, we do not consider this a good solution either, as having a shared vision on the mission and goals of the organization is clearly distinct from how issues are discussed within the board and from how decisions are made by the board. Therefore, we decide to keep both factors G1 and G4 and to keep the items GOV12, GOV13 and GOV15. GOV17, GOV18 and GOV22 also have high factor loadings on different factors. However, because each of these items is the single item used to measure its indicator, we decide not to remove them. Instead, we determine to which factor the items can be added based on content. GOV17 has a high factor loading on G2 and G4. Examining content, we decide to add GOV17 to G2 because this factor is related to the extent to which boards are able to adapt to changes in the external environment. We consider "preparedness to learn from mistakes" (GOV17) as related to the adaptability of the board of the organization.
GOV18 has a high factor loading on G2 and G4. The indicator is related to communication to the stakeholders, which is not related to having a shared vision within the board nor to participative decision-making in the board. Therefore, we keep it as a separate indicator in the CFA model, directly loading on the second order construct "Governance performance". GOV22 has a high factor loading on G4 and G5. As the indicator "Goals meeting the needs of the stakeholders" is measured using this single item, we decide not to exclude GOV22. Based on content, we decide to add it to the factor G5 "Clear organizational goals" in the CFA model.
In a next step, we use CFA to test a second order model. The results are reported in Figure 6. Specifically, we test whether the model reflecting the results of the EFA and our interpretation based on content is confirmed by CFA. We also added GOV26, measuring the presence of independent directors as a dummy variable. The results reveal an acceptable fit, but GOV26 has a very low factor loading of −0.190 (p < 0.01). A possible explanation is that 91% of the organizations in our sample have external directors and as such this indicator is not a distinguishing factor in our sample. Therefore, we decided to remove GOV26 from our analyses.

Overview of Selected Indicators Based on the Results of the Exploratory and Confirmatory Factor Analyses
As a result of the exploratory and confirmatory factor analyses, we retain 21 indicators. Table 14 gives an overview of these indicators for each performance domain.

Discussion and Conclusion
This paper aimed at describing an assessment tool for the organizational performance of social enterprises, as well as reporting on its development and reliability. By developing a set of relevant indicators suitable for external reporting and an internal assessment tool, we answer the calls for the development of performance measurement tools that can be implemented in social enterprises. The tool emphasizes the importance of assessment and the use of a valid tool to determine organizational performance in social enterprises.
The results of our study will be discussed in relation to four key notions of scale development (robustness, utility, understanding and relevance) [91]. Robustness refers to the psychometric qualities of the instrument. With regard to this quality aspect, we conducted exploratory and confirmatory factor analyses to identify relevant indicators and we checked the internal consistency (or reliability) of the scales used to measure the indicators. Utility relates to the application of an instrument and the implications of the results: the evaluation tool can be regarded as an adequate tool to integrate multiple values of social enterprises. Thirdly, understanding refers to how we should correctly assess and interpret the construct. Since the evaluation tool was developed in close cooperation and dialogue with stakeholders (managers of social enterprises and experts in the field of social entrepreneurship), we integrated best practices and knowledge in the field based on different perspectives. With regard to relevance, the application of the assessment tool for measuring social performance broadens our view and stimulates us to think beyond financial performance in social enterprises by placing emphasis on different performance dimensions. We emphasize the rigorous process which we have undertaken in the development of this assessment tool. A thorough literature study supports the content validity of the items. Recommendations of experts and stakeholders, made in order to adapt the measures to the context of social enterprises, were taken into account during the development of the assessment tool. A major contribution is the size and representativeness of the sample of respondents used in the final validation of the scale. We can also stress the innovation in the methodology in this specific context.
We contribute to the literature surrounding performance measurement in social enterprises, as our study adds to the understanding of the use of performance measurement systems in social enterprises. Existing literature has shown that no large-scale empirical research has been systematically conducted dealing with performance measurement in social enterprises and that the developed tools are either too general or too specific in their design, which limits the practical usefulness of these tools. In order to bridge this gap, we used qualitative and quantitative research methods, incorporating the expertise and knowledge of multiple actors in the field of social entrepreneurship, to develop a performance measurement tool suitable for a wide range of social enterprises.
Moreover, we add to the literature surrounding performance measurement in social enterprises by contributing to the use of performance indicators and performance measurement in the particular context of social enterprises. We acknowledge the role of diverse performance dimensions in this instrument and we take a more comprehensive view of performance than mere financial organizational performance, proposing a much broader concept that encompasses a variety of performance indicators. Although this holistic approach seems promising, scholars have only recently engaged in conceptual and empirical studies [2].
In addition, our study has implications for practitioners. The performance measurement tool can be used for several reasons. Social enterprises can use the developed performance measurement tool to deal with the tensions they are exposed to because of their hybrid character. First, we provide social enterprises with an internal self-assessment tool. Preferably, the assessment tool is completed by diverse employees of the organization. Their (diverse) opinions may give rise to an internal discussion about the non-financial performance of the organization. This may help social enterprises in preventing mission drift and safeguarding the balancing of social and financial goals in internal decision-making. Secondly, social enterprises can use the set of selected performance indicators as a guideline to report non-financial performance to external stakeholders. As such, they respond to the increasing demand for accountability, necessary to establish legitimacy.
Our paper also has some limitations, which have implications for future research. First, we conducted this study within Flemish social enterprises. It would be useful to replicate the study in order to examine the validity of the developed tool on a larger, international scale. Furthermore, a promising trajectory for future research is to study the moderating effect of cultural differences in social enterprises between countries. Finally, performance management involves setting expectations for future achievements on the indicators which have been selected. The benefits of setting targets are that organizations have a focus and a clarity of organizational goals [20]. Future empirical studies could focus on setting targets, and could examine the impact of setting targets on organizational performance.
social enterprises. The support of these organizations is gratefully acknowledged. We would like to thank Tine Claeys and Linde Moonen. They were part of the research group and were involved in data gathering. We would also like to thank the two anonymous reviewers for their constructive feedback and helpful suggestions.

Figure 1. Overview of the five selected performance domains.

HUM4. Learning in my organization is seen as a key commodity necessary to guarantee organizational survival.

Development/personal growth of employees. Source: adaptation based on GRI [27].
HUM5. We develop our employees aiming at job rotation within our organization.

HUM6. Our organization supports all employees who want to pursue further education. Source: Rettab et al. (2009) [73].

Equal opportunities for minorities. Source: O'Connor & Spangenberg (2008) [74].
HUM7. Our organization has a policy concerning equal rights and non-discrimination.

Involvement of personnel in education and training. Source: adaptation based on CAF [76].
HUM8. Our organization involves the employees in the planning of education and training.

Interaction between employees. Source: adaptation based on ISO26000 [78].
HUM9. We pay attention to good relationships between our employees.

Goal oriented HRM. Source: adaptation based on GRI [27].
HUM10. Our HR-policy is carefully planned.
HUM11. Our HR-policy is carefully evaluated.

Job satisfaction. Source: adaptation based on GRI [27].
HUM12. Our organization pays attention to individual job satisfaction.

Diversity management. Source: Cuesta Gonzalez et al. (2006) [79].
HUM13. Our organization has a policy on diversity management.

Policy on education and training. Source: Mishra and Suar (2010) [71].
HUM14. Our organization has a policy for the training and development of employees.

Support on the work floor. Source: adaptation based on Heslin et al. (2006) [80].
HUM15*. We support our employees in taking on new challenges.
HUM16. We offer useful suggestions regarding how employees can improve their performance.
HUM17. We provide constructive feedback to employees regarding areas for improvement.
HUM18. We help employees to analyze their performance.
HUM19. We provide guidance regarding performance expectations.

Work-life balance. Source: adaptation based on Milkie & Peltola (1999) [81].
HUM20. Our organization is successful in balancing paid work and family life.

* Item removed after EFA. All items measured on a 7-point Likert scale.

Figure 5 gives an overview of the results of the adapted model. The fit indices show a good fit: χ²/df = 2.9; CFI = 0.918; TLI = 0.905; RMSEA = 0.089 and SRMR = 0.058. All factor loadings are significant (p < 0.001). Finally, we checked the reliability of the scales used to measure the four remaining indicators by calculating Cronbach's alpha. The results indicate a strong scale reliability (>0.70): Performance support (α = 0.938), Training & Development (α = 0.907), HR-policy (α = 0.883) and Diversity Management (α = 0.843). The results indicate that the originally selected 12 indicators can be reduced to seven indicators (the four indicators H1, H2, H3 and H4 and the single-item indicators HUM9, HUM12 and HUM20), relevant for measuring the human performance of social enterprises.

Figure 5. Human Performance: items and item loadings CFA. Standardized item loadings using the WLSMV estimator: p < 0.001 for all loadings; χ²/df = 2.9; CFI = 0.918; TLI = 0.905; RMSEA = 0.089; SRMR = 0.058.
Gill et al. (2005) [86]. GOV16. Board members demonstrate clear understanding of the respective roles of the board and CEO.
Preparedness to learn from mistakes. Source: Jackson and Holland (1998) [83]. GOV17. In the board of directors we discuss about what we can learn from a mistake we have made.
External communication to stakeholders. Source: Jackson and Holland (1998) [83].

Table 1. Overview of the different phases of the research process.

Table 2. Overview of the indicators selected through literature review, focus groups and Delphi panel.

Table 3. Overview of the population and sample of the survey.

Table 4. Economic performance: overview of indicators, items and scales.
Made decisions that created changes in staff stability.
Measured on an 8-point Likert scale. Based on Helm and Andersson (2010) [70]; * Item removed after EFA.

Table 5. Economic Performance: items and item loadings EFA.

Table 6. Environmental performance: overview of indicators, items and scales.

Table 7. Environmental Performance: items and item loadings EFA.

Table 8. Community performance: overview of indicators, items and scales.
* Item removed after CFA; all items measured on a 7-point Likert scale except COM8 (sum of different kinds of partnerships) and COM9 (yes/no).

Table 9. Community Performance: items and item loadings EFA.

Table 10. Human performance: overview of indicators, items and scales.

Table 11. Human Performance: items and item loadings EFA.

Table 12. Governance performance: overview of indicators, items and scales.

Table 13. Governance Performance: items and item loadings EFA.

Table 14. Retained indicators after EFA and CFA.