Collaborative Service Innovation: A Quantitative Analysis of Innovation Networks in a Multisectoral Setting

Partial least squares structural equation modelling has proven very valuable to study the unexplored and complex public service innovation networks (PSINs) in the public sector, from a socio-economic stance. We have modelled PSINs' three structural variables (Social, Actors, and Functioning mode) using a sample of original data (n = 233). Our PSINs model confirms them as instruments that produce public service innovation involving technological and nontechnological characteristics. Additionally, we set up a novel and potentially fruitful methodology to study the intricate formation and impact of complex socio-economic structures that connect innovation and public services. Hence, our research supports a better and extended use of PSINs as a tool for policy and service co-design and co-implementation. We also open a promising line of studies involving multi-actor collaboration in the public sector.


Introduction
The "service innovation studies" [1][2][3][4][5][6][7] have researched, at length, the relationship between innovation and services. This has been most helpful after the recent upsurge of interest and research in the relationship of innovation and public services [8][9][10][11]. We now know that public policy and services are created and implemented using practices that integrate "multilevel, cross-border settings, in which former demarcations of policy fields become blurred" [12].
Analyzing the use of these "multilevel" practices, service innovation studies link linear models and innovation practices (tenders, public-private partnerships, even design thinking) with the traditional administration and new public management paradigms [3,4,9,13]. They also link interactive, circular, or networked practices (living labs, public-private innovation networks, hackathons) with today's greater need for public coordination and collaboration. This need is addressed by the new public governance paradigm [9,12,14,15]; the Co-VAL project presents conceptual evidence and more than 50 cases along this line. Some authors have rather redundantly debated whether these are actual paradigms [16]. We adhere here to Osborne's neat definition of these governance regimes as the "different modes of design and implementation of public policy and delivery of public services". Public Administration, the New Public Management (NPM), and the New Public Governance (NPG) are therefore presented as "policy and implementation regimes."
Among the networked practices to innovate public services, Gallouj and his colleagues [1,17,18,19] described the public service innovation networks (PSINs): multi-agent collaborations that design and implement local, national, or multinational public policies through services. They characterized them conceptually, from their morphological to their functional aspects, and from their innovation outputs to the types of innovation they aim for. We found a profusion of lay publications and public sector professional events showing that PSINs are a real and frequently used option for governments and other agents to develop public service innovation [20][21][22][23][24][25][26]. However, the networks to innovate services are still understudied by scholars of public administration and management [17,27].
With this research, we want to study the characteristics of these PSINs, and how PSINs are connected to their outcomes and, in general, the innovation of services. We empirically answer which significant criteria build PSINs, and what their types of innovations and outcomes are. To achieve both goals, we use partial least squares structural equation modelling (PLS-SEM). This is a statistical modeling technique in use across several fields of social research, because it helps connect theoretical models and data.
The translation of a theoretical model as complex as the multilevel PSINs model [1,28] into a PLS-SEM model demands [29,30]: (1) Analysis of each criterion that plays a part in the different conceptual layers of a PSIN and its outcomes. (2) Complementarily, the study of PSINs and their relationship with innovation. Here, the integration of technological (product, service, process) and nontechnological (organizational, market, input) innovations [4] requires special attention. With this research, we contribute to the service innovation studies by presenting the constituting criteria of PSINs as a practice for public service innovation. We also make visible the effects of PSINs as innovation sources. Because PSINs are a complex concept, due to the multitude of variables and their paths, our methodology makes their in-depth assessment possible. This methodology opens a new way to study the complexity of the structures and practices that governments are developing to cope with new types of coordination with citizens and other stakeholders.
Next in this paper is a review of the theoretical underpinnings of our PSINs model to study PSINs effects on public service innovation. Then, we introduce our methodology and show our findings. We conclude our paper with a brief discussion and conclusions.

The Agents Component of the PSINs Model
PSINs are multi-agent (multi-stakeholder) groups, gathering around a common objective, but with varying numbers, natures, places, roles, and power. The complexity of their study scales up when they group and interact with each other. PSINs literature indicates a potential collinearity between these different characteristics or criteria, and Hair and others [29,39] suggest grouping them into one higher-order component. We therefore decided to analyze the potential effects of these group characteristics, in relation to the rest of the model, through the higher-order Agents component.
We found that the theoretical types of agents, or actors, include manufacturing (industrial and agricultural) companies, public entities, market (commercial) services, individual citizens, and third-sector organizations (NGOs, unions, social groups and enterprises, and associations) [1]. PSINs' main actors are citizens and public and third-sector organizations, and their involvement is often as individuals, rather than as collectives. Thus, we hypothesize:
Hypothesis 1a. Public entities, citizenry, and NGOs are the main actors of PSINs, being the most committed and engaged with them.
But these actors' respective commitment and engagement are still unclear, especially when coupling a specific PSIN with a target sector or problem. PSINs work on soft, non-R&D intensive, "immaterial, frugal" innovations, which might determine the type and intensity of commitment by sector. Therefore, we hypothesize:
Hypothesis 1b. The commitment of the actors to the PSINs is stronger and significantly different in the health sector than in other public subsectors (elderly care, employment, attention to women, children, youth, minorities, excluded groups, mobility, environment, or security).

The Social Component of the PSINs Model
Due to the type of problems and initiatives they deal with, PSINs act on public services, organizations, or processes. The problems addressed by PSINs are often described as wicked due to their imprecise, complex, and potentially harmful nature for society or some of its groups. Also, PSINs initiatives are corrective, progressive, and even creative [28]. Problems and initiatives together produce different types of interventions and projects outside and within the public entities. They also open opportunities for PSINs to deliver different forms of value, e.g., productivity (improved efficiency, returns, or justice and equality), engagement, and learning about needs or problems. Assuming a high correlation between these criteria, we decided to group them together (following [29,39]) in our model's Social component.
Per the types of services that PSINs develop, authors cite general services, social services, and utilities. The types of problems or needs that PSINs address are numerous, and we decided to control for services in the following subsectors: health, aging, education, transportation and mobility, environment and urban, employment, security, women, childhood, youth, and excluded minorities.
The interventions or initiatives and projects developed by PSINs are rather unclear in the literature. Given their nature, these projects are rarely pure R&D, but much more oriented to nontechnological aspects. The PSINs projects are described as combinations of the rationalization or adoption of a product-service-IS and the design of a service, integration of products-services, new forms of delivery, or even new ways of doing things that break free from bureaucracy [40,41]. Therefore, we hypothesize:
Hypothesis 2. PSINs are strongly related to nontechnological types of projects, or to projects driven by their nontechnological aspects.

The Functioning Mode Component of the PSINs Model
Networks of the type we have described so far are the result of an aggregation of actors. PSINs devote themselves to a multiplicity of aims by means of different types of initiatives or projects, aiming to produce several forms of value. These rather complex and changing entities need a way to organize themselves. Besides, these networks are the result of a planned effort inspired by an entity (public, nonpublic, or individual), although the literature also documents spontaneous networks [26]. To analyze the complexity of the organization of these different, but potentially correlated, criteria, we decided to sum them [29,39] in the Functioning Mode component.
Authors describe two modes of network organization: top-down, or vertical, as in a supply-chain; and fairly horizontal, even bottom-up, where responsibilities and leadership are distributed and leveraged [27,38,42]. Conceptually, PSINs belong to the latter, emerging from planned or spontaneous initiatives. It is unclear though if they organize around just one, or multiple entities. Consequently, we hypothesize:
Hypothesis 3a. PSINs are the result of planned initiatives.
Hypothesis 3b. PSINs are organized horizontally, and their responsibilities and leadership are distributed among their partners.

The Outcome and Innovation Components of the PSINs Model
The objective of PSINs is the innovation of public services [28]. But being a collaborative effort to address wicked problems and progressive, even creative, initiatives, PSINs also target innovation of organizations and processes. Consequently, our innovation component gathers the following criteria delimiting the types of innovations sought out by PSINs: (1) Nontechnological (vs. technological) innovation, as PSINs produce soft types of outcomes. (2) Nonsystematic (vs. systematic or step-by-step) innovations, at least until the partners of the network decide on their next steps.
(3) Strategy or policy, mindset, way-of-doing-things (vs. product, service, process) innovations, even though PSINs often must develop products or processes, or new integrations of existing products, to implement their innovations. (4) Unclear (vs. well identified) solutions given the blurred context and undefined nature of the wicked problems PSINs face. (5) Adopted (vs. original) innovation, since many PSINs are born from earlier experiences in other geographies or sectors. (6) Radical (vs. incremental) innovation. This criterion refers as well to the accumulation of several incremental innovations, described here by the other five criteria, rather than the development of disruptive changes or outcomes, in the Schumpeterian sense.
The effect of these criteria, delimiting the types of innovations produced by PSINs, varies along their life cycle (early, mid, mature, end/abandonment stage) [28]. Therefore, we hypothesize:
Hypothesis 4. The PSINs' life cycle moderates the type of innovations they target, and PSINs will opt for unclear, adopted, or radical innovations in their earliest stage.
The literature also recognizes that the outcomes sought by PSINs influence the type of innovation pursued. In this regard, networks in general, and PSINs, are primarily considered robust, permanent, and experienced entities, like an established company or administration. Therefore, they are measured against, and required to deliver, conventional types of outcomes [27,35]: number of citizens/users able to access the service, service quality, or costs. Therefore, we theorize:
Hypothesis 5. The outcomes of the PSINs moderate the type of innovation they produce, and PSINs opt for adopted, new-ways-of-doing-things types of innovations when conventional outcomes are demanded of them.
To analyze the complexity of interactions and the effects described in Figure 1, we decided to use partial least squares structural equation modelling (PLS-SEM). It helped address the challenge of representing the more than one hundred variables associated with our components, their interactions and effects, and the validation of our hypotheses. We describe the methodology in the next section.

Conceptual Support for Our Indicators
Our exploratory cross-sectional research [30,43,44] started by reviewing the literature for support of every component of PSINs. The review started from the work of Gallouj and colleagues [17,27] and the Co-VAL research reports, which underpinned our definition of the items of an on-line survey design. We chose this type of design because secondary data supporting the PSINs theoretical background are hardly available. Also, our multivariate model requires answering a long list of questions (114) and, consequently, a data-gathering tool that adapts without supervision to let respondents answer at their own pace. Further, potential respondents' biases require in-experiment controls for anonymity, duplicate responses, and random presentation of answers. Besides, efficient data collection, even with lockdowns and minimal social contact, needs 24 × 7 uptime and accessibility by any means and under most conditions. Finally, effective and targeted dissemination of the survey demands easy link-sharing with participants.
We built the first versions of our survey questionnaire and tested them through a series of pilots. Twelve selected members of actual PSINs from our reference subsectors (health and several public services) cognitively tested them [45] to assess how well our survey layout performed.

Survey Participants
As recommended in [29,46], we estimated the sample size of 75 data points using Kock and Hadaya's inverse square root method [46], given the number of indicators included in the research design (refer to the list of indicators in Appendix A). (The minimum sample size using G*Power [47]
From Figure 2: after setting up our hypotheses and concept model, we validated it with our case studies. Then, we designed our questionnaire in English, translated it into Spanish, and confirmed language and content with six experts and 12 cognitive tests with key informants (the list of experts is available from the authors upon request). These interviews allowed us to test a first version of the questionnaire in late March 2020 for online access, introduction text, and categorical scales. The prototype also helped us rework the layout and final wording of the questionnaire to limit the time spent in each section. From early April to late June 2020, we administered the final on-line survey (Limesurvey, V. 2.73.1).
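The inverse square root estimate mentioned above can be sketched in a few lines. This is a minimal illustration, not the authors' computation: it assumes the commonly cited setting of a 5% one-tailed significance level and 80% power (neither is stated in the text), and the function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_sum(alpha=0.05, power=0.80):
    """Sum of the two standard-normal quantiles used by the method.

    alpha and power are ASSUMED defaults; the text does not state them.
    """
    nd = NormalDist()
    return nd.inv_cdf(1 - alpha) + nd.inv_cdf(power)


def min_sample_size(beta_min, alpha=0.05, power=0.80):
    """Smallest n such that (z_sum / |beta_min|) ** 2 <= n."""
    return ceil((z_sum(alpha, power) / abs(beta_min)) ** 2)


def smallest_detectable_path(n, alpha=0.05, power=0.80):
    """Inverse view: the smallest |path coefficient| supported by n cases."""
    return z_sum(alpha, power) / sqrt(n)


if __name__ == "__main__":
    print(min_sample_size(0.2))
    print(round(smallest_detectable_path(75), 3))
```

Under these assumptions, a sample of 75 corresponds to a minimum absolute path coefficient of roughly 0.29, and the 233 completed responses would support smaller effects.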
In Spain, emulating the rest of Europe, the public and third sector have long trained their employees and managers to use social networks (e.g., the 2021 training plan of the Madrid municipality). This type of training is part of the public sector's digital transformation and open-government initiatives. We therefore screened social network sites where public servants, managers and employees of NGOs, and other potential participants in PSINs (e.g., unemployed people or families with disabled people) publicly declared their jobs in any of our target sectors. We studied each of the digital profiles that we found to validate their professional roles and affiliation to a service innovation network, presently or in the past.
Our screening process identified 2791 profiles meeting our design criteria. We sent out a first short message to each of them, presenting our scientific research and inviting them to accept us among their contacts and participate in the research. Of them, 1034 responded to the message, which confirmed that real people were behind the digital profiles. Their acceptance also opened other means (email messaging) of contacting participants. Next, we randomly selected the final population of the research (N = 565), and administered the survey using the social networks' messaging platforms. A second reminder was emailed after a five-week interval. Finally, we obtained 233 completed responses, which yielded a 41.24% response rate consistent with comparable cross-sectional studies (e.g., [48]). Nonresponse bias was deemed nonexistent by a wave analysis (linear extrapolation method: [49,50]), involving education, gender, age, and profession.
Most of our respondents are female (71%), split evenly between the 26-45 and 46-65 age ranges (50% each). We also have a fair distribution of participants per their education level (below/higher education: 49%/51%) (refer to Table A1 in Appendix A for different segmentations of our sample's population).
The selection of Spain as a region for our study is a continuation of the research we led for the Co-VAL project. Additionally, Spain is a federal country, with different layers of governments organized by geographical scope and historical reasons. The different types of networks, created to link overlapping governments and their interactions with NGOs and citizenry, make the country an ideal location for our type of exploratory research (refer to [51][52][53]). Besides, time and external reasons (first wave of the COVID-19 pandemic) made it advisable to restrict our geographical scope.

Measurement
For our model analysis, we decided to use PLS-SEM for its "ability to [create] independent latent variables directly on the basis of cross-products involving the response variable(s)" [30]. Henseler and his colleagues [54] recommended PLS path modeling "in an early stage of theoretical development in order to test and validate exploratory models" and for its soundness in the face of many indicators and aggregations in higher-order components.
Following the systematic application of the nonparametric criteria of PLS-SEM [29] (p. 96), we describe now the PSIN model's assessment process.

Measurement Modes of PSINs Criteria
Our theoretical framework underpins our structural criteria ("weighted composites" [29,30,54]). Since this might be the first analysis of these composites, little is known about their measurement specifications. As Jarvis and colleagues [56] indicated, we acknowledge the threat of misspecification of the measurement of these composites. Modelling our composites as reflective (mode A) variables, when they should have been formatively (mode B) built, can lead to biased results. (Mode B indicators are usually uncorrelated and produce lower outer loadings when misspecified as mode A.) There is, then, a high chance of dropping mode B indicators [56] that should have been retained, if they fail to reach the threshold level of 0.4. This might impoverish the description of each composite, compromise their content validity, and hinder detection of biases.
To confirm the conceptual, theoretical, and empirical validity of our composite criteria, avoiding measurement model misspecification, we followed the guidelines suggested in Hair and colleagues [29] and Jarvis and colleagues [56]. After theoretically confirming the indicators and potential measurement mode of each criterion, we designed our survey questions following each mode.
Additionally, PLS-SEM provided us with the confirmatory tetrad analysis in PLS-SEM (CTA-SEM) [57]. This is a statistical test that eases the identification of the measurement mode of a latent criterion, based on the concept of tetrads (τ), which describe the connection between pairs of covariances. (A tetrad is the difference between the product of one pair of covariances and the product of another pair of covariances of the indicators of a latent variable.) If all the tetrads of a latent variable are zero, the variable should be measured as reflective [29] (p. 280). If even one of the (non-redundant) tetrads is significantly different from zero, then the variable must be considered formative. We followed the systematic CTA-SEM process (5 steps) for validating the measurement mode of each of our constructs. With our sample data, we changed the measurement of eight latent variables (Collaboration, Engagement, Functioning-mode, Motivation, Type-project, Types, Outcome, and Wicked).
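The tetrad definition given above can be illustrated with a short sketch. It only computes the three tetrads of four indicators from a covariance matrix; it does not reproduce the significance testing that the CTA procedure applies to decide whether a tetrad differs from zero. The function names and the loadings are hypothetical, chosen so that a one-factor (reflective) covariance structure makes every tetrad vanish.

```python
def tetrads(cov):
    """Return the three tetrads of a 4x4 covariance matrix.

    Indicators 1..4 map to indices 0..3. Each tetrad is the product of one
    pair of covariances minus the product of another pair.
    """
    s = cov
    return {
        "t1234": s[0][1] * s[2][3] - s[0][2] * s[1][3],
        "t1342": s[0][2] * s[3][1] - s[0][3] * s[2][1],
        "t1423": s[0][3] * s[1][2] - s[0][1] * s[3][2],
    }


def one_factor_cov(loadings):
    """Covariances implied by one common factor: s_ij = l_i * l_j (i != j)."""
    n = len(loadings)
    return [
        [1.0 if i == j else loadings[i] * loadings[j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical loadings, for illustration only.
reflective_cov = one_factor_cov([0.9, 0.8, 0.7, 0.6])

# Breaking the one-factor pattern for a single covariance yields a
# non-vanishing tetrad.
perturbed_cov = [row[:] for row in reflective_cov]
perturbed_cov[0][1] = perturbed_cov[1][0] = 0.9

if __name__ == "__main__":
    print(tetrads(reflective_cov))
    print(tetrads(perturbed_cov))
```

In practice the sample tetrads are never exactly zero, which is why the decision rests on a statistical test rather than on the raw differences.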

Criteria, Manifest Indicators, and Hierarchical Component Modelling
Our PSINs model has two types of variables, like any PLS model: criteria, or composite variables; and items, or manifest indicators. Criteria are linear combinations of the items that we, based on the theoretical references and case studies, chose to study. In our model, these criteria create the first, second, and third layer of composites, hierarchically ordered. We opted for a hierarchical component model (HCM) [39] because it reduced the number of relationships in the structural model and made the PLS model easier to visualize and more precise (parsimonious). Figure 3 shows our model's lower-order components (LOCs), which capture the theoretical dimensions of the higher-order components (HOCs). The layer of first-order LOCs forms Social and Actors (as theoretically indicated by [17,28]). And Social and Actors combine with Functioning-mode to form PSINs [17,28], which is our third-order HOC.
To combine a first-order component (Functioning Mode) with second-order components (Social and Actors), we must analyze the relationships of the HOCs with their forming LOCs, according to their modes of measurement. In our model, the relationships of the HOCs with their LOCs are formative in all cases. Additionally, the relationships of the first-order LOCs with their items are either reflective (mode A) or formative (mode B). Consequently, our hierarchical component model is of the Reflective-Formative and Formative-Formative type. In this scenario, where the LOCs explain almost all the variances of the HOCs (Social $R^2$ = 0.998; Actors $R^2$ = 0.995; PSINs $R^2$ = 0.976), Hair and colleagues [29] suggested a two-stage HCM analysis:
(1) We need to follow the repeated indicators approach to, first, identify the linear relationships (paths) of the indicators and their composite criteria (Figure 3). Each LOC criterion is the result of a linear combination of its indicators ($x_1, \ldots, x_n$) and their weights ($w_1, \ldots, w_n$), or:
$$\mathrm{LOC} = \sum_{i=1}^{n} w_i x_i$$
Similarly, the HOCs are linear combinations of all the indicators from their forming LOCs:
$$\mathrm{HOC} = \sum_{j=1}^{m} \sum_{i=1}^{n_j} w_{ji} x_{ji}$$
where $j$ indexes the $m$ LOCs that form the HOC and $n_j$ is the number of indicators of the $j$-th LOC. Second, we must obtain the latent variable scores (LVS) of our LOCs and HOCs: single-item measures calculated from the multi-item measurement of each criterion, assuming equal weights for each item (Figure 4).
(2) We calculate the path scores between the LOCs and the Social and Actors HOCs, using the PLS-SEM algorithm over the new model (Figure 4) built with the LVS.
(3) We need to repeat the two-stage process described in steps 1 and 2 for the third-order component (PSINs) (Figure 5). The LVS also allow us to analyze the relationship between the PSINs and the endogenous Outcome, and the combined effect of PSINs and Life Cycle on Innovation.
Mathematics 2021, 9, x FOR PEER REVIEW 9 of 27
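The composite arithmetic behind the two-stage procedure can be sketched as follows. This is an illustration only: a real PLS-SEM run estimates the weights iteratively, whereas here every weight and indicator value is a made-up number and the helper names are hypothetical.

```python
def composite(values, weights):
    """Weighted composite: the sum of w_i * x_i over paired items."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for w, x in zip(weights, values))


def equal_weight_score(values):
    """Latent variable score assuming equal weights for every item."""
    return composite(values, [1.0 / len(values)] * len(values))


# Hypothetical indicator values for one respondent, grouped by LOC.
loc_a_items = [2.0, 4.0]
loc_b_items = [1.0, 3.0, 5.0]

# Stage 1 (repeated indicators): the HOC reuses all items of its LOCs.
loc_a = composite(loc_a_items, [0.5, 0.5])
loc_b = composite(loc_b_items, [0.2, 0.3, 0.5])
hoc_repeated = composite(loc_a_items + loc_b_items, [0.1, 0.2, 0.1, 0.2, 0.4])

# Latent variable scores: one number per LOC, with equal item weights.
lvs_a = equal_weight_score(loc_a_items)
lvs_b = equal_weight_score(loc_b_items)

# Stage 2: the HOC is re-specified with the LVS as its only indicators.
hoc_stage_two = composite([lvs_a, lvs_b], [0.6, 0.4])

if __name__ == "__main__":
    print(loc_a, loc_b, hoc_repeated, lvs_a, lvs_b, hoc_stage_two)
```

Replacing each multi-item block with a single score is what shrinks the stage-two model to one indicator per LOC.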

Control for Common Method Variance
Our behavioral manifest items let respondents self-report on perceptions or behaviors. This may lead to common method variance (CMV), which could affect the validity of our conclusions [58]. CMV is the systematic variance shared among variables, and it can bias the measures by the method of measurement rather than by the theoretical constructs represented by the measures. CMV potentially threatens the validity of the PLS-SEM conclusions [58,59]. Specifically, CMV represents "the amount of spurious correlation among the variables that may be generated by utilizing the same method (i.e., a survey) in order to measure each [dependent or independent] variable" [58]. Podsakoff and colleagues identified four common sources of CMV [56]: the use of the same respondent for data of dependent and independent criteria; the items' presentation to respondents; the place of the items in a survey; and the contextual impacts (outcomes, innovation types) that measure the criteria [60]. (For example, if the impacts of the different motivations to participate in a PSIN are estimated only by the perceptions of the individuals on their own motivations and the outcomes of a PSIN, the estimated impact may be biased. This happens if some respondents overstate their motivation and the outcomes, either because of the tendency to assess themselves in too positive a manner or because of social desirability. If these biases are present, they could produce a (false) positive correlation between motivation and outcomes when the same respondent is used as the single source of the measures for both the independent and dependent variables.) To control for CMV, we followed the Measured Latent Marker Variable ([61], p. 146) recommendations for mixed controls.
Specifically, we used the seven unrelated items drawn from the X1 version of the Marlowe-Crowne Social Desirability Scale of Fischer and Fick [61,62], measured with the questionnaire; different formats of response (e.g., randomly presenting the items for all of the constructs); a general positive style of the questions, mixed with negative for some items; and anonymity of participants. We then followed the Construct Level Correction (CLC) approach [61,63] over our HOC model. We constructed four markers (SOCDES_criteria) with the seven social desirability items of our survey (Figure 6). Next, we re-modeled our original criteria including each marker variable, estimated the paths, and compared these CLC path coefficients and coefficients of determination with the original (refer to Table A2 in Appendix A). Lastly, we tested the equality of their variances with Levene's test [64,65] and failed to reject the null hypothesis that the variances of the CLC estimations and the original were equal (the p-value of the variance differences was 0.251). Consequently, we found no evidence of CMV in our model.
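For readers unfamiliar with the variance comparison used above, the following sketch computes the Levene statistic for two or more groups of estimates. It is an assumption-laden illustration: it uses the mean-centred variant (the text does not say which variant was applied), the input numbers in the usage lines are invented rather than taken from Table A2, and the function name is hypothetical. The p-value is not computed here, since it requires the F distribution with (k - 1, N - k) degrees of freedom.

```python
from statistics import mean


def levene_statistic(*groups):
    """Levene's W for equality of variances, centred on the group means.

    W = ((N - k) / (k - 1)) * between / within, where the sums of squares are
    taken over the absolute deviations of each value from its group mean.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviations from each group's own mean.
    z = [[abs(y - mean(g)) for y in g] for g in groups]
    z_bar_group = [mean(zi) for zi in z]
    z_bar_all = sum(sum(zi) for zi in z) / n_total
    between = sum(
        len(zi) * (zb - z_bar_all) ** 2 for zi, zb in zip(z, z_bar_group)
    )
    within = sum((v - zb) ** 2 for zi, zb in zip(z, z_bar_group) for v in zi)
    return (n_total - k) / (k - 1) * between / within


if __name__ == "__main__":
    # Invented values standing in for original and corrected estimates.
    original = [0.31, 0.42, 0.18, 0.55, 0.27]
    corrected = [0.29, 0.40, 0.20, 0.52, 0.25]
    print(levene_statistic(original, corrected))
```

A small W relative to the F critical value means the null hypothesis of equal variances is not rejected, which is the outcome reported in the text. Statistical packages such as SciPy expose a ready-made version (`scipy.stats.levene`) that also returns the p-value.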

Control for Common Method Variance
Our behavioral manifest items rely on respondents' self-reports of perceptions or behaviors. This may lead to common method variance (CMV), which could affect the validity of our conclusions [58]. CMV is the systematic variance shared among variables that stems from the method of measurement rather than from the theoretical constructs the measures represent, and it potentially threatens the validity of the PLS-SEM conclusions [58,59]. Specifically, CMV represents "the amount of spurious correlation among the variables that may be generated by utilizing the same method (i.e., a survey) in order to measure each [dependent or independent] variable" [58]. Podsakoff and colleagues identified four common sources of CMV [56]: the use of the same respondent for data of dependent and independent criteria; the items' presentation to respondents; the place of the items in a survey; and the contextual impacts (outcomes, innovation types) that measure the criteria [60]. For example, if the impacts of the different motivations to participate in a PSIN are estimated only by the perceptions of the individuals on their own motivations and the outcomes of a PSIN, the estimated impact may be biased. This happens if some respondents overstate their motivation and the outcomes, either because they tend to assess themselves too positively or because of social desirability. If these biases are present, they could produce a (false) positive correlation between motivation and outcomes when the same respondent is used as the single source of the measures for both the independent and dependent variables. To control for CMV, we followed the Measured Latent Marker Variable recommendations for mixed controls ([61], p. 146).
Specifically, we used seven unrelated items drawn from the X1 version of the Marlowe-Crowne Social Desirability Scale of Fischer and Fick [61,62], measured with the questionnaire; different formats of response (e.g., randomly presenting the items for all of the constructs); a general positive style of the questions, mixed with negative for some items; and anonymity of participants. We then followed the Construct Level Correction (CLC) approach [61,63] over our HOC model. We constructed four markers (SOCDES_criteria) with the seven social desirability items of our survey (Figure 6). Next, we re-modeled our original criteria including each marker variable, estimated the paths, and compared these CLC path coefficients and coefficients of determination with the original ones (refer to Table A2 in Appendix A). Lastly, we tested the equality of their variances with Levene's test [64,65] and failed to reject the null hypothesis that the variances of the CLC estimations and the original ones were equal (the p-value for the difference in variances was 0.251). Consequently, we concluded that CMV is not a concern in our model.
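The variance comparison described above can be illustrated with a minimal sketch of Levene's W statistic. It assumes the mean-centered form of the test and uses made-up path coefficients; the function name `levene_w` and both lists are hypothetical and are not the values reported in Table A2.

```python
from statistics import mean


def levene_w(*groups):
    """Levene's W statistic, using absolute deviations from each group mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    centers = [mean(g) for g in groups]
    # Absolute deviation of every observation from its own group mean.
    z = [[abs(x - c) for x in g] for g, c in zip(groups, centers)]
    z_group = [mean(zi) for zi in z]
    z_all = mean([v for zi in z for v in zi])
    between = sum(len(zi) * (zg - z_all) ** 2 for zi, zg in zip(z, z_group))
    within = sum((v - zg) ** 2 for zi, zg in zip(z, z_group) for v in zi)
    if within == 0:
        raise ValueError("no within-group spread in the absolute deviations")
    return (n_total - k) / (k - 1) * between / within


# Hypothetical path coefficients (illustration only).
original_paths = [0.41, 0.28, 0.35, 0.52, 0.19]
clc_paths = [0.39, 0.30, 0.33, 0.50, 0.21]
w = levene_w(original_paths, clc_paths)
```

W is referred to an F distribution with (k - 1, N - k) degrees of freedom to obtain the p-value. With SciPy available, `scipy.stats.levene(original_paths, clc_paths, center="mean")` returns the statistic and its p-value directly.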

Measurement Model Assessment of Reflective Criteria (Mode A)
Using the factor weighting scheme in the PLS algorithm and bootstrapping [39] included in the SmartPLS 3 (v.3.3.3) software [66], we created the scales for each first-order reflective criterion and assessed their reliability and validity.
First, we assessed the reflective criteria's internal consistency reliability, that is, the extent to which the indicator variables of a criterion are measuring different phenomena and are avoiding semantic redundancies, or overlaps, that could prevent them from being valid measures of the criterion. We used the constructs' Cronbach's alpha and composite reliability [67] scores. We report both, together with Dijkstra and Henseler's rho_A (refer to Table A3 in Appendix A).
The Cronbach's alpha score assumes that all items have equal covariances and form a unidimensional set, and tends to underestimate the internal consistency reliability if the sample size is small [29,68]. The composite reliability score (Jöreskog's rho_c [68][69][70]) assesses the different outer loadings of the items and tends to overestimate the internal consistency reliability [68]. A third score, Dijkstra and Henseler's rho_A [68], does the same as rho_c for the items' weights, and prioritizes them by their individual reliability [30].
As Cronbach's alpha might be too conservative, and composite reliability might result in higher reliability estimates, it is safe to say that the true reliability of our model's reflective criteria lies between both (suggested threshold values of 0.6 and 0.9) [29,67]; rho_A sits in that middle ground [68]. In our model, the Cronbach's alpha scores of Relevance and, particularly, Measurement might lie a bit lower than expected, but they are still suitable for an exploratory analysis, given their sound composite reliability [54] and content validity.
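To make the two bounds concrete, the sketch below computes Cronbach's alpha from raw item scores and composite reliability (rho_c) from outer loadings. It uses the textbook formulas and assumes standardized loadings; the function names and the example data are hypothetical and do not come from our survey.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: one list of scores per indicator, all of the same length."""
    k = len(items)
    # Sum score of each respondent across the k indicators.
    totals = [sum(row) for row in zip(*items)]
    item_variance = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))


def composite_reliability(loadings):
    """rho_c from standardized outer loadings."""
    squared_sum = sum(loadings) ** 2
    error_variance = sum(1 - l ** 2 for l in loadings)
    return squared_sum / (squared_sum + error_variance)


# Hypothetical Likert-5 answers of six respondents to three indicators.
scores = [
    [4, 5, 3, 4, 2, 5],
    [4, 4, 3, 5, 2, 4],
    [3, 5, 2, 4, 3, 5],
]
alpha = cronbach_alpha(scores)
rho_c = composite_reliability([0.82, 0.74, 0.69])
```

Alpha weights every item equally, whereas rho_c lets each item contribute according to its loading, which is one reason why the two scores can bracket the reliability reported for a criterion.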
Second, we evaluated the validity of the mode A criteria by examining two validity subtypes: convergent and discriminant validity. We assessed convergent validity (the positive correlation of an indicator variable with other indicators of the same construct) with the outer loadings of the indicator variables and the average variance extracted (AVE). We ensured the indicators' reliability (the size of their loadings) by selecting those with statistical significance and loadings higher than 0.7 [54].
AVE [71], equivalent to communality, is a complementary measure of how much of the variance of its indicators a construct explains: an AVE score higher than 0.5 indicates that the construct explains more than 50% of the average variance of its indicators. The AVE scores of our mode A criteria explained at least 50% of the variance of their indicators [29]. Our criteria demonstrate the unidimensionality of their indicators, as they converge and represent the same underlying construct by sharing a high proportion of their variance [54].
To keep the content validity of our model close to the theoretical framework, we adopted the strategy of retaining as many indicators as possible. Thus, we also kept all indicators with loadings between 0.4 and 0.7 that raised the composite reliability and AVE scores above the suggested threshold values ([64], p. 103), and dropped the rest (refer to the examples of Relevance and Innovation in Table A1 in Appendix A).
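As a small illustration of the 0.5 rule, the sketch below computes AVE as the mean of the squared standardized outer loadings. The function name and the loadings are hypothetical, chosen only to show one construct on each side of the threshold.

```python
def average_variance_extracted(loadings):
    """AVE: mean of the squared standardized outer loadings of a construct."""
    return sum(l ** 2 for l in loadings) / len(loadings)


# Hypothetical loadings for two reflective constructs.
ave_strong = average_variance_extracted([0.85, 0.78, 0.72])
ave_weak = average_variance_extracted([0.75, 0.62, 0.55])

# A construct passes when it explains at least half of its indicators' variance.
passes_strong = ave_strong >= 0.5
passes_weak = ave_weak >= 0.5
```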
Reflective constructs must also show discriminant validity, that is, exhibit meaningful differences among criteria, and the joint set of indicators is expected not to be unidimensional. This is especially relevant for our hierarchical component model. Researchers have assessed discriminant validity with the cross-loadings (correlations) of the indicators, differentiating the indicators. Another common assessment is the Fornell-Larcker criterion, which ensures that a construct shares more variance with its associated indicators than with any other construct. Henseler and colleagues [72] demonstrated the poor performance of both measurements when two constructs are perfectly correlated, or when the indicators vary only slightly. Consequently, we decided to assess discriminant validity with the heterotrait-monotrait ratio (HTMT) of our criteria [72]. This ratio estimates what the true correlation between two criteria would be if they were perfectly measured, or perfectly reliable. An HTMT value under 0.9 (or the more conservative 0.85), and a confidence interval excluding the value 1, signal discriminant validity [28]. Our model's reflective constructs have HTMT values below 0.85 and their confidence intervals exclude the value 1 at the 5% significance level (refer to Table A1 in Appendix A).
As expected, given the validity values of the criteria, the final 13 reflective indicators exhibit safe variance inflation factors (VIFs) and no collinearity issues.
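The HTMT ratio can be sketched from an indicator correlation matrix as the mean between-construct correlation divided by the geometric mean of the two mean within-construct correlations. The sketch below assumes raw (not absolute) correlations and at least two indicators per construct; the function names and the matrix are hypothetical.

```python
from itertools import combinations, product
from math import sqrt


def _mean(values):
    values = list(values)
    return sum(values) / len(values)


def htmt(corr, idx_a, idx_b):
    """HTMT ratio for two constructs given an indicator correlation matrix.

    corr  : square matrix (list of lists) of indicator correlations
    idx_a : row/column positions of the indicators of construct A
    idx_b : row/column positions of the indicators of construct B
    """
    # Heterotrait-heteromethod: correlations across the two constructs.
    hetero = _mean(corr[i][j] for i, j in product(idx_a, idx_b))
    # Monotrait-heteromethod: correlations within each construct.
    mono_a = _mean(corr[i][j] for i, j in combinations(idx_a, 2))
    mono_b = _mean(corr[i][j] for i, j in combinations(idx_b, 2))
    return hetero / sqrt(mono_a * mono_b)


# Hypothetical correlations: indicators 0-1 belong to A, 2-3 to B.
example_corr = [
    [1.00, 0.70, 0.30, 0.25],
    [0.70, 1.00, 0.35, 0.30],
    [0.30, 0.35, 1.00, 0.65],
    [0.25, 0.30, 0.65, 1.00],
]
ratio = htmt(example_corr, [0, 1], [2, 3])
below_conservative_cutoff = ratio < 0.85
```

The confidence interval mentioned above would come from bootstrapping this ratio over resampled data, which the sketch does not cover.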

Measurement Model Assessment of Formative Criteria
As suggested by Hair and colleagues [29,44,73], assessing the internal consistency reliability and validity of formative constructs in PLS as if they were reflective constructs is inappropriate. The reason is that formative indicator variables are likely independent causes, correlate only slightly, and are assumed to be error free [74,75]. Thus, to assess the convergent validity of formatively built (mode B) criteria, we first aimed at exhausting each criterion's domain with its set of indicators, as instructed by the underpinning literature and qualitative cases. Each set of indicators should meaningfully establish the content validity of its related criterion, capturing its theoretical features. Then, we correlated each mode B criterion with its reflective (mode A) measures. This redundancy analysis [67] should produce path coefficients between the mode B construct and its associated mode A construct of over 0.7, or an R-square of over 0.5. The paths between our model's formative constructs and their respective reflective constructs show scores of 1 and R-squares of 1, correlating perfectly and, therefore, partially validating the consistent reliability of our constructs (refer to Table A4 in Appendix A).
Next, to ensure full consistent reliability, the formative indicators, being independent features of the same construct, must avoid multicollinearity issues that could jeopardize their interpretation. Two (or more) indicators are collinear when they carry the same information and therefore correlate perfectly. We used the VIFs of our indicators to perform this check, which is especially adequate for small samples, where collinearity inflates the standard errors and impairs the estimation of the indicators' weights and signs [29]. The suggested threshold for VIF in PLS-SEM is 5 or lower [29]. Our model's formative constructs exhibit VIFs lower than 5 (refer to Table A4 in Appendix A).
To end our validity checks of the formative indicators, we assessed their relative contribution to each construct. In PLS-SEM, formative indicators combine linearly to fully form and explain their construct or composite. These indicators are linear, formative indexes of each composite, and together they give meaning to it. These composites are different from the causes of the construct in that they fully explain it rather than produce it. They make unnecessary the calculation of an error term that captures the rest of the causes of the construct excluded from the model [74]. Thus, we retained all indicators that were significant at the 10% level, following our strategy to keep content validity closer to the theoretical underpinnings. We also opted to retain nonsignificant indicators with a high absolute contribution (importance), that is, an outer loading above 0.5 (Table A4 in Appendix A), and VIFs signaling enough distance from other indicators of the same construct [29,76]. We finally selected 25 composite indicators.
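The collinearity check can be sketched as follows. The VIF of each indicator equals the corresponding diagonal element of the inverse of the indicators' correlation matrix, so a small Gauss-Jordan inversion is enough. The function names and the correlation matrix are hypothetical; this is an illustration of the check, not the routine the software runs.

```python
def invert(matrix):
    """Invert a square matrix with Gauss-Jordan elimination and partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity on its right-hand side.
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular: perfectly collinear indicators")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for row in range(n):
            if row != col:
                factor = aug[row][col]
                aug[row] = [v - factor * w for v, w in zip(aug[row], aug[col])]
    return [row[n:] for row in aug]


def variance_inflation_factors(corr):
    """VIF of each indicator: diagonal of the inverse correlation matrix."""
    inverse = invert(corr)
    return [inverse[i][i] for i in range(len(corr))]


# Hypothetical correlations among three formative indicators.
indicator_corr = [
    [1.00, 0.45, 0.30],
    [0.45, 1.00, 0.50],
    [0.30, 0.50, 1.00],
]
vifs = variance_inflation_factors(indicator_corr)
all_below_threshold = all(v < 5 for v in vifs)
```

Perfectly correlated indicators make the correlation matrix singular, which is why the sketch raises an error in that case instead of returning an infinite VIF.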

Structural Model Assessment
After ensuring the reliability and validity of our mode A and B criteria with a first-order PLS model, we must verify our structural model. We ought to validate the relationships (collinearity) between the constructs, and the model's predictive strength [29]. But first, in our HCM model, we needed to run a second two-stage HCM analysis to properly assess the paths between the second- and third-order constructs (Figure 4). From this second run, we dropped Feeling, Intensity, and Relationship as criteria with nonsignificant paths for Actors.
Furthermore, our model's estimation and significance would be compromised if the constructs showed high levels of structural correlation. We verified the absence of collinearity among the constructs with their VIFs (Table 1).
In assessing the predictive capabilities of the model, PLS-SEM differs from covariance-based techniques. Its estimates maximize the explained variance of the endogenous latent variables: Actors, Social, PSINs, Innovation, Life-cycle, and Outcome. Hence, goodness-of-fit measures do not translate directly to PLS-SEM, and we needed to validate the goodness of fit of our hierarchical component model with other tests [77]. Namely, we used the significance of the constructs' path coefficients, the R-square values, the f-square sizes, the predictive relevance Q-square, and the q-square effect size (Table 1).
Notes to Table 1: Significance: *** p < 0.001, ** p < 0.05. F-square values: (+) 0.02, (++) 0.15, and (+++) 0.35 indicate small, medium, and large effects [78]. R-square values: 0.25, 0.50, and 0.75 indicate weak, moderate, and substantial predictive power [28,54]. Q-square values larger than 0 suggest that the model has predictive relevance for that construct [67,79,80]. q-square effect sizes: 0.02, 0.15, and 0.35 indicate small, medium, or large predictive relevance [67,78]. ˆ Nonsignificant indicators with loadings higher than 0.5 and theoretically valid.
Although our model shows predictive relevance for all constructs (positive Q-square), its coefficient of determination for the main dependent variable, Innovation, is weak (14.7%).
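The effect sizes listed in the notes to Table 1 follow a common pattern: the change in R-square (or Q-square) when a predictor is omitted, relative to the unexplained share of the full model. The sketch below shows that calculation and the 0.02, 0.15, and 0.35 cut-offs. The function names, the "negligible" label for values under 0.02, and the example numbers are assumptions made for illustration.

```python
def effect_size(included, excluded):
    """f-square (from R-square) or q-square (from Q-square) effect size.

    included : value for the endogenous construct with the predictor in the model
    excluded : value for the same construct after omitting the predictor
    """
    if included >= 1:
        raise ValueError("the full-model value must be below 1")
    return (included - excluded) / (1 - included)


def effect_label(value):
    """Classify an effect size with the 0.02, 0.15, and 0.35 cut-offs."""
    if value >= 0.35:
        return "large"
    if value >= 0.15:
        return "medium"
    if value >= 0.02:
        return "small"
    return "negligible"  # label assumed here; the cut-offs start at 0.02


# Hypothetical R-square values with and without one predictor.
f2 = effect_size(included=0.50, excluded=0.40)
f2_label = effect_label(f2)
```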

Moderation Effects
We hypothesized (H1b) a potential difference between PSINs targeting health issues and those targeting other public problems (e.g., aging, education, employment, or exclusion). Our objective with the moderation assessment was to verify whether the observed differences produced by the groups associated with our moderating variable (MODORG02[MO09]) were significant. We decided to test its effect using Henseler and colleagues' PLS-MGA nonparametric multigroup analysis [54].
Before performing the PLS-MGA, we should ensure measurement invariance: the group differences should be related to variations in structural relationships, leaving aside differences in content or in the groups' understanding of the constructs. Demonstrated measurement invariance supports the conclusions and validity of multigroup comparisons.
We assessed measurement invariance using the MICOM procedure [72], analyzing configural invariance, compositional invariance, and equality of the means and variances across the groups produced by our moderating variable. This procedure successfully established full measurement invariance for the health sector (refer to Table A5 in Appendix A). This means that we can present the pooled results of our sample, which increases their generalizability [72]. Our model works significantly better predicting the Outcome of Health-related PSINs (R-square = 0.712) and is weak predicting their Innovation (R-square = 0.336). In summary, and from the results presented in Table A5 in Appendix A, the groups that result from applying our hypotheses to the pooled sample improve the overall coefficients of determination of the model.

Unobserved Heterogeneity
Since our research is one of the few of its kind to date, we decided to study other sources of heterogeneity that went unpredicted by our references, or that lay hidden in our sample. Several authors have called for exposing (observed or unobserved) heterogeneity [29,72,81], which supports the validity of the PLS-SEM results and the generalizability of our model [81].
To assess unidentified heterogeneity, and make visible new moderators or hidden combinations of contextual variables, in a model that has reflective and formative variables, we followed Becker and colleagues' Unobserved Heterogeneity Discovery (UHD) process [81] and used their PLS-Prediction-Oriented Segmentation (PLS-POS). This nonparametric technique improves the capability of handling formative constructs, and it uncovered several groups in our sample. In Table A6 in Appendix A, we present the results of the UHD for two hidden segments that also serve as representatives of the potential strength of our PSIN model. They optimize our main endogenous variable (Innovation), which is significantly different for both segments, and increase the model's predictive power from weak to moderate for segment 1 (and close to moderate for segment 2).

Findings
PLS-SEM has proven very valuable to study the unexplored and complex innovation networks in the public sector from a socio-economic standpoint. Thanks to its rigor, we have first been able to model this complexity with three layers of variables, or aggregated components. They reduce the theoretical relationships between the many indicators of PSINs (refer to Table A7 in Appendix A), their activities, and outcomes. Then, we have statistically assessed these components' modes of measurement, unexplained by earlier conceptual references. After having validated the measurement and structural models, we have analyzed the measurement invariance of our model, comparing observed and unobserved groups. And we have confirmed that our sample respondents agreed on their interpretation of our constructs. Our model constructs then measure what they intend, and the respondents agree on their meaning.
Although the relevant literature in this novel field provides extensive lists of potential indicators driving the shapes and formation of these networks (we started with 114 items), our research shows that only 38 of these theoretical indicators are relevant. We have constructed the Social, Actors, and Functioning Mode composites with them. Together they are the three significant criteria, or dimensions, of the PSINs concept. And Social positively influences PSINs the most.
The Social criterion is built of indicators describing that (1) PSINs engage citizenry and users as their partners, assess their satisfaction pre- and post-innovation, and use market research techniques to engage and evaluate users. (2) PSINs develop projects to design (not deliver) services, to free themselves from bureaucracy, and to integrate products in services (H2 is then confirmed). (3) PSINs prefer measures of innovations related to productivity, efficiency, units produced, and costs, returns, revenue, or value added. And (4) PSINs are more relevant in the Health, Security, and Attention to women, minorities, and excluded populations subsectors.
The Actors criterion is weaker than expected, but still mixes several influences. (1) PSINs collaborate with citizens and users through their representatives (NGOs, associations) in co-production and co-implementation of services, and in the analysis of data about their experiences, but leave citizens out of idea generation and prototyping sessions. To generate ideas and prototype services, PSINs include consultants and technical staff. (2) Respondents declared that citizens' participation is the most relevant, followed by that of NGOs, associations, unions, and industrial or agricultural companies. (3) From the public sector, only participants working in units or departments that foster PSINs to create social innovation are motivated to participate in them. And (4) citizens are the only actors truly committed to their PSINs (H1a is then partially confirmed), while universities and research institutions seem to either react negatively or have a low commitment to their PSINs. All other types of actors either did not engage in PSINs, or their commitment was not relevant for the pooled sample. Our results show a nonsignificant difference in the Actors-PSINs path between the Health and nonhealth sectors (H1b is not confirmed).
Finally, their Functioning Mode also shapes PSINs. Frequently, contracts regulate the relationships between the PSINs' actors. However, trust, according to our respondents, is more meaningful than bureaucracy (H3a and H3b are not confirmed).
The improvement of employee satisfaction and working conditions is the most relevant outcome of PSINs. Other significant outcomes are a larger number of citizens able to use the service, improved user experience, shorter design and implementation time, and better service quality. H5, the effect of Outcome on the type of Innovations developed in the public sector, is irrelevant for the pooled sample and for the Health-nonhealth segments. But it is significant for the PLS-POS groups (Table 2): Outcome partially suppresses the PSINs-Innovation relationship for Segment 1, and partially enhances it for Segment 2 (hypothesis H5 is partially confirmed). The Innovation component of our model includes four types of innovation produced by the PSINs. The changes produced in how people usually think stand out. Other meaningful effects are the changes of concepts and ideas, the changes of the organization or group of people, and the changes of strategy and policy. And, although we can confirm the relationship of the life cycle with the innovation type, we reject H4 because the PSIN-Lifecycle path is nonsignificant.

Discussion and Conclusions
With our research, we have successfully described public service innovation networks (PSINs) as instruments related to public service innovation, involving soft and hard elements, and to specific outcomes. With the help of the PLS-SEM method, we have set up a novel and potentially fruitful theoretical approach that deepens our understanding of the intricate formation and impact of PSINs. This approach identifies Social, Actors, and Functioning Mode as the criteria of PSINs, aggregating network morphological and functional indicators. With our model and framework, we are contributing to the "service innovation studies" by empirically confirming the link between innovation and public services. Additionally, we support a better and extended use of PSINs as a tool for policy and service co-design and co-implementation [81,82].
Our model confirms the positive, even moderate, effect of PSINs as producers of public service innovation. The types of innovation produced by the public sector are complex initiatives beyond a sole product, service, organization, or process. Our results, first, confirm that PSINs relate to public innovation initiatives. This connection supports the evidence of the positive correlations between service and innovation [1][2][3][4][5][6][7], and between interactive forms of governance and public sector innovation [83]. Second, PSINs' innovation types are mainly about how people think, new concepts and ideas, organization changes, and strategy and policy alternatives, which confirms the categories described by earlier academic references [1,35].
Complementarily, although PSINs are collaborative and interactive by design, they are connected to the improvement of outcomes that are based on optimization rather than cocreation and are associated with conventional (top-down) innovation of public services: improved employee satisfaction and working conditions, more citizens able to use the service, better user experience, shorter design and implementation time, and better service quality. Thus, we have found a disconnection (nonsignificant path) between these outcomes and the innovation types described above, which speaks of the complementarity of the strategies for public service innovation [83] in practice, rather than the separation or conflict between them. Public managers tend to homogenize the types of outcomes their entities produce, independent of the innovation practices (e.g., PSINs) or types [9,35].
PSINs are motivated by social criteria like the desire to effectively engage citizenry and the measurement of their satisfaction. They are primarily used in the health, security, and attention to women, minorities, or excluded populations sectors. And the metrics of PSINs social impact are, like outcomes, rather conventional: productivity, efficiency, and units produced, or costs, returns, revenue, and value added. With this study, we provide sound evidence of the strength of the social criteria in the formation of the PSINs, even beyond what the literature theorized [28,40,41].
The relevant actors in these PSINs are somewhat unexpected. Our pooled sample validated citizens, followed by NGOs, associations, unions, and industrial or agricultural companies, as significantly committed to the PSINs' goals and roles. But the collaboration with these actors takes place through representatives of users and citizens, which is a novelty, and through consultants. Our evidence then speaks of significant commitment from a limited number of actors, beyond the mere participation of a larger number of them [1]. And this might be crucial to understand the efficacy and the rest of the outcomes of PSINs and their role in the development of new public services.
Finally, the relationships between actors in PSINs are contractual, which supports their outcomes and types of innovation. We have also confirmed that, beyond contracts, partners' relationships are based on trust and are more horizontal than in other, more conventional, practices [27,38,42]. Influenced by the zeal to reach measurable outcomes and transparency, our respondents declared that this trust is explicitly formalized in contracts among the network partners.
This is a complex socio-economic study using the PLS-SEM method. We have analyzed over a hundred items, and have constructed different criteria, or composites, with them. Put simply, there were no indications in the literature of how we could measure, aggregate, or even relate these criteria. First, we learnt the mode of measurement of each criterion (reflective or formative) and built a valid and reliable model with them. Then, we validated that our sample groups agreed with us in the interpretation of the endogenous (dependent) criteria, since different interpretations would have risked the generalizability of our conclusions. Finally, we ensured our model had a moderate predictive strength and allowed us to confirm the effect of PSINs on the innovation of public services.
A word of caution is also required at this point. Although our results are relevant in the context of several Spanish public subsectors, including some of the most relevant, we encourage further research in other subsectors to generalize our conclusions. The potential effect of our limited geographical scope is also evident, and we would like to extend an invitation to other researchers to apply our approach to other regions. Additionally, we acknowledge the potential self-selection bias of the participants in our survey. However, we believe we have controlled for its impact through controls applied before and after administering the survey. These controls, including but not limited to the oversized number of respondents and the PLS-SEM measurement tests that ensured the soundness of our data, also allowed us to control for CMV.
PSINs are a relevant instrument for developing new ideas and facing complex projects with the aim of innovating services to address today's societal problems. At this early stage of research on this collaborative practice, we believe we have set a sound ground to leverage future research using our framework and model. Lastly, our PLS-path results can even serve as initiators of other analyses that can help understand the dynamics of how PSINs produce their outcomes and innovations, maybe using agent-based models and other simulations.

Funding: This paper has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 770356: Co-VAL. This publication reflects the views only of the authors, and the Agency cannot be held responsible for any use that may be made of the information contained therein. The paper has also been co-funded by the Spanish National Research Programme RTI2018-101473-B-100.

Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.

Conflicts of Interest: The authors declare no conflict of interest.
Appendix A

Table A1. Our sample population, by geographical scope, type of organization and subsector.

Participants' entities (count, share of sample)
Non-religious private agency, foundation, association or entity: 95 (41%)
Not working in any or working autonomously: 10 (4%)
Public administration, agency or entity: 121 (52%)
Religious foundation, association or entity: 4 (2%)
Union: 3 (1%)

In that group you are describing, there was . . . contracts formalized the arrangements between agents Likert-5

FUNCTI04
The role of the main public agent in that group was . . . proponent or central authority of the project; second to a proposing non-public agent, but actively supporting and facilitating the project; passively supporting private agents; no public agents

Deepening in the goals of that group you are describing, you and the rest of its members aimed for . . . the design of a public service Likert-5

MODORG01[MO02]
Deepening in the goals of that group you are describing, you and the rest of its members aimed for . . . the delivery of a public service Likert-5

MODORG01[MO03]
Deepening in the goals of that group you are describing, you and the rest of its members aimed for . . . a private product or service Likert-5

MODORG01[MO04]
Deepening in the goals of that group you are describing, you and the rest of its members aimed for . . . the rationalization of a process (e.g., of production) Likert-5

MODORG01[MO05]
Deepening in the goals of that group you are describing, you and the rest of its members aimed for . . . the adoption of a technical system or a process Likert-5

MODORG01[MO06]
Deepening in the goals of that group you are describing, you and the rest of its members aimed for . . . new paths to achieve the group's goals, free from the established or bureaucratic procedures Likert-5

MODORG01[MO07]
Deepening in the goals of that group you are describing, you and the rest of its members aimed for . . . the integration of products in services Likert-5