Collective Efficacy: Development and Validation of a Measurement Scale for Use in Public Health and Development Programmes

Impact evaluations of water, sanitation, and hygiene interventions have demonstrated lower than expected health gains, in some cases due to low uptake and sustained adoption of interventions at a community level. These findings represent common challenges for public health and development programmes relying on collective action. One possible explanation may be low collective efficacy (CE)—perceptions regarding a group’s ability to execute actions related to a common goal. The purpose of this study was to develop and validate a metric to assess factors related to CE. We conducted this research within a cluster-randomised sanitation and hygiene trial in Amhara, Ethiopia. Exploratory and confirmatory factor analyses were carried out to examine underlying structures of CE for men and women in rural Ethiopia. We produced three CE scales: one each for men and women that allow for examinations of gender-specific mechanisms through which CE operates, and one 26-item CE scale that can be used across genders. All scales demonstrated high construct validity. CE factor scores were significantly higher for men than women, even among household-level male-female dyads. These CE scales will allow implementers to better design and target community-level interventions, and examine the role of CE in the effectiveness of community-based programming.

Items with factor loadings less than this threshold measured the latent factors poorly, and were eliminated in a step-wise manner [3]. It is worth noting that no broadly accepted guidelines exist for saliency of factor loadings, but pattern coefficients in the range of 0.300-0.400 are often interpreted by analysts as salient in applied research [4]. We defined complex variables as items with absolute factor loadings of 0.300 or greater on more than one factor [4].
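The saliency and complex-variable rules described above can be sketched in a few lines of Python. This is a minimal illustration rather than the analysis code used in the study: the function names, item names, and loading values are hypothetical, and the cut-off is left as a parameter (0.300 is used as the default only because it is the value given above for complex variables).

```python
from typing import Dict, List

SALIENT = 0.300  # assumed default; applied as |loading| >= cut-off


def salient_factors(loadings: List[float], cutoff: float = SALIENT) -> List[int]:
    """Return the indices of the factors on which an item loads saliently."""
    return [i for i, value in enumerate(loadings) if abs(value) >= cutoff]


def classify_items(pattern: Dict[str, List[float]], cutoff: float = SALIENT) -> Dict[str, str]:
    """Label each item as 'non-salient', 'simple', or 'complex'."""
    labels = {}
    for item, loadings in pattern.items():
        n = len(salient_factors(loadings, cutoff))
        labels[item] = "non-salient" if n == 0 else "simple" if n == 1 else "complex"
    return labels


# Hypothetical pattern coefficients for three made-up items on three factors.
example = {
    "ITEM_A": [0.62, 0.05, -0.11],   # salient on one factor only
    "ITEM_B": [0.35, -0.41, 0.02],   # salient on two factors (cross-loading)
    "ITEM_C": [0.12, 0.20, -0.08],   # no salient loading on any factor
}
labels = classify_items(example)
print(labels)
```

In a step-wise procedure of the kind described, only one item would typically be removed at a time and the model re-estimated before the labels are recomputed, because loadings shift as items are dropped.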

Justification of factor extraction approach
Factor retention was not based solely on the Kaiser-Guttman rule (i.e., eigenvalue >1.0 [5]); we also considered heuristic descriptive guides (i.e., scree plot), goodness-of-fit, and other substantive justification (e.g., results from cognitive interviews, theory and other evidence). The last factor extracted for the men's CE model had an eigenvalue of 1.118; the first factor not retained had an eigenvalue of 1.029. The eigenvalue for the last factor extracted for the women's CE model was 1.336; the first factor not retained had an eigenvalue of 1.068. While the first factor not retained in both the men's and women's CE measurement models had an eigenvalue above the 1.0 threshold, the retention of those factors was not warranted by strong substantive or statistical justification [4,6,7]. Including those factors merely because their eigenvalues were slightly greater than 1.0 would reflect the sole use of a mathematically-based descriptive guide for factor retention. Such an approach would go against our pre-analysis plan and would disregard heuristic and model fit criteria as well as important empirical and theoretical considerations (e.g., results from cognitive interviews, pilot testing of the CE instrument and other prior theoretical and empirical evidence).
In addition, many methodologists have criticised the Kaiser-Guttman rule and demonstrated that it tends to result in overfactoring or underfactoring, given that sampling error may influence eigenvalues [2,4,8]. While identifying and retaining too few factors (i.e., underfactoring) may result in an oversimplified understanding of a construct, retaining too many factors (i.e., overfactoring) may lead to violation of parsimony, which is one primary goal of EFA [4]. Whether over- or underfactoring occurs, the factor solution that results may lead to unreliable factors and/or errors in interpretation [2,8]. Given that the more parsimonious (i.e., eight-factor) measurement models were supported by our knowledge of the existing theoretical and empirical literature base, and by other non-mathematically based criteria, we felt our factor extraction and retention decisions were sufficiently justified.
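To make the retention argument concrete, the sketch below applies the Kaiser-Guttman rule on its own to the four boundary eigenvalues reported above. The function names and the 0.1 "borderline" margin are hypothetical choices made for illustration; they are not part of the study's analysis plan.

```python
from typing import List, Sequence


def kaiser_guttman_count(eigenvalues: Sequence[float], threshold: float = 1.0) -> int:
    """Number of factors the Kaiser-Guttman rule alone would retain."""
    return sum(1 for value in eigenvalues if value > threshold)


def flag_borderline(eigenvalues: Sequence[float], threshold: float = 1.0,
                    margin: float = 0.1) -> List[float]:
    """Eigenvalues that clear the threshold by no more than `margin` (assumed margin)."""
    return [v for v in eigenvalues if threshold < v <= threshold + margin]


# Eigenvalues reported in the text: last factor retained, first factor not retained.
men_boundary = [1.118, 1.029]
women_boundary = [1.336, 1.068]

# Applied mechanically, the rule would keep both factors in each pair.
print(kaiser_guttman_count(men_boundary), kaiser_guttman_count(women_boundary))

# The factors that were not retained are the ones that only just exceed 1.0.
print(flag_borderline(men_boundary), flag_borderline(women_boundary))
```

The point of the sketch is only that a purely eigenvalue-based rule cannot distinguish a factor at 1.029 from one that is substantively meaningful, which is why the additional criteria described above were used.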

Univariate statistics: CE survey items
Our CE survey included 50 items for factoring (Appendix A). The top five items to which respondents most frequently selected "completely agree" aligned for men and women, though there were some differences with regard to the proportions of those responses between genders (Appendix SA). These items reflected those related to social solidarity or support for one's community members and a sense of pride about being a part of the community: "If someone in this community had a death in their family, the community will come together to support them while they mourn" (94% of men, 91% of women); "I feel happy for my neighbour if they have a good harvest" (96% of men, 92% of women); "I feel proud to be part of this community" (91% of men, 73% of women); "If someone in this community loses a cow or goat, a neighbour will help look for it" (89% of men, 90% of women); "People in this community get to choose the leaders of their own community-based associations, such as Edir leaders" (86% of men, 73% of women). The items to which respondents most frequently selected "completely disagree" also aligned between genders. These items reflected those related to social disorder and inequity: "Sometimes people need to bribe community leaders in order to get things done" (71% of men, 52% of women); "Some households in this community are restricted from community services, such as bed net distribution" (71% of men, 46% of women); "In this community, conflicts like stealing and fighting often occur" (29% of men, 47% of women); "In this community, you have to be careful, otherwise your neighbours will cheat you" (25% of men, 30% of women). In terms of normality of item response distributions, men had 27 items, and women had 15 items with skewness outside of the suggested range (Appendix SA). 
The WLSMV estimator we employed for our factor analyses makes no distributional assumptions about the observed variables, and only assumes a normal latent distribution underlying each observed categorical variable [9], so no action was taken to address non-normal item distributions [4].

Interpretation of factor loadings
It is acceptable and appropriate to consider factor loadings that vary in magnitude across the various items tapping to a latent factor, as the magnitude of an item's factor loading reflects the proximity of the relationship between the item and the factor to which it taps [10]. Factor loadings may therefore vary in magnitude across the items tapping to a factor based on the proximity of those relationships [10]. Items that are conceptually less influential (i.e., less proximal) to a given latent factor could demonstrate a lower factor loading without necessarily signaling poor quality of the latent factor and poor validity of the measurement model [11]. An item indicator that almost perfectly reflects a given latent factor should be very highly correlated with it (e.g., as represented by a factor loading in the range of 0.800-0.900). However, other items tapping to the latent factor that are conceptually less important or proximal to the factor can, and theoretically should demonstrate lower factor loadings [10,12].

Additional preliminary CFA results
We moved forward with post hoc model refinements of preliminary CFA models to eliminate non-salient and non-significant factor loadings as well as any factors with insufficient component saturation. For the men's model, this resulted in the elimination of nine items. One item (HAVEFRND) was eliminated because it had less than minimal variance (i.e., a response category with zero observations). Five items were eliminated for non-salient factor loadings (OWNWELF=0.140, SAFEATHO=0.151, RESTRSER=-0.223, BRIBELDR=-0.226, DIFPROBS=-0.267), and one item was eliminated because it had both a non-salient and non-significant loading on its designated factor (EXOASSIS=0.011, p=0.793). After eliminating items that were non-significant and non-salient, one factor (social equity) remained with only two items, which we did not deem sufficient for component saturation. We therefore eliminated that factor and its remaining two items, which otherwise demonstrated salient and significant loadings (COMMGDEC=0.717, DISTCRIS=0.428). The standardised estimates of the remaining factor loadings from this model were acceptable (Appendix B), and all remaining factors co-varied significantly. The refined preliminary CFA model of the hypothesised CE framework demonstrated adequate absolute model fit (χ²:df ratio=2.606, RMSEA=0.038 [0.036-0.040]), but still poor incremental fit (CFI=0.911, TLI=0.904). These results suggest that our hypothesised CE framework represented a plausible structure of the mechanisms through which the CE process operates amongst men in the Ethiopian context. However, poor incremental fit statistics suggested that this may not have been the best fitting model framework.
For the women's model, we eliminated a total of ten items. Four items were eliminated as a result of non-salient factor loadings on the designated factor (RESTRSER=-0.105, BRIBELDR=-0.227, EXOASSIS=0.231, SAFEATHO=0.242). Three items were eliminated due to non-salient and non-significant factor loadings on the designated factor (DIFPROBS=0.009, p=0.868; CRIMECON=-0.040, p=0.543; CHEATS=0.053, p=0.324). Two factors and their three items were eliminated because the factors demonstrated insufficient component saturation (the factor representing social order with its HARMONY item, and the factor representing social equity with its COMMGDEC and DISTCRIS items). The refined preliminary CFA model only marginally reflected the actual hypothesised framework, as two factor loadings were non-salient (social order and social equity). The standardised estimates of the remaining factor loadings from the resulting model were acceptable (Appendix C). Both absolute and incremental fit statistics indicated poor fit of the resulting women's factor model (χ²:df ratio=3.409, RMSEA=0.058 [0.055-0.060]; CFI=0.895, TLI=0.888). This means that the data failed to validate the hypothesised CE framework for women respondents, indicating the framework did not reflect the mechanisms through which the CE process operates for women in the rural Ethiopian context.
There was considerable overlap in the items eliminated from the men's and women's refined preliminary CFA models. All but one (OWNWELF) of the items eliminated from the men's model were also eliminated from the women's model, and five of the ten items eliminated from the women's model were also eliminated from the men's model (SAFEATHO, RESTRSER, BRIBELDR, DIFPROBS, and EXOASSIS).

Additional EFA and CFA results
Complete EFA results reflect coefficients from both rotated (Promax) pattern and structure matrices along with initial and refined CFA results. While not all factor loadings presented in Tables 3 and 4 are in the range of excellent to very good (though they are still in the acceptable range), we hypothesise that some of those items are conceptually more distal (i.e., marginally less important) to the measurement of the latent factor. We present further details regarding both the men's and women's EFA-derived measurement models in subsequent sub-sections.

Additional details regarding the men's EFA and CFA results
During the EFA analyses, we eliminated three items (HAVEFRND, HAPPYNEI, PROUD) due to less than minimal variance (i.e., no observations in one or more item response categories) that prevented the EFA from being processed in Mplus. We also eliminated twelve items in a step-wise manner: ten items were eliminated because they had no salient loadings on any factor (BRIBELDR, EXOASSIS, SAFEATHO, COPARTCG, CHEATS, INTERCRI, COMMGDEC, CONTRDEV, SUPMOURN, LOSTCOW); one item (PAREXOGP) was eliminated due to evidence of extreme multicollinearity with another related item that loaded to the factor; and one item (CRIMECON) was eliminated because although its pattern coefficient was salient, its structure coefficient was not. This resulted in a 35-item men's CE measurement model (with two complex variables) that tapped to seven factors of CE: social response, social networks and personal agency, social attachment, common vision, community leadership, associational participation, and community organisation.
Factor one, labelled "social response", corresponded to the informal social control domain, though it also tapped to certain aspects of cognitive social capital (e.g., trust in community members, reciprocity of knowledge) that may influence social response. The factor contained nine items that tap to various facets of perceptions regarding the community's propensity to address community- and sub-community-level issues, including social disorder (e.g., harmony, problem solving, conflict resolution, common moral principles and codes of behaviour), support in times of crisis, and tolerance. The concepts reflected in this factor align closely with our hypothesised operational definition of social control, described as an absence of general conflict and threats to the existing order, effective informal social control, tolerance, and intergroup cooperation (Table 1).
Factor two, labelled "social networks & personal agency", corresponded to the cognitive social capital domain, though it also tapped to structural social capital, as it reflects the strength and responsiveness of one's social structures. The factor comprised five items that relate to issues surrounding supporting networks and individuals cooperating to support one another for either mutual or one-sided gain. Two items related to self-efficacy loaded to this factor. This suggests that, for men, one's perspectives regarding personal agency (i.e., individual behavioural control) are linked to perceived expectations that help will be given to or received from others when needed [13].
Factors three and four corresponded to the social cohesion domain. Factor three, labelled "social attachment" included five items that tap to concepts related to place identity, community acceptance and attachment, and collective agency. Factor four, labelled "common vision" was comprised of six items that reflect shared norms (perceptions of normative expectations regarding contributions to community development) and culture (common values, hopes for the future, ideas about how the community should be managed), social equity (equal distribution of goods in times of crisis), and perceptions regarding community-level agency.
Factors five, six, and seven pertained to the structural social capital domain. Factor five, labelled "community leadership", reflected four items tapping to various aspects of social trust, support, and strength of leadership of formal administrative leaders and both formal and informal community leaders. Factor six, labelled "associational participation", corresponded to the respondent's personal involvement in established community structures, both exogenously and endogenously organised. The three constituent items reflect both membership (as indicated by meeting attendance) and participation in associational activities. Factor seven, labelled "community organisation", corresponded to various aspects of community organisation, including the activity level of endogenously organised community associations and leaders thereof, community-selected representation, prioritisation of community development, and social justice and equity.
During CFA, we moved forward with post hoc model refinements to eliminate non-salient and non-significant factor loadings as well as any factors with insufficient component saturation. Prior to CFA, we eliminated one item (ADVICE) due to less than minimal variance. Subsequent post hoc model refinements resulted in the elimination of five additional items. Two items were eliminated for non-significant and non-salient factor loadings (SHOULDEV=0.075, p=0.513; COLLEFF=0.071, p=0.350), and three items were eliminated for non-salient factor loadings (OWNWELF=0.155, RESTRSER=-0.260, and DIFPROBS=-0.278). The standardised estimates of factor loadings from this model were acceptable (Table 3).

Additional details regarding the women's EFA and CFA results
During the EFA analyses, we eliminated one item (HAPPYNEI) due to less than minimal variance. We eliminated twelve additional items in a step-wise manner: six items were eliminated because they had no salient loadings on any factor (RESTRSER, BRIBELDR, COPARTCG, EXOASSIS, COMMGDEC, SHOULDEV); four items were complex variables that cross-loaded on more than one factor without sufficient substantive justification (SAFEATHO, SUPMOURN, SHAREKNO, TRUSTLDR); one item (PAREXOGP) was eliminated due to evidence of extreme multicollinearity with another item that loaded to the factor; and one item (OWNWELF) was eliminated because although its pattern coefficient was salient, its structure coefficient was not salient on the factor of interest. This item reduction process resulted in a 37-item women's CE measurement model that tapped to seven factors of CE: social networks & reciprocity, social disorder, social attachment & personal agency, social response, associational participation, common vision, and community organisation & leadership.
Factor one, labelled "social networks & reciprocity" corresponded to the cognitive social capital domain, though it also tapped to certain aspects of structural social capital, as it reflected perceptions related to collections of individuals that promote and protect mutual or personal interests. The factor contained eight items that indicate various aspects of reciprocity demonstrated through social networks, the strength of personal relationships, and the community's propensity to contribute to community development.
Factors two and four corresponded to the informal social control domain, though factor four also tapped to certain aspects of cognitive social capital. Factor two, labelled "social disorder" contained three items that reflect the level of disorder in the community, including conflicts such as stealing, fighting, cheating, and problems caused by intolerance of differences amongst people. Factor four, labelled "social response" contained eight items that tap to various facets of perceptions regarding the community's propensity to address internal issues, including willingness to intervene when crime-like activities are observed, conflict-resolution, common moral principles and codes of behaviour, support in times of crisis, community trust, and strength of relationships.
Factors three and six corresponded to the social cohesion domain. Factor three, labelled "social attachment & personal agency" included six items that tap to concepts related to place identity, community acceptance and attachment, and personal agency. This suggests that, for women, one's sense of self-agency is linked to one's sense of belonging or social attachment. Factor six, labelled "common vision" is comprised of five items that reflect shared culture (common values, hopes for the future, ideas about how the community should be managed), social equity (equal distribution of goods in times of crisis), and perceptions regarding community-level agency.
Factors five and seven corresponded to the structural social capital domain. Factor five, labelled "associational participation", related to the respondent's personal involvement in established community structures, both exogenously and endogenously organised. The three constituent items reflect both membership (as indicated by meeting attendance) and participation in associational activities. Factor seven, labelled "community organisation & leadership", corresponded to various aspects of organisation within the community, including the activity level of endogenously organised community associations and leaders thereof, and community-selected representation.
We conducted CFA on the 37 items tapping to seven factors, as indicated by the EFA-derived women's CE factor solution. We moved forward with post hoc model refinements to eliminate non-salient and non-significant factor loadings as well as any factors with insufficient component saturation. This resulted in the elimination of five items and one factor. One item was eliminated for a non-significant and non-salient factor loading (CLOSE=0.167, p=0.075), and two items were eliminated for non-salient factor loadings (CHEATS=0.213, and SIMBLIEF=0.309). With the elimination of one non-saliently loading item from the social disorder factor, the factor itself failed to demonstrate sufficient component saturation, so the factor and its remaining two items (DIFPROBS=0.900, p=0.001 and CRIMECON=0.366, p=0.001) were eliminated from the women's measurement model. The standardised estimates of factor loadings for the resulting six-factor model were acceptable (Table 4). Modification indices above 3.84 on the women's model were all relatively low, meaning localised strain was relatively low in all areas identified. No further modifications were made.
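The 3.84 cut-off for modification indices corresponds to the critical value of a chi-square distribution with one degree of freedom at α = 0.05 (1.96² ≈ 3.84). The short check below verifies this using only the Python standard library; the function name is ours, and the closed form used holds only for one degree of freedom.

```python
import math


def chi2_cdf_1df(x: float) -> float:
    """CDF of a chi-square variable with 1 df: P(Z^2 <= x) = erf(sqrt(x / 2))."""
    if x <= 0:
        return 0.0
    return math.erf(math.sqrt(x / 2.0))


# Upper-tail probability at the modification-index cut-off used in the text.
p_value = 1.0 - chi2_cdf_1df(3.84)
print(round(p_value, 3))
```

A modification index above this value therefore flags a fixed or constrained parameter whose release would be expected to improve model chi-square significantly at the 5% level, which is why it serves as a screen for localised strain.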

Additional details regarding comparison of men's and women's CE measurement models
The men's CE measurement model included one more factor (community leadership) than was indicated by the women's CE measurement model. Two of the three items that comprised the leadership factor in the men's model are included in the community organisation factor in the women's measurement model, as there was sufficient substantive justification for those items tapping to that factor.

Comparison of CFA results of our hypothesised CE framework vs. EFA-derived factor solutions
Fit statistics from the preliminary CFA of our hypothesised CE framework and the CFA of the EFA-derived factor solution suggest that slight revisions that were substantively justified resulted in valid CE measurement models for both men and women in the Ethiopian context (Appendix D).

Comparison of fit statistics for CFA of refined, single-group and parsimonious models
Given it is encouraged to consider numerous alternatives before settling on final measurement models [6], we performed a CFA on both men's and women's models that reflected the more parsimonious set of CE indicators (i.e., only those that were completely overlapping between refined and validated CFA models). We present model fit statistics for those models, and compare them to the refined, validated CFA models in Appendix E. These results indicate that both the more saturated and parsimonious models are valid CE measurement metrics. The gender-specific saturated models represent slightly better fitting models.
We present model fit statistics, unstandardised Β, standard errors, and standardised β for competing MIMIC models in Appendix F.

ADDITIONAL DISCUSSION
Establishing this CE measurement scale in the early phases of the Andilaye trial allowed us to measure and assess collective efficacy at baseline, prior to the implementation of a community-level demand-side sanitation and hygiene intervention. We plan to employ this validated scale again at endline, and compare changes in CE measures between intervention and counterfactual communities over time (pre-, post-intervention). This will allow us to test our hypothesis that there is a bidirectional, causal association between CE and intervention effectiveness.

Further discussion of gender-specific CE measurement models
There were slight differences between gender-specific CE measurement models (31-item, seven-factor solution vs. 33-item, six-factor solution for men and women, respectively). Major differences between men's and women's CE measurement models involved: 1) the number of factors included in the measurement model, and 2) the manner in which individual-level behavioural control items (SELFEFF, SEDEV) correlated with factors related to social networks versus social attachment for men and women, respectively. The ordering of the CE scale factors also differed between men's and women's measurement models, with social networks & reciprocity emerging as the first factor in the women's model and social response emerging as the first factor in the men's model. These types of differences were expected, and are supported by empirical evidence that suggests women have a higher dependence on social networks and "the commons" than men [14].
The women's CE measurement model included several additional items that tapped to its social network factor that were not included in the men's measurement model. These items captured additional facets of reciprocity, communal contribution and collaboration, and solidarity. The women's measurement model also indicated that willingness to intervene in situations of delinquent behaviour was an important item related to social response, and that a sense of pride in being part of one's community was an important item related to social attachment. These items were not indicated in the men's measurement model, though at least in the case of the item that corresponded to pride, the exclusion of that indicator may have to do with less than minimal variance amongst the item responses, as one response category for each of the split-half samples had no observations. The men's measurement model included two items that tapped to its social response factor that were not included in the women's measurement model. These items reflected common understanding regarding right and wrong and information sharing.
The men's measurement model also indicated that perceptions regarding normative expectations about members of the community working together to develop the community was an important item related to the common vision factor. Men's and women's measurement models also differed in that the men's CE measurement model indicated that a seventh factor, community leadership, was important for measuring CE. Two of the items that were included in this factor (those indicating supportive formal leadership and strong informal leadership) were included elsewhere in the women's measurement model (community organisation, as supported by sufficient substantive justification). A third item related to perceived trust in the community's leaders was not included in the women's measurement model, but was indicated as an important component for the measurement of community leadership in the men's model.
While we did reveal the underlying CE factor structure for gender-specific models, we also determined that there was considerable overlap between men's and women's CE measurement models. We determined that a parsimonious model that reflected all factors and items in common between the two gender-specific models demonstrated good model fit, and may therefore be used to measure and compare CE between genders. That said, the use of gender-specific CE scales may allow interested researchers to assess the mechanisms through which CE operates, and monitor how measures related to these gender-specific mechanisms change over time, throughout the duration of a development programme or research study.
Significant differences in associational participation factor scores corroborate existing evidence that suggests women may participate less in endogenous and exogenous community structures. This finding indicates that working through formal community structures to enhance women's behavioural control perspectives, including self- and collective efficacy, may not be an appropriate approach. More appropriate approaches may include community-level or household-level intervention activities.
In terms of selecting a CE measurement metric for administration more broadly, it is necessary to determine the aim and objectives of the work at hand, and weigh the benefits of being able to compare CE scores across genders (refined parsimonious CE scale) against being able to assess the mechanisms through which CE operates (gender-specific, saturated CE scales). Our results indicated that CE perceptions differ between men and women, even amongst those living in the same household. Therefore, researchers and programme implementers using an adapted version of our parsimonious CE scale should consider obtaining data either from men and women within the same household or from a random selection of men and women within a given community.

Additional discussion regarding factor indeterminacy
While the refined and final validated factor structures championed by this study demonstrate good model fit, and are substantively justified, they reflect only one possible representation of the relationship amongst items in the men's, women's, and parsimonious CE measurement models. As with any EFA, our results were influenced by the structure of the data for the particular sample we ascertained. Other measurement models that fit the data and represent the conceptualisation of CE as well or better than our refined gender-specific and final parsimonious CE measurement models may exist [6]. Through the employment of a randomly selected split-half hold-out sample, we sought to assess the stability of our EFA-derived CE factor structures across an independent sample from the same population, as suggested by numerous methodologists [4,7,15].
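A random split-half hold-out design of the kind mentioned above can be sketched as follows. This is an illustration under stated assumptions: the function name, the seed, and the respondent IDs are hypothetical, and the sketch assumes one half is used to derive the factor structure (EFA) and the held-out half to test it (CFA).

```python
import random
from typing import List, Sequence, Tuple


def split_half(ids: Sequence[int], seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly split respondent IDs into two non-overlapping halves."""
    shuffled = list(ids)                    # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)   # seeded for a reproducible split
    midpoint = len(shuffled) // 2
    return shuffled[:midpoint], shuffled[midpoint:]


respondent_ids = list(range(1, 11))  # hypothetical IDs, not study data
efa_sample, cfa_sample = split_half(respondent_ids, seed=42)
print(sorted(efa_sample), sorted(cfa_sample))
```

Because both halves come from the same population, agreement between them speaks to the stability of the factor structure within that population, not to its generalisability to other settings.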

Further discussion of analytical limitations
Mathematically-focused factor extraction methods have a tendency to under- or over-estimate the number of factors in a solution [2,4,8]. The results of scree tests are often ambiguous (e.g., no clear shift in the slope) and subject to interpretation [4]. As a result, we used a combination of mathematical (i.e., eigenvalue-based Kaiser-Guttman rule), heuristic (i.e., scree plot), statistical (i.e., model fit statistics), and substantive justification to guide factor extraction. That said, we were not able to perform more rigorous procedures (e.g., parallel analysis) to confirm that we extracted the correct number of factors, as these analyses are not available for categorical data in Mplus [16].
Sufficient component saturation is needed (i.e., two or more items with salient factor loadings) to guarantee appropriate factor interpretation [7]. While some methodologists suggest that as few as two to three items provide sufficient component saturation [17], other more conservative guidelines suggest four or more items with factor loadings of 0.5 or higher, and an average factor loading of 0.700 across all items tapping a factor. All six factors in the final parsimonious CE measurement model had three to five items per factor, all loading ≥ 0.478, indicating sufficient component saturation. With the exception of one factor (i.e., "social networks", average factor loading = 0.663), all factors demonstrated average factor loadings of 0.700 or higher, signaling that the items were good measures of the factors to which they tapped. All seven factors of the refined, validated men's CE measurement model and all six factors of the refined, validated women's measurement model included three or more items, all with factor loadings greater than 0.500. However, one factor on the women's model, and four factors on the men's model included three items only, which just satisfies moderate guidelines [17] but not more conservative guidelines for component saturation. In addition, two items within the men's measurement model, and one item within the women's measurement model reflect factor loadings falling within the salient but only "adequate" range (i.e., 0.400-0.440). More importantly, perhaps, two factors in the refined, validated men's measurement model, and one factor in the refined, validated women's measurement model demonstrated average factor loadings below the ideal 0.700 average (average factor loading on refined CFA: 0.668 and 0.634 on the men's model; 0.656 on the women's model). Interestingly, the factor on the women's model and one factor on the men's model with average factor loadings less than 0.700 represented the social response factor.
This suggests that perhaps the items we included in our CE survey for this CE sub-construct may not have included one or more proximal indicators of social response in the Ethiopian context. Given our EFA results reflect the structure of the sample we ascertained, and the role that sampling error and other systematic error may play in the estimation of factor analytic results, initial EFA findings should be interpreted with caution. These findings should be cross-validated through additional EFA or CFAs using independent datasets [4]. We employed a random split-half hold-out sample for measurement model validation, and the resulting findings were promising, especially our refined final parsimonious CE measurement scale. Still, these findings should undergo further validation with independent datasets, which is planned for another WASH study being evaluated by members of our research group. Since our results indicated that only minimal component saturation was attained for some CE factors, and more proximal indicators may not have been included for social response and social network factors, additional formative work that further explores these issues is warranted.
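The component saturation guidelines discussed in this section can be expressed as a simple per-factor check. The sketch below is illustrative only: the function name and the example loadings are hypothetical, and the defaults (three items, loadings of 0.5 or higher, mean loading of 0.700) combine the moderate and conservative guidelines described above rather than reproducing any single published rule.

```python
from statistics import mean
from typing import Dict, List


def saturation_summary(loadings: List[float], min_items: int = 3,
                       min_loading: float = 0.5,
                       target_mean: float = 0.7) -> Dict[str, object]:
    """Summarise how well a factor's items satisfy the saturation guidelines."""
    magnitudes = [abs(value) for value in loadings]
    return {
        "n_items": len(magnitudes),
        "mean_loading": round(mean(magnitudes), 3),
        "enough_items": len(magnitudes) >= min_items,
        "all_above_min": all(m >= min_loading for m in magnitudes),
        "meets_target_mean": mean(magnitudes) >= target_mean,
    }


# Hypothetical standardised loadings for a three-item factor (not study data).
hypothetical_factor = [0.81, 0.66, 0.52]
summary = saturation_summary(hypothetical_factor)
print(summary)
```

Setting `min_items=4` would apply the more conservative item-count guideline, under which a three-item factor such as this one would no longer be considered sufficiently saturated.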