1. Introduction
Municipal solid waste management constitutes one of the most pressing environmental, institutional, and legal challenges of the twenty-first century, particularly in emerging economies where the gap between waste generation and adequate treatment continues to widen. According to the United Nations Environment Programme’s (UNEP) Global Waste Management Outlook 2024, global municipal solid waste generation reached 2.1 billion tonnes in 2023 and is projected to rise to 3.8 billion tonnes by 2050 if no structural measures for prevention and circularity are adopted [1]. The World Bank, through its report What a Waste 2.0, warned that at least 33% of waste generated globally is managed in an environmentally unsafe manner, through open dumping or uncontrolled burning, with direct impacts on public health, aquatic ecosystems, and greenhouse gas emissions [2].
Within this context, the circular economy has emerged as a transformative paradigm that seeks to decouple economic growth from natural resource consumption. Geissdoerfer et al. [3] conceptualized it as a new sustainability paradigm capable of redefining production and consumption systems beyond linear extraction and disposal. More recently, Kirchherr et al. [4] revisited the concept through a large-scale definitional analysis, emphasizing the centrality of resource loops, value retention, and systemic reconfiguration. The transition toward circular models of waste management requires not only technological innovation, but also enabling regulatory frameworks, strong institutional capacities, and technical operations capable of translating legal provisions into tangible outcomes. Blomsma and Brennan [5] argued that the circular economy emerged as a new framing around prolonged resource productivity, thereby highlighting the need for organizational and institutional adaptation. In turn, Ghisellini et al. [6] emphasized that circular transition depends on a balanced interaction between environmental and economic systems, which implies governance, regulation, and operational implementation capacities.
In Latin America and the Caribbean, the situation is particularly critical. The region generates approximately 231 million tonnes of municipal solid waste annually, with recycling rates below 10% and more than 40% of waste being managed inadequately, according to estimates by the Inter-American Development Bank. The Circularity Gap Report 2024 indicates that more than 60% of the total waste generated in the region lacks traceability or systematic records, severely limiting the formulation of evidence-based policies. Recent studies have documented that the predominance of an end-of-pipe approach, focused on final disposal rather than prevention and recovery, constitutes a major structural barrier to circularity in Latin American contexts. Graziani [7] identified persistent regional barriers associated with institutional fragmentation, limited investment, and weak policy coordination in Latin America and the Caribbean. Likewise, Hernández-Betancur et al. [8], in their systematic review of Latin American cities, showed that municipal solid waste management still tends to prioritize disposal over prevention, reuse, and recovery. Unlike European economies, where supranational directives have consolidated integrated regulatory frameworks with binding recycling targets and extended producer responsibility, countries in the region exhibit fragmented legal frameworks, weak intergovernmental coordination, and heterogeneous municipal capacities that hinder the effective implementation of circular strategies. Vilela-Pincay et al. [9] specifically highlighted the institutional challenges that developing countries face when attempting to align circular economy principles with municipal waste governance. Similarly, Ferronato and Torretta [10] showed that waste mismanagement in developing contexts is strongly associated with structural deficiencies in regulation, infrastructure, and institutional enforcement.
In Peru, this problem reflects the tension between a progressive regulatory framework and deficient territorial implementation. Legislative Decree No. 1278, the Law on Integrated Solid Waste Management, and its amendment through Law No. 32212 of December 2024, incorporate circular economy principles, extended producer responsibility, and material recovery as core pillars of national policy. Nevertheless, according to the 2024 Statistical Yearbook of the Environmental Sector published by the Ministry of the Environment (MINAM), the composition of household waste in 2023 was 56% organic, 21% recoverable inorganic, 14% non-recoverable, and 8% hazardous, while only 1.8% of total municipal waste generated was effectively recovered. Although 61.7% of waste is disposed of in sanitary landfills, the remaining 36.5% is sent to inadequate disposal sites, thereby revealing a structural gap in infrastructure and management [11].
In the Cajamarca region, this situation is even more severe: 81.9% of collected solid waste ends up in open dumps, only 4.7% of municipalities have operational sanitary landfills, and 57.94% of waste is disposed of inadequately [12]. Cajamarca comprises 92 municipalities that deposit waste in dumps and 64 that lack solid waste management instruments, positioning it as one of the regions with the greatest institutional and operational deficits in the country [13].
The relevance of this study lies in its contribution to a comprehensive understanding of the legal-regulatory, institutional, and operational factors that condition the performance of solid waste management under a circular economy approach in Peruvian municipalities. While the international literature has made significant progress in modeling barriers to and enablers of circularity through quantitative approaches such as structural equation modeling, empirical evidence in Latin American municipal contexts remains scarce and fragmented. Hair et al. [14] consolidated the methodological foundations of partial least squares structural equation modeling for the analysis of latent constructs in complex explanatory settings. In parallel, Sarstedt et al. [15] documented the rapid expansion and consolidation of PLS-SEM in applied research over the last decade, thereby reinforcing its relevance for studies that combine theory testing and predictive orientation.
Previous studies have employed PLS-SEM to assess determinants of environmental management in organizational settings. For example, Dangelico et al. [16] used a PLS-SEM approach to investigate the antecedents of sustainable behavioral intention, showing the usefulness of latent-variable modeling in environmentally oriented decision contexts. Likewise, Khan et al. [17] examined the relationship between Industry 4.0 and circular economy practices, demonstrating that PLS-SEM is suitable for evaluating structured relationships among sustainability-related constructs. Yet the application of hybrid models that integrate the explanatory power of structural equation modeling with the predictive capacity of machine learning remains a relevant methodological gap in the field of solid waste management. Shmueli et al. [18] advanced the discussion on predictive assessment in PLS-SEM by formalizing guidelines for out-of-sample evaluation. Building on that logic, Sharma et al. [19] proposed a hybrid SEM–machine learning approach to prediction, thereby illustrating the analytical potential of combining latent explanatory structures with algorithmic classification. The combination of PLS-SEM with supervised classification algorithms such as XGBoost, complemented by interpretability analysis using SHAP, makes it possible to overcome the limitations of purely confirmatory approaches by simultaneously offering theoretical validation of constructs and context-sensitive predictive capacity. Lundberg and Lee [20] provided the theoretical basis for SHAP as a unified approach to interpreting model predictions through additive feature contributions. Chen and Guestrin [21], in turn, introduced XGBoost as a scalable and efficient tree-boosting system with strong predictive capacity in structured-data environments.
In this sense, the proposed hybrid design is not an arbitrary combination of tools, but a sequential analytical strategy in which PLS-SEM supports theory-driven latent construct modeling, XGBoost 3.2.0 captures nonlinear predictive patterns, and SHAP provides transparent interpretation of the predictive stage, thus linking explanatory and predictive logics within a single framework.
Although technological factors such as recovery infrastructure, recycling intensity, treatment efficiency, and plant-level operating costs are relevant to circular economy implementation, the present study deliberately focuses on the governance-performance architecture of municipal solid waste management. Specifically, the analytical model was designed to examine how regulatory conditions, institutional capacity, and operational management are associated with perceived performance across municipalities. A distinct technological dimension was not included because the study aimed to prioritize comparable governance and management factors at the municipal level rather than facility-specific technical metrics, which may vary substantially in availability and measurement quality across local contexts.
The research problem is articulated around the following question: To what extent are the legal-regulatory framework, institutional capacity, and operational management structurally related to perceived performance in solid waste management under a circular economy approach in the municipalities of Cajamarca, and what is the predictive capacity of these latent constructs with respect to participation in training processes on circular economy? This question arises from the empirical observation that, despite the existence of national regulatory frameworks promoting circularity, the translation of these legal instruments into effective operational outcomes depends on institutional and technical mediations that have not yet been quantitatively modeled in the Peruvian context. The identified gap is both theoretical, due to the lack of integrated models linking regulation, institutional capacity, operations, and performance within a coherent causal architecture, and methodological, given the absence of prior applications of hybrid PLS-SEM and machine learning pipelines in the assessment of municipal circular economy practices in Peru.
The originality of this study lies in three complementary contributions. First, it provides empirical evidence from Cajamarca, a highly constrained territorial setting in Peru where circular economy implementation faces severe operational and institutional deficits. Second, it proposes an integrated analytical framework linking regulation, institutional capacity, operational management, and perceived performance within a single municipal governance architecture. Third, it combines latent-variable modeling with explainable machine learning, thereby extending the methodological toolkit available for evaluating circular economy performance in local waste management systems.
This study is directly aligned with Sustainable Development Goals 11 (sustainable cities, target 11.6), 12 (responsible consumption and production, indicators 12.4.2 and 12.5.1), 13 (climate action), and 16 (strong institutions), insofar as it evaluates the causal chain connecting regulation, institutional capacity, technical operations, and environmental performance in municipal waste management, thereby contributing quantitative evidence for the territorial governance of circularity.
Accordingly, the general objective of this study is to assess the relationship among the legal-regulatory framework, institutional capacity, operational management, and solid waste performance under a circular economy approach in the municipalities of Cajamarca, through a hybrid PLS-SEM and machine learning model. More specifically, the study seeks to: first, assess the measurement properties of a model comprising four latent constructs (legal framework, institutional capacity, operational management, and performance) for the assessment of circular economy in municipal solid waste management; second, determine the structural relationships among these constructs, examining whether operational management plays an articulating role between regulation, institutional capacity, and performance; third, evaluate the predictive capacity of latent scores and contextual variables (experience and district size) on participation in circular economy training through supervised classification with XGBoost; and fourth, identify the variables with the greatest contribution to prediction through SHAP interpretability analysis, thereby integrating the explanatory and predictive perspectives of the hybrid pipeline.
The remainder of this paper is organized as follows. Section 2 presents the theoretical framework supporting the proposed constructs and the hybrid analytical approach. Section 3 describes the study design, instrument, data, and analytical procedure. Section 4 reports the results of the measurement model, structural model, and predictive analysis. Section 5 discusses the findings in relation to the literature and the study limitations. Finally, Section 6 presents the main conclusions and implications for research and practice.
2. Theoretical Framework
The circular economy, a paradigm that replaces linear extractive logic with closed loops of valorization in which waste is eliminated as a concept and natural systems are regenerated [22], requires, in municipal solid waste management, a simultaneous reconfiguration of regulatory, institutional, and operational architectures. Prieto-Sandoval et al. [23] demonstrated that the adoption of local circular practices depends on the coexistence of coherent regulations, technical capacities, and interinstitutional coordination, conditions that rarely converge in developing economies; this perspective was formalized by Velenturf and Purnell [24] through a framework that articulates legal instruments, institutional resources, and technical processes.
The legal-regulatory dimension constitutes the first construct of the model. Agovino et al. [25], through shift-and-share analysis across 28 European countries, demonstrated that regulatory alignment determines the achievement of circular objectives. Awino et al. [26], based on 151 articles covering 132 countries, showed that regulatory weakness and fragmented competencies are the most persistent barriers in the Global South. Likewise, Diaz-Barriga-Fernandez et al. [27] documented, in Latin America, a gap in which advanced legislation coexists with weak enforcement.
Institutional capacity, the second construct, captures the organizational conditions required to implement circular policies. Gutberlet et al. [28], in Brazil, Colombia, and Argentina, conceptualized it in terms of qualified personnel, financial stability, and political leadership. Wilson et al. [29] specified that institutional weakness involves coordination deficits, staff turnover, and the absence of monitoring. Marshall and Farahbakhsh [30] positioned these factors as first-order determinants, and Orihuela [31] confirmed this in Peru through data envelopment analysis.
The operational dimension, the third construct, translates policy into technical processes of collection, segregation, valorization, and final disposal. Rigamonti et al. [32] integrated coverage, standardized protocols, infrastructure, and environmental education into an operational assessment framework, components that Abdel-Shafy and Mansour [33] identified as critical in developing countries. The mediating centrality of the operational dimension was demonstrated by Guerrero et al. [34]: after reviewing 117 studies, they concluded that improvements in recycling and regulatory compliance occur only when organizational capacities are translated into structured processes, a principle that underpins the hypothesis of C3 as a mediator between the regulatory-institutional binomial and performance.
Performance, the fourth construct, synthesizes tangible outcomes. Scheinberg et al. [35] distinguished among process, outcome, and impact indicators. Leal Filho et al. [36], across 45 countries, validated the correspondence between key stakeholders’ perceptions and objective indicators, thereby legitimizing perception-based instruments such as the one employed in this study. Di Foggia and Beccarello [37] further confirmed that circularity generates measurable positive effects only when regulation, institutional capacity, and operations are sufficiently developed, thus configuring the sequential causal chain that this study empirically tests.
From a methodological standpoint, PLS-SEM has become a benchmark technique for models with latent variables in moderate samples that combine explanation and prediction. Ringle et al. [38] documented its suitability for constructs that are not directly observable. Liengaard et al. [39], in a review of 204 studies, confirmed its advantages when using Likert scales and in specific territorial contexts. Müller et al. [40] emphasized that AVE values below 0.50 should be interpreted in light of semantic heterogeneity and reverse-coded items, a criterion applicable to the instrument used in this study. Integration with machine learning overcomes the limitations of each approach in isolation. Sarstedt et al. [41] argued that PLS-SEM excels in theoretical validation, whereas XGBoost captures nonlinear relationships undetectable by linear models. Rashad et al. [42], in Land Degradation & Development, demonstrated the feasibility of combining PLS-SEM, XGBoost, and SHAP in a pipeline in which latent scores feed the classifier and SHAP values interpret the results, an architecture identical to that adopted here. SHAP interpretability, grounded in Shapley game theory, decomposes each prediction into additive contributions by variable, thereby addressing the black-box problem [43]. Cakiroglu et al. [44] validated that this combination generates interpretable knowledge by providing causal traceability to the predictive component. Accordingly, the present study adopts this hybrid architecture as an emerging but already documented methodological strategy for contexts in which latent theoretical structures must be assessed while also exploring predictive behavior and model interpretability. This interpretation is also reinforced by Simões and Marques [45], who showed that regulation may influence the productivity and performance of waste utilities not only through legal stringency, but also through the institutional incentives it creates for service organization and operational efficiency.
3. Materials and Methods
The study was conducted under a quantitative approach, with a non-experimental, cross-sectional, and analytical-predictive design aimed at integrating an explanatory component (latent construct modeling) with a predictive component (supervised classification). The methodological strategy adopted a two-stage hybrid pipeline: (i) estimation of a partial least squares structural equation model (PLS-SEM) to represent and validate latent constructs measured through a survey, and (ii) use of the resulting latent scores as input variables in a machine learning model based on XGBoost for the prediction of a dichotomous target variable. This two-stage specification was selected to preserve the theoretical consistency of the latent measurement system while extending the analysis toward nonlinear classification and interpretable variable contribution assessment.
3.1. Study Context, Respondents, and Sampling
The unit of analysis in this study was the municipal respondent linked to public or environmental management functions related to solid waste management in Cajamarca, Peru. The analytical sample comprised 120 valid survey observations collected from municipal contexts in the region. The sampling strategy was non-probabilistic and oriented toward respondents with direct knowledge of municipal waste management processes, legal implementation, or operational conditions. Accordingly, the findings should be interpreted as analytically informative for the studied context rather than statistically representative of all municipalities in Peru.
3.2. Data Collection, Inclusion Criteria, and Ethics
Data were collected through a structured survey administered to eligible municipal respondents involved in public or environmental management activities. Inclusion criteria required direct familiarity with solid waste management practices, municipal procedures, or related governance functions. Incomplete or invalid responses were excluded from the final analytical dataset. Because the survey was administered in municipal contexts through non-probabilistic access to eligible respondents, a precise response rate could not be calculated. Participation was voluntary and based on informed consent, and all responses were processed anonymously. The study followed the ethical principles applicable to minimal-risk survey research; under the institutional conditions of the project, formal committee approval was not required.
The analyzed dataset consisted of 120 observations and variables derived from a structured survey. Specifically, 24 Likert-type items were used, organized into four blocks of six indicators each (C1_1 to C1_6, C2_1 to C2_6, C3_1 to C3_6, and C4_1 to C4_6), in addition to three contextual variables (D1, D2, and D3).
Table 1 presents the data collection instrument. Variables D1 and D2 were treated as ordinal factors for the descriptive analysis, whereas D3 was defined as a dichotomous target variable (“No”/“Yes”) for the predictive stage.
No standalone technological construct was incorporated into the survey design. This delimitation was intentional, as the study sought to model regulatory, institutional, operational, and perceived-performance dimensions that were directly comparable across municipalities; therefore, plant-efficiency indicators, recovery technology intensity, and cost-based engineering measures were left for future research.
During the data preparation phase, the dataset was imported using readxl, and variable transformation was performed using tidyverse. The construct items (C1–C4) were converted to numeric format, and the negatively worded items C1_5, C2_5, C3_5, and C4_5 were reverse-coded so that higher values consistently represented more favorable conditions across all indicators. The contextual variables were recoded with explicit levels to ensure analytical consistency. Initial data quality checks were conducted, including the number of rows and columns, total count of missing values, and class distribution of D3, thereby confirming the integrity of the dataset prior to modeling. In addition, descriptive visualizations were generated to inspect the composition of Likert responses by construct and the class balance of the target variable, enabling a preliminary assessment of variability and distribution. Given the systematic use of reverse-worded item 5 across all four constructs, an additional sensitivity analysis was planned to compare measurement-model performance with and without the “_5” indicators, in order to assess their impact on convergent validity and loading stability.
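The reverse-coding step can be illustrated with a minimal sketch. The study performed this step in R with tidyverse; the Python below is only an illustration. It assumes a 5-point Likert scale, which this section does not state, and uses a made-up response row. The helper names reverse_code and recode_row are hypothetical.

```python
# Hypothetical illustration: the 1-5 scale range is an assumption,
# since the text does not state the number of Likert points.
SCALE_MIN, SCALE_MAX = 1, 5
REVERSED_ITEMS = {"C1_5", "C2_5", "C3_5", "C4_5"}  # items named in the text


def reverse_code(value, lo=SCALE_MIN, hi=SCALE_MAX):
    """Mirror a Likert response so that high values mean favorable conditions."""
    return lo + hi - value


def recode_row(row):
    """Return a copy of one response row with the reversed items recoded."""
    return {
        item: reverse_code(score) if item in REVERSED_ITEMS else score
        for item, score in row.items()
    }


# Made-up response, for demonstration only.
example = {"C1_1": 4, "C1_5": 2, "C2_5": 5, "C3_1": 3}
recoded = recode_row(example)
print(recoded)
```

Positively worded items pass through unchanged, so after recoding every indicator points in the same direction.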
The explanatory stage was implemented with PLS-SEM using the cSEM 0.6.1 package in R. A reflective measurement model (CFA-type syntax) was specified for four latent variables (C1, C2, C3, and C4), each defined by six observed indicators. The structural model included directed relationships among constructs: C2 ~ C1; C3 ~ C1 + C2; C4 ~ C1 + C2 + C3. Estimation was carried out using the PLS-PM approach (.approach_weights = "PLS-PM"), incorporating bootstrap resampling (.resample_method = "bootstrap") with 499 replications to support inferential assessment and parameter stability. This choice is methodologically appropriate in contexts involving Likert scales and moderate sample size, where it is necessary to simultaneously model relationships among constructs while preserving predictive robustness. Bootstrap resampling was used not only to assess parameter stability but also to derive inferential statistics for path coefficients and indirect effects, including confidence intervals and p-values where applicable.
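The logic of bootstrap inference with 499 replications can be sketched as follows. This is not the cSEM routine: it resamples cases and re-estimates a Pearson correlation, which equals a standardized path coefficient only in the single-predictor case, and it uses synthetic scores for 120 cases. The functions pearson_r and bootstrap_ci are hypothetical helpers, and the percentile interval is one of several possible interval types.

```python
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation, used here as a stand-in for a standardized path."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def bootstrap_ci(xs, ys, n_boot=499, alpha=0.05, seed=123):
    """Percentile bootstrap interval: resample cases with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(pearson_r([xs[i] for i in idx], [ys[i] for i in idx]))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic scores for 120 cases (the study's sample size); not real data.
rng = random.Random(1)
c3 = [rng.gauss(0, 1) for _ in range(120)]
c4 = [0.6 * x + rng.gauss(0, 0.8) for x in c3]

estimate = pearson_r(c3, c4)
lower, upper = bootstrap_ci(c3, c4)
print(f"estimate={estimate:.3f}, 95% CI=({lower:.3f}, {upper:.3f})")
```

An interval that excludes zero is the usual basis for calling a path significant. Indirect effects are handled the same way, by recomputing the product of paths in each resample.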
To evaluate the PLS-SEM model, indicators of measurement model quality and structural model quality were extracted. Specifically, reliability and convergent/discriminant validity metrics were calculated using the function calculateAVE() from cSEM. Factor loadings (Loading_estimates) and path coefficients (Path_estimates) were also extracted from the internal structure of the estimated object. Because the internal output structure may vary across package versions, an auxiliary function (get_nested) was used for robust component retrieval, thereby ensuring reproducibility of the analytical workflow. In parallel, a supplementary model was estimated with lavaan (MLR estimator) solely for graphical representation of the construct and path diagram using semPlot, without replacing the main estimation performed with cSEM.
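For reference, the convergent validity and reliability quantities mentioned above reduce to simple functions of the standardized loadings. The sketch below computes AVE as the mean squared loading and composite reliability in its usual Jöreskog form. The loadings are hypothetical, chosen so that the fifth item mimics a weak reverse-worded indicator; they are not the study's estimates.

```python
def ave(loadings):
    """Average variance extracted: mean of squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)


def composite_reliability(loadings):
    """Joreskog's rho_c for standardized loadings with uncorrelated errors."""
    total = sum(loadings)
    error = sum(1 - l ** 2 for l in loadings)
    return total ** 2 / (total ** 2 + error)


# Hypothetical loadings for one six-item construct; the fifth value mimics a
# weak reverse-worded item. These are NOT the study's estimates.
loadings = [0.70, 0.68, 0.72, 0.65, 0.30, 0.66]

ave_all = ave(loadings)
ave_without_5 = ave(loadings[:4] + loadings[5:])
rho_c = composite_reliability(loadings)
print(round(ave_all, 3), round(ave_without_5, 3), round(rho_c, 3))
```

Because AVE averages squared loadings, one weak item pulls the value down even when the other indicators load adequately. A sensitivity comparison with and without the reverse-worded items probes exactly this.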
The transition to the predictive stage was carried out through the extraction of latent scores (construct scores) from the PLS-SEM model. Because getConstructScores() may return objects with heterogeneous structures (e.g., lists including auxiliary components such as weights and selected indicators), a robust routine was implemented to specifically identify and retrieve the Construct_scores matrix, which was subsequently converted into a tibble for further use. These latent scores (C1, C2, C3, and C4) constitute a reduced and theoretically informed representation of the measurement system and were combined with contextual variables D1 and D2 to form the predictor set for the supervised model.
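A version-tolerant retrieval routine of this kind can be sketched as a recursive lookup. The original helper was written in R and its implementation is not shown in the text, so the Python function below is an assumption about the idea, not a port. The two object layouts are hypothetical.

```python
def get_nested(obj, key):
    """Depth-first search for `key` in nested dicts/lists; None if absent."""
    if isinstance(obj, dict):
        if key in obj:
            return obj[key]
        children = obj.values()
    elif isinstance(obj, (list, tuple)):
        children = obj
    else:
        return None
    for child in children:
        found = get_nested(child, key)
        if found is not None:
            return found
    return None


# Two hypothetical layouts that a package might return across versions.
layout_a = {"Construct_scores": [[0.1, -0.2], [0.4, 0.3]], "W_used": [[1.0]]}
layout_b = {"Estimates": {"Construct_scores": [[0.1, -0.2], [0.4, 0.3]]}}

scores_a = get_nested(layout_a, "Construct_scores")
scores_b = get_nested(layout_b, "Construct_scores")
print(scores_a == scores_b)
```

Searching by component name rather than by fixed position keeps the downstream code unchanged when the nesting depth differs.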
For the machine learning stage, an analytical dataset (ml_df) was constructed by integrating latent scores and contextual variables, defining as the target variable a binary version (target = Yes/No) of D3 in order to standardize metric computation and ensure compatibility with xgboost. Subsequently, a manual stratified 80/20 train-test split was performed, preserving the class proportions of the target variable in both subsets. The categorical variables (D1 and D2) were transformed through one-hot encoding using model.matrix, and a column-alignment routine between the training and test matrices was incorporated to avoid inconsistencies when certain factor levels appeared only in one partition.
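The split and encoding logic described above can be sketched as follows. The study used R and model.matrix; this Python version is illustrative only. The class balance (40 "Yes" and 80 "No"), the category labels, and the helper names stratified_split and one_hot are assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) keeping class proportions in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def one_hot(values, columns=None):
    """One-hot encode category values; reuse `columns` to align partitions."""
    if columns is None:
        columns = sorted(set(values))
    return columns, [[1 if v == c else 0 for c in columns] for v in values]


# Made-up class balance for 120 observations; not the study's distribution.
labels = ["Yes"] * 40 + ["No"] * 80
train_idx, test_idx = stratified_split(labels)

# Column alignment: the test partition lacks the "large" level here, but it
# is encoded with the training columns so both matrices have the same width.
train_cols, train_matrix = one_hot(["small", "medium", "large", "medium"])
_, test_matrix = one_hot(["medium", "small"], columns=train_cols)
print(len(train_idx), len(test_idx), train_cols, test_matrix)
```

Fixing the column set on the training partition and reusing it for the test partition is what prevents width mismatches when a factor level occurs in only one subset.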
The predictive model was implemented with XGBoost (xgboost, objective binary:logistic) using xgb.DMatrix matrices. Hyperparameter selection was performed through a grid search over eta, max_depth, min_child_weight, subsample, and colsample_bytree, combined with stratified 5-fold cross-validation (xgb.cv) and early stopping (20 rounds) to prevent overfitting. To strengthen the workflow and avoid interruptions caused by package version differences or occasional errors during the validation process, a safe function (run_xgb_cv) was programmed using tryCatch in R 3.1.0, capturing errors and recording the status of each evaluated combination. In the absence of valid configurations, a methodologically reasonable fallback hyperparameter set was defined to ensure continuity of the analysis. The final model was trained using the best identified combination and the optimal number of rounds (best_iter).
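The control flow of the guarded grid search can be sketched as below. The sketch does not call xgboost: the evaluator is a toy function standing in for a cross-validated score, with simulated failures. The grid values, the fallback set, and the names run_cv_safely and select_params are hypothetical, since the text does not report them.

```python
import itertools

GRID = {  # illustrative values only; the study's grid is not reported here
    "eta": [0.05, 0.1],
    "max_depth": [2, 3],
    "min_child_weight": [1, 3],
    "subsample": [0.8],
    "colsample_bytree": [0.8],
}
FALLBACK = {"eta": 0.1, "max_depth": 3, "min_child_weight": 1,
            "subsample": 0.8, "colsample_bytree": 0.8}  # hypothetical


def run_cv_safely(evaluate, params):
    """Call `evaluate(params)` and record errors instead of raising them."""
    try:
        return {"params": params, "score": evaluate(params), "status": "ok"}
    except Exception as exc:  # broad on purpose, like tryCatch(error = ...)
        return {"params": params, "score": None, "status": f"error: {exc}"}


def select_params(evaluate, grid=GRID, fallback=FALLBACK):
    """Return the best-scoring combination, or the fallback if all fail."""
    keys = list(grid)
    results = [
        run_cv_safely(evaluate, dict(zip(keys, combo)))
        for combo in itertools.product(*(grid[k] for k in keys))
    ]
    valid = [r for r in results if r["status"] == "ok"]
    best = max(valid, key=lambda r: r["score"])["params"] if valid else fallback
    return best, results


def toy_cv_score(params):
    """Toy stand-in for a cross-validated AUC; fails for some combinations."""
    if params["max_depth"] == 3 and params["eta"] == 0.05:
        raise RuntimeError("simulated failure")
    return (0.6 - abs(params["eta"] - 0.1)
            + 0.01 * params["max_depth"] - 0.01 * params["min_child_weight"])


def always_fails(params):
    raise RuntimeError("simulated failure")


best, results = select_params(toy_cv_score)
best_when_all_fail, _ = select_params(always_fails)
print(best, sum(r["status"] != "ok" for r in results))
```

Recording a status per combination means a single failing configuration does not abort the search. The fallback is used only when no configuration succeeds.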
Predictive performance was evaluated on the test set using classification and calibration metrics. Predicted probabilities were obtained, and the ROC curve was estimated with pROC in R 3.1.0, calculating the AUC-ROC as the main measure of discrimination. The optimal cutoff point was determined according to the Youden criterion, from which predicted classes were derived for the computation of accuracy, sensitivity (recall), specificity, precision, F1-score, and Brier score. In addition, model calibration was assessed through a decile-based calibration curve, comparing the mean predicted probability with the observed event rate in each group. This combination of metrics enables the simultaneous assessment of discriminative capacity, error balance, and probabilistic consistency of the model.
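Two of these metrics are simple enough to state directly. The sketch below finds the cutoff that maximizes Youden's J (sensitivity + specificity - 1) and computes the Brier score. The labels and probabilities are made up for illustration. The study computed these quantities in R with pROC, so the function names here are hypothetical.

```python
def youden_cutoff(y_true, y_prob):
    """Threshold maximizing J = sensitivity + specificity - 1."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    best_j, best_threshold = -1.0, None
    for threshold in sorted(set(y_prob)):
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
        tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < threshold)
        j = tp / positives + tn / negatives - 1
        if j > best_j:  # ties keep the lowest threshold
            best_j, best_threshold = j, threshold
    return best_threshold, best_j


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Made-up labels and predicted probabilities, for demonstration only.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_prob = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.2, 0.9]

cutoff, j_stat = youden_cutoff(y_true, y_prob)
brier = brier_score(y_true, y_prob)
print(cutoff, j_stat, round(brier, 4))
```

Discrimination (AUC, Youden's J) and probabilistic accuracy (Brier score, calibration) answer different questions, which is why both are reported.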
The interpretability of the machine learning component was addressed through SHAP (SHapley Additive exPlanations) values. “True” SHAP contributions were extracted from xgboost using predcontrib = TRUE, removing the bias term (BIAS/intercept) and summarizing variable importance through the mean absolute contribution value (mean |SHAP|). Optionally, the shapviz package was integrated for complementary visualizations. This procedure made it possible to identify which latent scores and contextual variables most strongly explained the final classification, thereby reinforcing the interpretive traceability of the hybrid pipeline.
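The summary step can be sketched as follows, assuming a contribution matrix with one column per feature and the bias term in the last column, which is how xgboost lays out its predcontrib output. The numbers are made up for illustration, and the function name mean_abs_shap is hypothetical.

```python
def mean_abs_shap(contrib_rows, feature_names):
    """Rank features by mean |contribution|, ignoring the trailing bias column."""
    n = len(contrib_rows)
    totals = [0.0] * len(feature_names)
    for row in contrib_rows:
        for j, value in enumerate(row[:len(feature_names)]):  # drops bias term
            totals[j] += abs(value)
    importance = {name: total / n for name, total in zip(feature_names, totals)}
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)


# Made-up contributions (log-odds scale); last column is the bias term.
features = ["C1", "C2", "C3", "C4"]
contributions = [
    [0.20, -0.10, 0.02, 0.05, -0.30],
    [-0.30, 0.15, -0.01, 0.04, -0.30],
    [0.10, -0.05, 0.03, -0.06, -0.30],
]

ranking = mean_abs_shap(contributions, features)
print(ranking)
```

Taking absolute values before averaging prevents positive and negative contributions from cancelling out. The resulting ranking therefore reflects magnitude of influence, not direction.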
5. Discussion
The most robust structural finding of the PLS-SEM model lies in the magnitude of the C3→C4 coefficient (β = 0.817), which positions operational management as the direct and dominant determinant of perceived performance, relegating the direct effects of the legal-regulatory framework (C1→C4 = 0.015) and institutional capacity (C2→C4 = 0.045) to virtually null values. This pattern is consistent with an articulating role of operational management: regulation and organizational capacities do not appear to affect management outcomes directly and uniformly, but rather through their translation into structured operational processes of collection, segregation, monitoring, and environmental education. This evidence converges with Guerrero et al. [34], who, based on 117 studies in developing countries, concluded that technical operations constitute the link that transforms institutional conditions into measurable environmental performance, and with the circular infrastructure framework proposed by Velenturf and Purnell [24], which postulates the need for deliberate alignment between regulatory instruments and technical processes in order to materialize the circular transition.
The C1→C2 (β = 0.629) and C2→C3 (β = 0.583) paths, in turn, reveal that the perception of regulatory clarity and enforceability mechanisms is associated with better organizational conditions, and that these, in turn, enable operational structuring, a sequence consistent with the argument advanced by Marshall and Farahbakhsh [30] regarding the primacy of institutional and governance factors as antecedents of performance. The underlying implication is that, in the municipal context of Cajamarca, the circular economy as a legal instrument generates enabling conditions, but performance is consolidated only when such conditions are converted into effective technical operations, suggesting that regulation without operational mediation does not produce tangible results, a pattern documented by Di Foggia and Beccarello [37] in European municipal systems.
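As a worked illustration of the mediation logic, a specific indirect effect is the product of the coefficients along a chain. The sketch below uses only the point estimates reported in this section. The C1→C3 path is not reported in this section, so total indirect effects cannot be computed here. The products carry no bootstrap confidence interval and should be read as arithmetic on the reported values, not as additional results.

```python
from math import prod

# Standardized path coefficients as reported in the Discussion.
PATHS = {
    ("C1", "C2"): 0.629,
    ("C2", "C3"): 0.583,
    ("C3", "C4"): 0.817,
    ("C1", "C4"): 0.015,
    ("C2", "C4"): 0.045,
}


def specific_indirect_effect(chain, paths=PATHS):
    """Product of the coefficients along one directed chain of constructs."""
    return prod(paths[(a, b)] for a, b in zip(chain, chain[1:]))


serial = specific_indirect_effect(["C1", "C2", "C3", "C4"])
via_c3 = specific_indirect_effect(["C2", "C3", "C4"])
print(round(serial, 3), round(via_c3, 3))
```

Under these point estimates, the serial chain C1→C2→C3→C4 is about 0.300, against a direct C1→C4 effect of 0.015. The C2→C3→C4 chain is about 0.476, against a direct C2→C4 effect of 0.045.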
The triangulation between the explanatory and predictive components of the hybrid pipeline reveals partial convergences that enrich interpretation and, at the same time, expose limitations that should be read in methodological terms rather than as analytical failure. The SHAP analysis identifies medium district size (D2) as the variable with the highest absolute contribution to the prediction of participation in training activities (D3), followed by latent scores C1 and C2, whereas C4 and especially C3 show lower contributions. This ranking should be interpreted cautiously and in exploratory terms, given the weak discriminatory performance of the classifier. Rather than confirming the structural hierarchy identified by the explanatory model, the SHAP profile suggests that the fitted classifier distributed relative importance across contextual and latent predictors in a way that only partially overlaps with the structural sequence. In particular, the prominence of C1 indicates that perceived regulatory coherence may be more closely associated with the disposition to engage in training processes than with performance itself. This complementarity between approaches is precisely what Sharma et al. [
19] anticipated when proposing hybrid SEM-machine learning pipelines in which causal structure and algorithmic prediction illuminate different dimensions of the same phenomenon, and it is consistent with the analytical architecture validated by Rashad et al. [
42], which integrates PLS-SEM, XGBoost, and SHAP to reveal relationships that no single approach can capture. The test-set AUC-ROC of 0.500 nevertheless indicates weak discriminatory performance, which requires cautious interpretation of the predictive component. In this context, SHAP values are more appropriately understood as an exploratory interpretive aid for examining how the fitted classifier distributed relative importance across predictors, rather than as robust evidence of stable predictive dominance [
20,
43].
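To make the two quantities discussed above concrete, the sketch below computes AUC-ROC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, and ranks features by mean absolute SHAP value. It is a minimal illustration in plain Python, not the pipeline used in this study: the labels, scores, feature names (`f1`, `f2`, `f3`), and SHAP values are invented toy inputs, and both function names are hypothetical.

```python
def auc_roc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_by_mean_abs_shap(feature_names, shap_rows):
    """Rank features by the mean absolute SHAP value across observations."""
    n_obs = len(shap_rows)
    means = [
        sum(abs(row[j]) for row in shap_rows) / n_obs
        for j in range(len(feature_names))
    ]
    return sorted(zip(feature_names, means), key=lambda pair: pair[1], reverse=True)


# Toy inputs (assumptions for illustration only, not study data).
labels = [1, 0, 1, 0, 1, 0]
uninformative_scores = [0.5] * 6                  # cannot separate the classes
separating_scores = [0.9, 0.1, 0.8, 0.2, 0.7, 0.3]  # separates them perfectly

toy_shap = [
    [0.20, -0.10, 0.00],
    [-0.40, 0.10, 0.05],
    [0.30, -0.20, -0.05],
]
ranking = rank_by_mean_abs_shap(["f1", "f2", "f3"], toy_shap)

print(auc_roc(labels, uninformative_scores))
print(auc_roc(labels, separating_scores))
print(ranking)
```

The point of the toy example is that a SHAP ranking can always be computed, even for a classifier whose AUC is close to 0.5. The ranking then describes how the fitted model distributes importance, not how well the model separates the classes.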
AVE values below 0.50 across the four constructs (C1 = 0.352, C2 = 0.432, C3 = 0.390, and C4 = 0.378) constitute a limitation that requires a contextualized reading rather than a mechanical rejection of the instrument. Müller et al. [
40] emphasized that the assessment of convergent validity should consider the semantic heterogeneity of items and the presence of reverse-worded indicators, which is precisely the situation documented here. The items with the suffix “_5” in each construct, phrased in a problematizing sense (regulatory confusion, staff turnover, lack of training, absence of progress), exhibit systematically low loadings (0.281–0.393), which depress aggregate AVE without invalidating conceptual content. This tension between achievement-oriented and barrier-oriented items also supports a substantive interpretation: the instrument does not merely measure capacities, but also critical implementation bottlenecks, thereby capturing a semantic duality inherent to municipal management in contexts of high institutional precariousness such as that documented by Marín-Cabanillas et al. [
12] for Cajamarca.
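The mechanism by which a single low-loading item depresses AVE can be shown with a short calculation. The sketch below assumes the usual definition of AVE as the mean of the squared standardized loadings of a construct. The loadings are hypothetical values chosen for illustration (only the low value of 0.30 falls within the 0.281–0.393 range reported above); they are not the loadings estimated in this study.

```python
def average_variance_extracted(loadings):
    """AVE as the mean of squared standardized loadings."""
    return sum(value ** 2 for value in loadings) / len(loadings)


# Hypothetical loadings for four achievement-oriented items.
achievement_items = [0.70, 0.68, 0.66, 0.64]
# Same construct after adding one barrier-oriented item with a low loading.
with_barrier_item = achievement_items + [0.30]

ave_without = average_variance_extracted(achievement_items)
ave_with = average_variance_extracted(with_barrier_item)

print(f"AVE without the barrier-oriented item: {ave_without:.3f}")
print(f"AVE with the barrier-oriented item:    {ave_with:.3f}")
```

In this illustrative case, adding one item with a loading of 0.30 lowers AVE from about 0.45 to about 0.38, even though the remaining items are unchanged.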
Hair et al. [
14] warned that the rigid application of psychometric thresholds in exploratory research may penalize instruments that capture emerging constructs in novel contexts, a criterion that is particularly relevant when municipal circular economy is modeled for the first time in Peru. The overlap of latent score distributions between the D3 classes, visible in the diagnostic outputs, explains the classifier’s modest discrimination and is consistent with the argument of Shmueli et al. [
18] regarding the distinction between explanatory power and predictive power: a model may capture robust structural relationships while generating limited predictions when the target variable depends on exogenous factors not included in the model, such as the territorial availability of training opportunities or contingent administrative decisions.
The partial convergence between the two components of the hybrid pipeline makes it possible to derive implications that go beyond mere statistical validation. The prominence of medium district size in the SHAP ranking suggests an institutional inflection point: intermediate-scale municipalities may possess sufficient organizational structure to access training processes while simultaneously exhibiting operational gaps that motivate them to actively demand training in circular economy, a pattern consistent with the heterogeneity of municipal capacities documented by Awino et al. [
26] across 132 countries and with the evidence provided by Wilson et al. [
29] regarding differentiated institutional weakness in local governments of the Global South. From a regulatory perspective, the fact that C1 emerges as the second SHAP predictor while its direct effect on C4 is virtually null in the structural model suggests that the perception of regulatory coherence operates as an enabler of processes (training, institutional strengthening, and demand for circularity) without automatically translating into performance, a finding consistent with the implementation gap identified by Agovino et al. [
25] between formal regulatory frameworks and effective environmental outcomes. Taken together, the pipeline results support the thesis that municipal circular economy requires a multilevel architecture in which regulation enables, institutions organize, and operations execute [
23], and that the evaluation of this chain demands methodologies combining theoretical validation, predictive capacity, and algorithmic interpretability [
41,
44]. For Cajamarca, where 81.9% of waste is disposed of in open dumps [
12], these findings orient public policy toward operational strengthening as the primary lever, conditioned by the simultaneous improvement of institutional capacity and regulatory coherence.
6. Conclusions
The hybrid PLS-SEM, XGBoost, and SHAP pipeline applied to 120 observations from Cajamarca made it possible to examine a theoretically grounded causal architecture and to complement it with exploratory predictive assessment and algorithmic interpretability. Regarding the first objective, the measurement model provided partial empirical support for the conceptual structure of the legal-regulatory framework (C1), institutional capacity (C2), operational management (C3), and performance (C4), although AVE values below 0.50 indicate limited convergent strength under the current instrument specification. Regarding the second objective, the structural results suggest that operational management plays the central role in linking regulation and institutional capacity to performance, given the high C3→C4 coefficient (β = 0.817) and the weak direct paths from C1 and C2 to C4. Regarding the third objective, the XGBoost classifier showed modest test-set discrimination (AUC-ROC = 0.519), indicating that training participation is only partially captured by the modeled constructs. Finally, the SHAP analysis identified district size and latent scores C1 and C2 as the most relevant contributors to classification, suggesting that participation in training is embedded in a phase of institutional-operational strengthening rather than being a simple consequence of already consolidated performance. Overall, the findings support a sequential interpretation in which regulation enables, institutions organize, and operations execute, while also indicating that the present hybrid pipeline is more informative for explanatory integration and variable-importance analysis than for strong out-of-sample prediction under the current specification.
This study has limitations. Most importantly, the analytical specification did not include a dedicated technological dimension, such as waste recovery rates, treatment efficiency, infrastructure modernization, or operating-cost indicators. As a result, the model is better interpreted as an assessment of the governance and operational conditions associated with circular economy performance than as a full technical evaluation of municipal waste systems. Future research should integrate technological and engineering indicators with latent governance constructs in order to test whether municipal performance is jointly shaped by institutional organization and technological capability.
From a practical standpoint, the findings suggest that local decision makers should prioritize three lines of action. First, municipalities should strengthen operational management through updated plans, standardized procedures, monitoring systems, and continuous staff training. Second, institutional capacity should be reinforced through better interdepartmental coordination, technical staffing continuity, and stable organizational support for circular-economy programs. Third, regulatory efforts should focus not only on formal compliance, but also on improving local enforceability and implementation coherence. These actions may help reduce the persistent gap between formal circular-economy policy and actual municipal waste-management performance in Cajamarca.