Beyond Geography and Budget: Machine Learning for Calculating Cyber Risk in the External Perimeter of Local Public Entities
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
ABOUT
This study presents a large-scale, empirical approach to quantitatively assess cyber risk in local public administrations using supervised machine learning (ML) algorithms. The paper analyzes a comprehensive dataset of 93 technological and organizational variables collected from over 7,000 municipalities of Spain, aiming to identify the key factors influencing cyber risk. A composite risk indicator, called CIORank, is developed and validated as a tool to measure and compare security postures across entities. The research challenges conventional assumptions (particularly the belief that higher IT budgets lead to lower risk) by revealing that geographical location is a significant predictor of cyber risk. The findings highlight non-obvious relationships between organizational characteristics and cybersecurity outcomes, offering a data-driven framework that enables more objective, transparent, and effective allocation of resources and risk management strategies in the public sector.
TITLE
"Beyond Geography and Budget: Machine Learning for Calculating Cyber Risk in the External Perimeter of Local Public Entities".
I would suggest improving the title. Since "The findings of this study demonstrate that geographical factors are a key predictor of cyber risk",
I would remove "Beyond Geography" from the title. I would propose something about "Geography above Budget".
Perhaps something like:
"Machine Learning analysis of cyber risk (optional: estimation) in the external perimeter of local public entities shows that geography matters more than budget", OR
"Machine learning shows that geography outweighs budget in cyber risk (optional: estimation) for local public entities’ external perimeter".
TABLES
Tables 2, 3, 4, 5 miss some horizontal lines.
FORMATTING
Check line spacing in References.
ENGLISH: GRAMMATICAL ERRORS, SYNTACTICAL ERRORS
The paper will benefit from proof-reading. For instance:
1/ Line 12: "Due to their vast number and heterogeneity, local public administrations represent one of the most significant threat vectors to national cybersecurity."
This implies that local public administrations represent a threat vector to national cybersecurity, but the intended meaning is that
local public administrations represent vulnerable targets. Local governments are not the threat, but rather weak links that can be exploited.
So I would propose something like:
a) "Due to their vast number and heterogeneity, local public administrations represent one of the most significant vulnerabilities in national cybersecurity." OR:
b) "Due to their vast number and heterogeneity, local public administrations can act as entry points (or attack surfaces) for adversaries targeting national infrastructure."
2/ Line 214: Complete information related with the variables are available in Appendix A.2. ==>
Complete information related to the variables is (the information) available in Appendix A.2.
3/ L144: "leveraging this data" vs. L257: "The 2024 data were designated":
- Do you consider "data" singular or plural?
4/ L 343: The significance of this (which?) variable is illustrated in Figure 3: Which variable ? Do you mean:
The significance of each variable is illustrated in Figure 3? OR
The significance of these variables is illustrated in Figure 3?
ENGLISH: TYPING ERRORS, PUNCTUATION ERRORS
There are a few typing errors. For instance:
L824: Lasso Least Angle Regression (llar) [45]. ; ==>
Lasso Least Angle Regression (llar) [45].
ACRONYMS
There are undefined acronyms.
Please define SEO. Is it Search Engine Optimization? SEO plays an important role in this study. It could also be added to the keywords.
METHODOLOGY
The Data Acquisition and Processing, Methodology, Modelling with Machine Learning, and Results sections are fine.
Remarks:
1/ "official documents can be sold for substantial amounts, such as $60 for email accounts" : Compared to the other examples, this is not a good example of substantial amounts; either replace it or omit it.
2/ technical SEO best practices. Pls explain why technical SEO best practices and response times were used as a security variable. How is SEO related to security?
3/ the Definition of the Risk Metric: CIORank is not fully justified and seems a little arbitrary to me.
4/ (If SEO = Search Engine Optimization):
Search Engine Optimization (SEO) is not a direct measure of development quality or maintenance for the following reasons:
1. SEO can be "gamed" (hacked).
Poorly developed websites with weak code, security flaws, or broken functionality can still rank well through manipulative tactics (e.g., keyword stuffing, link farms). This shows high SEO performance ≠ high quality.
2. Development quality includes factors invisible to SEO
Code readability, architecture, security, scalability, testing practices, CI/CD pipelines — these are core aspects of development quality, but they don’t directly affect SEO.
3. SEO ≠ (is not equal to) software engineering excellence
A beautifully engineered backend system might power a site with terrible SEO (e.g., poor metadata, slow content delivery to crawlers), and vice versa.
4. Maintenance ≠ SEO updates
Real maintenance includes patching vulnerabilities, updating dependencies, refactoring code. SEO "maintenance" often means updating keywords or meta tags — surface-level changes.
CONCLUSION
The conclusions were supported by the data.
REFERENCES
The references are adequate, focused and up-to-date.
Author Response
The reviewer has provided a series of comments regarding formatting and some grammatical errors. We have addressed these points and are very grateful for them.
Regarding some suggestions related to improving readability, the reviewer provides us with the following points to confirm:
Comment 1:
1/ Line 12: "Due to their vast number and heterogeneity, local public administrations represent one of the most significant threat vectors to national cybersecurity."
This implies that local public administrations represent a threat vector to national cybersecurity, but the intended meaning is that
local public administrations represent vulnerable targets. Local governments are not the threat, but rather weak links that can be exploited.
So I would propose something like:
a) "Due to their vast number and heterogeneity, local public administrations represent one of the most significant vulnerabilities in national cybersecurity." OR:
b) "Due to their vast number and heterogeneity, local public administrations can act as entry points (or attack surfaces) for adversaries targeting national infrastructure."
Response 1:
We thank the reviewer for this insightful observation to improve the precision of the text. We completely agree that the original wording was ambiguous and that the correct interpretation is that local administrations are vulnerable targets (weak links), not the threat vector itself.
To reflect this, we have revised the text as follows:
- Original Text: "Due to their vast number and heterogeneity, local public administrations represent one of the most significant threat vectors to national cybersecurity."
- Revised Text: "Due to their vast number and heterogeneity, local public administrations can act as entry points (or attack surfaces) for adversaries targeting national infrastructure."
We believe the new wording captures the point raised by the reviewer much more accurately.
Comment 2:
2/ Line 214: Complete information related with the variables are available in Appendix A.2. ==>
Complete information related to the variables is (the information) available in Appendix A.2.
Response 2:
We thank the reviewer for the detailed grammatical correction. We have updated the text in the manuscript as suggested to improve its clarity and precision.
Comment 3:
3/ L144: "leveraging this data" vs. L257: "The 2024 data were designated":
- Do you consider "data" singular or plural?
Response 3:
We treat "data" as plural (the plural of "datum"). Upon review, we found that we had left a singular reference where the plural was intended, and we have corrected it to the plural form ("these data").
Comment 4:
4/ L 343: The significance of this (which?) variable is illustrated in Figure 3: Which variable ? Do you mean:
The significance of each variable is illustrated in Figure 3? OR
The significance of these variables is illustrated in Figure 3?
Response 4:
We thank the reviewer for pointing out this lack of clarity. We agree that the original sentence was ambiguous.
To resolve this, we have rewritten the sentence to be much more specific and to accurately reflect the content of Figure 3:
"Figure 3 illustrates the significance of the ten variables with the highest importance in the model."
This indicates that it is not all 93 variables, but rather the top 10 that have the greatest impact.
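As an illustration of the selection described above, the top-ten cut for a figure of this kind can be sketched in a few lines of Python. The feature names and importance values below are hypothetical placeholders (e.g., mean |SHAP| values), not the study's actual results:

```python
# Rank illustrative feature importances and keep only the top 10
# for plotting, as in the revised Figure 3 (values are hypothetical).
importances = {
    "province": 0.31, "ssl_grade": 0.24, "SEO3_best_practices": 0.18,
    "availability_uptime": 0.15, "SEO3_performance": 0.12,
    "dns_config": 0.10, "SEO3_accessibility": 0.09, "open_ports": 0.07,
    "cms_version": 0.05, "population": 0.04, "budget_per_capita": 0.03,
}

# Sort by importance (descending) and truncate to ten entries.
top10 = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)[:10]
for name, value in top10:
    print(f"{name}: {value:.2f}")
```

With 93 candidate variables, the same truncation drops the remaining 83 from the plot while preserving their contribution to the model itself.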
Comment 5:
There are a few typing errors. For instance:
L824: Lasso Least Angle Regression (llar) [45]. ; ==>
Lasso Least Angle Regression (llar) [45].
Response 5:
We thank the reviewer for their attention to detail. This punctuation error has been corrected. Furthermore, we have reviewed all lists throughout the manuscript to ensure consistency, replacing the final semicolon with a period as suggested.
Comment 6:
There are undefined acronyms.
Please define SEO. Is it Search Engine Optimization? SEO plays an important role in this study. It could also be added to the keywords.
Response 6:
We agree. The definition of the acronym SEO has been added, as well as CERT and CCN-CERT.
Comment 7:
The Data Acquisition and Processing, Methodology, Modelling with Machine Learning, and Results sections are fine.
Remarks:
1/ "official documents can be sold for substantial amounts, such as $60 for email accounts" : Compared to the other examples, this is not a good example of substantial amounts; either replace it or omit it.
Response 7:
We thank the reviewer for this insightful observation. We agree that the "$60" figure, presented without context, does not adequately illustrate a "substantial amount." Our intention was to highlight that, while the individual value is low, the risk becomes substantial at scale. We have revised the example accordingly.
Comment 8:
2/ technical SEO best practices. Pls explain why technical SEO best practices and response times were used as a security variable. How is SEO related to security?
Response 8:
Please see Response 10, which addresses this related point.
Comment 9:
3/ the Definition of the Risk Metric: CIORank is not fully justified and seems a little arbitrary to me.
Response 9:
We thank the reviewer for pointing out the need for a more robust justification of the CIORank metric. To address this concern, we have strengthened our manuscript by clarifying the rationale behind both the selection and the weighting of the metrics.
- Justification for Metric Selection: To show that the choice of metrics is not arbitrary, we have aligned it with the standard phases of a penetration test, as defined by the NIST SP 800-115 framework. The new Appendix A.4 details how our Security, Availability, and SEO metrics systematically cover the critical phases of a simulated cyberattack, from Reconnaissance and Information Gathering to Vulnerability Analysis.
- Justification for Metric Weighting: We acknowledge that a simple arithmetic mean (equal weighting) could appear arbitrary. For this reason, we conducted a comprehensive sensitivity analysis (detailed in Appendix C.1), testing different weighting schemes based on various strategic priorities. The results confirm that the study's main conclusions are robust and not dependent on the initial equal-weighting scheme.
We believe these additions to the manuscript provide a much more thorough and rigorous justification for the CIORank methodology, fully addressing the reviewer's concern.
Comment 10:
4/ (If SEO = Search Engine Optimization):
Search Engine Optimization (SEO) is not a direct measure of development quality or maintenance for the following reasons:
1. SEO can be "gamed" (hacked).
Poorly developed websites with weak code, security flaws, or broken functionality can still rank well through manipulative tactics (e.g., keyword stuffing, link farms). This shows high SEO performance ≠ high quality.
2. Development quality includes factors invisible to SEO
Code readability, architecture, security, scalability, testing practices, CI/CD pipelines — these are core aspects of development quality, but they don’t directly affect SEO.
3. SEO ≠ (is not equal to) software engineering excellence
A beautifully engineered backend system might power a site with terrible SEO (e.g., poor metadata, slow content delivery to crawlers), and vice versa.
4. Maintenance ≠ SEO updates
Real maintenance includes patching vulnerabilities, updating dependencies, refactoring code. SEO "maintenance" often means updating keywords or meta tags — surface-level changes.
Response 10:
We thank the reviewer for this detailed and insightful critique of the "SEO" metric category. We agree entirely that traditional, marketing-oriented SEO metrics (such as keyword ranking or backlink strategies) are indeed poor and often misleading proxies for development quality and maintenance, for all the valid reasons outlined.
We acknowledge that our use of the general label "SEO" has caused this ambiguity. In our study, this category does not refer to search ranking optimization but to a broader set of technical metrics derived from automated auditing tools (e.g., Google Lighthouse), which assess front-end code quality, performance, and adherence to web standards. We use these as a proxy for technical robustness and maintenance diligence. For instance:
- SEO3_accessibility requires adherence to WCAG guidelines, enforcing proper HTML semantics and input validation, which indirectly reduces coding errors and potential security flaws.
- SEO3_best_practices directly scans for insecure JavaScript libraries, deprecated APIs, and other code-level vulnerabilities visible from the front-end.
- SEO3_performance assesses the service's efficiency, reflecting the resilience of the underlying infrastructure to traffic spikes.
We also concur with the reviewer that crucial aspects of software quality, such as backend architecture or CI/CD pipelines, are invisible to our external audit methodology. This is a deliberate scoping choice of our research, which focuses exclusively on the external perimeter to ensure massive scalability without requiring the deployment of internal probes.
To address this valid concern and improve the clarity of the manuscript, we have added an explanatory note to the definition of the SEO indicator, expressly stating that it refers to “Web quality and performance”. We believe that this new label better reflects the technical nature of the underlying metrics and resolves the ambiguity pointed out by the reviewer.
---------------------------------------
We would like to thank the reviewer for their comments. They have been very constructive and well-guided in improving the article and raising it to a higher level.
Reviewer 2 Report
Comments and Suggestions for Authors
The manuscript titled “Beyond Geography and Budget: Machine Learning for Calculating Cyber Risk in the External Perimeter of Local Public Entities” presents a nationwide study of approximately 7,000 Spanish municipalities. Using 93 technological and contextual variables collected over three years, the authors propose a composite indicator (CIORank) and employ supervised machine learning models (classification and regression) to predict cyber risk exposure. The key findings are that geographical factors, rather than IT budgets, are significant predictors of cyber risk, challenging conventional assumptions. The proposed framework aims to provide policymakers with a transparent, data-driven tool for cybersecurity investment and resource allocation.
Suggestions for Improvement
- Strengthen the empirical justification of the CIORank threshold by incorporating simulated attacks, third-party datasets, or expert elicitation.
- Expand the discussion on the generalizability of results, potentially by outlining a roadmap for replication in other EU member states.
- Conduct a sensitivity analysis of the weighting of CIORank subcomponents to verify the robustness of the results.
- Provide a more detailed comparison of CatBoost with simpler, more interpretable models, particularly in the policy-making context where explainability may be valued over small accuracy gains.
- Explore the role of geographic/contextual variables beyond statistical correlations, for example, by linking them to infrastructure disparities, shared services, or governance structures.
- Propose concrete pathways for building a centralized cyber incident repository (e.g., through ENISA or national CERTs) to address data limitations.
- Streamline the manuscript by moving some of the methodological descriptions to appendices, emphasizing instead the conceptual and policy contributions.
Author Response
Comment 1:
Strengthen the empirical justification of the CIORank threshold by incorporating simulated attacks, third-party datasets, or expert elicitation.
Response 1:
We thank the reviewer for their important comment regarding the justification of the CIORank threshold. We agree that an empirically derived threshold requires robust validation to ensure the findings are not arbitrary.
While other valuable methodologies like large-scale simulated attacks or expert elicitation present significant logistical challenges given the scope of 7,000 entities, we have incorporated them as future lines of research.
To directly address the reviewer's concern in the present work, we have conducted a comprehensive sensitivity analysis on the risk threshold, evaluating the model's performance under three scenarios (55%, 60%, and 65%). The full results have been added to a new Appendix C.2.
The key conclusions from this analysis are:
- Model Robustness: The model's performance (as measured by AUC) remains exceptionally high and stable across all thresholds, demonstrating that its predictive power is not dependent on a specific value.
- Optimal Trade-off: The 60% threshold represents the most pragmatic balance between Precision and Recall. A lower threshold (55%) would increase false alarms, while a higher threshold (65%) would increase the risk of missing vulnerable entities (false negatives).
Therefore, this analysis validates the 60% threshold, which is also aligned with the empirical evidence from reported attacks. We sincerely thank the reviewer for this suggestion, as it has allowed us to strengthen the manuscript with this additional validation.
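A sensitivity check of this kind can be sketched as follows. The scores and incident labels below are synthetic (generated for illustration only; the actual analysis is in Appendix C.2), and the helper simply measures how the precision/recall trade-off shifts as the flagging threshold moves:

```python
# Evaluate precision/recall at candidate CIORank thresholds on
# synthetic data: entities scoring below the threshold are flagged
# as "at risk" and compared against synthetic incident labels.
import random

random.seed(42)
scores = [random.uniform(0, 100) for _ in range(1000)]
# Synthetic ground truth: only low-scoring entities can have incidents.
labels = [1 if s < 58 and random.random() < 0.8 else 0 for s in scores]

def precision_recall(threshold):
    tp = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

for t in (55, 60, 65):
    p, r = precision_recall(t)
    print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}")
```

On real data the same loop, pointed at the reported-attack labels, yields the FPR/FNR trade-off table behind the choice of 60%.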
Comment 2:
Expand the discussion on the generalizability of results, potentially by outlining a roadmap for replication in other EU member states.
Response 2:
We thank the reviewer for this excellent suggestion. We agree that the generalizability of the results is a fundamental point, and we have expanded the Discussion section to outline a roadmap for replicating this study in other EU member states.
Our methodology is based on two types of variables with different levels of generalizability:
- Technological Metrics: These are universally applicable. The tools and standards for measuring external perimeter security (SSL quality, known vulnerabilities, etc.) are global, making this part of the framework directly replicable in any country.
- Competency (Contextual) Metrics: This is where the main challenge and the key to a successful adaptation lies. Replication in another country would require a preliminary study to identify and map equivalent government data sources (e.g., national statistics institutes, finance ministries). While complex, this process would enable the discovery of supranational correlations and provide insights into how different regional governance structures (such as the Länder in Germany or the regions in France) influence risk.
Therefore, while our study focuses on Spain, its main strength is that it serves as a validated, national-scale methodological template. It demonstrates the feasibility of a data-driven approach for a problem that has largely been addressed qualitatively and offers a clear blueprint for its adaptation to other national contexts.
Comment 3:
- Conduct a sensitivity analysis of the weighting of CIORank subcomponents to verify the robustness of the results.
Response 3:
We thank the reviewer for this valuable suggestion to strengthen the robustness of our results. We have performed the requested sensitivity analysis, and the complete methodology and findings have been added to a new Appendix C.1.
The analysis confirms that the study's main conclusions are robust and not an artifact of the initial equal-weighting scheme. The key findings are:
- Model Performance Stability: The model's predictive power (R²) remains extraordinarily stable across all scenarios, with scores ranging from 0.8438 to 0.8536. This demonstrates that the model's ability to explain risk is not dependent on the specific weighting of the subcomponents.
- Key Predictor Stability: The province variable consistently remains one of the most important predictors in all tested scenarios, ranking within the top 7 most influential features.
This analysis therefore validates that the relevance of the province variable as a proxy for supra-local governance structures is a solid and persistent finding, regardless of the strategic priority given to the Security, Availability, or SEO - Web Quality & Performance metrics.
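The weighting sensitivity described above can be illustrated with a toy composite. The entity subscores and weighting schemes below are hypothetical (the study's actual schemes and results are in Appendix C.1); the point is only to show how a CIORank-style ranking is recomputed under alternative strategic priorities:

```python
# Recompute a CIORank-style composite under several weighting
# schemes for the three subcomponents (Security, Availability,
# SEO / Web Quality & Performance) and compare the rankings.
entities = {  # hypothetical subscores per entity, 0-100
    "A": {"security": 80, "availability": 90, "seo": 70},
    "B": {"security": 55, "availability": 60, "seo": 85},
    "C": {"security": 40, "availability": 75, "seo": 50},
}
schemes = {
    "equal":          {"security": 1/3, "availability": 1/3, "seo": 1/3},
    "security-first": {"security": 0.5, "availability": 0.3, "seo": 0.2},
    "uptime-first":   {"security": 0.2, "availability": 0.5, "seo": 0.3},
}

def ciorank(subscores, weights):
    # Weighted sum of the three subcomponents.
    return sum(subscores[k] * weights[k] for k in weights)

for name, w in schemes.items():
    ranking = sorted(entities, key=lambda e: ciorank(entities[e], w), reverse=True)
    print(name, ranking)
```

A robustness claim of the kind made above amounts to observing that the ranking (and, in the full study, the model's R² and top predictors) stays stable across such schemes.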
Comment 4:
- Provide a more detailed comparison of CatBoost with simpler, more interpretable models, particularly in the policy-making context where explainability may be valued over small accuracy gains.
Response 4:
We completely agree that in a public policy context, a model's interpretability can be as crucial as its accuracy. To address this point, we have made two significant improvements to the manuscript:
- We have expanded the Discussion section to explicitly analyze the trade-off between the accuracy of the CatBoost model (R² = 0.8457) and the interpretability of simpler yet robust models like K-Neighbors Regressor (R² = 0.8106). We propose a "dual-use" approach where each model is applied depending on the objective: maximum accuracy for prediction, or maximum explainability for policy simulation.
- We have added a new Appendix D ("Interpretability of ML algorithms"), which classifies all evaluated algorithms by their level of interpretability (from 'white-box' to 'black-box'), providing a clear resource for readers to understand the characteristics of each model.
We believe that these additions not only address the reviewer's suggestion but also substantially enrich the paper by offering a more nuanced and practical perspective for the audience interested in the application of these models.
Comment 5:
- Explore the role of geographic/contextual variables beyond statistical correlations, for example, by linking them to infrastructure disparities, shared services, or governance structures.
Response 5:
We thank the reviewer for this suggestion, which addresses the core of our findings' interpretation. It is important to explain the role of the province variable beyond statistical correlation, and we have addressed this in two stages:
- In a previous study, we qualitatively established that the province variable acts as a proxy for supra-local governance structures ("Diputaciones") and their shared service models, a finding that aligns with ICMA recommendations.
- The current study was designed to quantitatively validate that hypothesis. By confirming through Machine Learning that province is a high-impact predictor, our model validates the critical importance of these governance models.
We have incorporated this full explanation into the Discussion section to provide the requested contextualization and to clarify that the model is, in fact, capturing the effect of these institutional structures. As a next step, as noted in our "Future Work" section, we will explore how to decompose this institutional factor into more granular metrics of shared services.
Comment 6:
- Propose concrete pathways for building a centralized cyber incident repository (e.g., through ENISA or national CERTs) to address data limitations.
Response 6:
We agree that proposing concrete pathways for building an incident repository strengthens the practical implications of our study. Following this recommendation, we have expanded the Discussion section to outline a two-tiered roadmap to address this data limitation:
- At the National Level: We propose that national CERTs (such as CCN-CERT in Spain) lead the centralization of incident reports from local administrations, using a standardized and anonymized format that is accessible to the research sector.
- At the European Level: We suggest that ENISA take a leadership role in developing a unified platform and incident taxonomy, in line with the NIS2 directive, to create a European-level risk observatory.
We believe this more detailed proposal not only addresses the reviewer's comment but also provides clear, actionable recommendations for policymakers.
Comment 7:
- Streamline the manuscript by moving some of the methodological descriptions to appendices, emphasizing instead the conceptual and policy contributions.
Response 7:
We agree that streamlining the methodology section helps to emphasize the study's conceptual contributions.
We have revised the "Materials and Methods" section to focus on the overall methodological design, moving more specific technical details to the appendices. Specifically, the following elements have been moved:
- The exhaustive list of all evaluated Machine Learning algorithms (now in Appendix B.5).
- The detailed guide for interpreting SHAP diagrams (now integrated into Appendix B.6).
---------------------------------------------
We thank the reviewer for their valuable suggestions, which have allowed us to significantly improve the quality of the paper.
Reviewer 3 Report
Comments and Suggestions for Authors
1. The prominence of "province" in SHAP analysis demands deeper contextualization. Does this reflect infrastructural disparities (e.g., rural broadband access), regional threat landscapes, or institutional factors? Correlating provinces with external indices (e.g., ENISA regional risk scores or digital infrastructure databases) would strengthen causality claims.
2. The CIORank metric focuses exclusively on web-facing assets (SSL, SEO, availability), overlooking high-impact vectors like phishing, cloud misconfigurations, or third-party breaches. The authors should explicitly bound claims to "digital service perimeter risks," or address interactions between external vulnerabilities and internal threats (e.g., credential theft).
3. Reliance on media-reported attacks for the CIORank <60 threshold may underrepresent low-severity incidents. Sensitivity analysis (e.g., FPR/FNR trade-offs at thresholds 55–65) and comparison with breach databases (if available) are recommended.
4. The Spanish context limits direct applicability. Discussion should address: (1) Transferability to centralized (e.g., French) vs. federated (e.g., German) systems, (2) Adaptation requirements for non-EU contexts (e.g., county-based governance), (3) Impacts of excluded autonomous regions (Navarre/Basque Country)
5. Correlations alone cannot distinguish whether "Province" proxies unmeasured confounders (e.g., vendor quality). Temporal analysis (e.g., Granger causality using year-over-year data) could mitigate this.
Author Response
Comment 1:
- The prominence of "province" in SHAP analysis demands deeper contextualization. Does this reflect infrastructural disparities (e.g., rural broadband access), regional threat landscapes, or institutional factors? Correlating provinces with external indices (e.g., ENISA regional risk scores or digital infrastructure databases) would strengthen causality claims.
Response 1:
We are in full agreement that the province variable acts as a proxy. Our research indicates that the primary factor it captures is institutional factors. “Diputaciones” (Provincial Councils) offer shared IT and cybersecurity services, acting as the "common quality standard" or the "institutional confounder" that homogeneously influences the risk of the municipalities within their territory. To explore infrastructure disparities, we propose correlating our findings with digital infrastructure databases, should they be publicly available. We have added more context in response 5.
Comment 2:
The CIORank metric focuses exclusively on web-facing assets (SSL, SEO, availability), overlooking high-impact vectors like phishing, cloud misconfigurations, or third-party breaches. The authors should explicitly bound claims to "digital service perimeter risks," or address interactions between external vulnerabilities and internal threats (e.g., credential theft).
Response 2:
We completely agree that it is important to clearly define the scope of the CIORank metric and its relationship with other attack vectors. Following your suggestions, we have taken two main actions in the manuscript to address this point:
- We have bounded the claims: We have revised the entire manuscript to ensure we consistently use precise terms such as "external perimeter risk" instead of general "cyber risk," thereby clarifying the scope of our conclusions.
- We have added a discussion on the interaction: We have expanded the "Limitations" section to explicitly address the interaction between the external vulnerabilities we measure and the internal threats you mentioned. The new text reads as follows:
"It is important to acknowledge that the CIORank metric is designed to address external perimeter vulnerabilities... However, the security flaws identified by CIORank frequently function as initial entry points for more complex attacks. For instance, an unpatched vulnerability on a public web server... could serve as the initial entry point for an attacker to collect credentials, which are then used to breach internal systems..."
This decision to focus on the external perimeter is based on the feasibility of conducting a large-scale analysis across 7,000 entities, which would pose an insurmountable logistical challenge if internal probes were required.
We believe these improvements, prompted by your comment, clarify the scope and value of our research much more effectively.
Comment 3:
- Reliance on media-reported attacks for the CIORank <60 threshold may underrepresent low-severity incidents. Sensitivity analysis (e.g., FPR/FNR trade-offs at thresholds 55–65) and comparison with breach databases (if available) are recommended.
Response 3:
We agree that media-reported attacks may underrepresent certain types of incidents. Currently, no comprehensive, public breach repositories exist for this sector—a limitation our work aims to begin mitigating by publishing our anonymized data for future research.
Following your valuable recommendation, we have performed a full sensitivity analysis of the threshold, and the detailed results have been added to the new Appendix C.2. This analysis validates the robustness of choosing 60% as an optimal trade-off:
- Model Stability: The model's predictive performance (as measured by AUC) remains high and stable across all three tested thresholds (55%, 60%, and 65%), confirming that the model is not brittle to small variations.
- Optimal Trade-off: The 60% threshold represents the best balance between false positives and false negatives. It avoids the excess of false alarms that a more sensitive approach (55%) would generate, and the risk of missing vulnerable cases that a more lenient approach (65%) would entail.
Your suggestion was key to strengthening the justification for our methodology. We sincerely thank you for this contribution, as it has allowed us to demonstrate the robustness of our threshold choice much more clearly.
Comment 4:
- The Spanish context limits direct applicability. Discussion should address: (1) Transferability to centralized (e.g., French) vs. federated (e.g., German) systems, (2) Adaptation requirements for non-EU contexts (e.g., county-based governance), (3) Impacts of excluded autonomous regions (Navarre/Basque Country)
Response 4:
We agree this is a fundamental point and have expanded the Discussion section to explicitly address the three aspects raised.
Our response is structured around the core idea that while the technological metrics are universally applicable, the contextual metrics require deliberate adaptation, for which our study serves as a methodological blueprint.
- Transferability to Centralized (e.g., French) vs. Federated (e.g., German) Systems: We have clarified in the discussion that the province variable in our model acts as a proxy for regional governance structures (“Diputaciones”). Therefore, in a federated system like Germany, we would expect an analogous variable such as the "Land" (state) to have high predictive power. In a centralized system like France, the relevance of regional variables might decrease in favor of indicators related to national policies or state aid.
- Adaptation for Non-EU Contexts (e.g., County-Based Governance): The methodology is adaptable. In a context like the United States, the analysis could be replicated by replacing the province with the administrative level where shared services are consolidated, which could be the "county" or even the "state", thus validating the corresponding local governance model.
- Impact of Excluded Autonomous Regions (Navarre/Basque Country): We have explicitly acknowledged this limitation. These regions were excluded from the budget correlation analysis due to their unique economic frameworks ("fueros"), which prevent a homogeneous comparison using national data sources. While they represent less than 6% of municipalities, we clarify that this decision was necessary to maintain methodological rigor.
We believe that our study, while focused on Spain, establishes a robust and adaptable model for quantifying external perimeter risk in any country, once the mapping of its specific contextual variables is performed.
Comment 5:
- Correlations alone cannot distinguish whether "Province" proxies unmeasured confounders (e.g., vendor quality). Temporal analysis (e.g., Granger causality using year-over-year data) could mitigate this.
Response 5:
We would like to express our gratitude to the reviewer for their valuable insight regarding potential unmeasured confounders.
Our central hypothesis, supported by our prior work, is that the province variable acts precisely as a proxy for the primary institutional confounder: the governance and shared-service model of the “Diputaciones” (Provincial Councils).
As the reviewer correctly suggests, a common “vendor quality” is a potential confounder. In our context, the “Diputación” functions exactly in this role, establishing a common quality standard and providing shared technological services that homogeneously influence the municipalities within its province. Our model is therefore capturing the effectiveness of this institutional factor through the province variable.
We have incorporated this explanation into the Discussion section to contextualize this finding. The valuable suggestion to use a more advanced Granger causality analysis to explore the temporal dynamics of this relationship has been added to our "Future Work" section. We sincerely thank the reviewer for this contribution, as it enriches the long-term vision of our research.
--------------------
We are grateful for the reviewer's valuable feedback. Their suggestions have been very helpful for strengthening the manuscript.
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
The authors addressed all previous comments.
Reviewer 3 Report
Comments and Suggestions for Authors
The authors have improved a lot in the revision.