Article

Beyond Geography and Budget: Machine Learning for Calculating Cyber Risk in the External Perimeter of Local Public Entities

by Javier Sanchez-Zurdo 1,2 and Jose San-Martín 2,*
1 Department of Data, Héroux-Devtek, 28906 Getafe, Spain
2 Department of Computer Architecture and Technology, Rey Juan Carlos University, 28933 Móstoles, Spain
* Author to whom correspondence should be addressed.
Electronics 2025, 14(19), 3845; https://doi.org/10.3390/electronics14193845
Submission received: 12 August 2025 / Revised: 18 September 2025 / Accepted: 22 September 2025 / Published: 28 September 2025
(This article belongs to the Special Issue Machine Learning and Cybersecurity—Trends and Future Challenges)

Abstract

Due to their vast number and heterogeneity, local public administrations can act as entry points (or attack surfaces) for adversaries targeting national infrastructure. The individual vulnerabilities of these entities function as entry points that can be exploited to compromise higher-level government assets. This study presents a nationwide risk analysis of the exposed perimeter of 7000 municipalities, achieved through the massive collection of 93 technological and contextual variables over three consecutive years and the application of supervised machine learning algorithms. The findings of this study demonstrate that geographical factors are a key predictor of external perimeter cyber risk, suggesting that supra-local entities providing unified, shared security services are better positioned in terms of risk exposure and therefore more resilient. Furthermore, the analysis confirms, contrary to conventional wisdom, that IT budget allocation lacks a significant statistical correlation with external perimeter risk mitigation. It is concluded that large-scale data collection frameworks, enhanced by Artificial Intelligence, provide policymakers with an objective and transparent tool to optimize cybersecurity investments and protection strategies.

1. Introduction

The ongoing digitization of local public administration is a double-edged sword: while it improves the efficiency of services and accessibility for citizens, it also increases the attack surface for sophisticated cyber threats. These entities act as custodians of vast and highly sensitive datasets, including personally identifiable information (PII), financial records, cadastral surveys, and confidential social services data, such as vulnerability assessments or protection orders against gender-based violence. The significant economic value of this information on the illicit markets of the dark web, where credentials, health records, and official documents can be sold for substantial amounts (around $1000 for a complete set of medical documents, or around $4000 for a Maltese passport [1,2,3]), creates a powerful and persistent incentive for malicious actors to target them.
This economic driver fuels a continuous cycle of offensive innovation, where threat actors constantly seek to circumvent established security protocols. The recent democratization of Artificial Intelligence (AI) has had a significant impact on this dynamic, resulting in the creation of an asymmetric advantage for adversaries. AI-powered tools now enable the automated generation of polymorphic malware, the crafting of highly convincing phishing campaigns at scale, and the development of sophisticated attacks designed for stealthy data exfiltration. Conversely, AI also provides defenders with powerful capabilities for anomaly detection, automated incident response, and proactive threat hunting through simulation [4,5,6]. Nevertheless, the reduced barriers to entry for offensive AI tools represent an escalating challenge to conventional defensive postures.
To protect these entities effectively, a comprehensive security strategy is required that addresses both the internal perimeter, whose purpose is to protect internal operations, and the increasingly porous external perimeter, whose function is to provide e-government services to citizens and to integrate IoT devices. Nevertheless, a significant impediment to the formulation of such strategies is the absence of centralized empirical data. There is currently no consolidated public repository that documents cyber incidents, their operational consequences, and the lessons learned [7]. This has resulted in a reliance on informal evidence from media reports, which can be biased and incomplete. This data deficit prevents large-scale quantitative analysis, hindering the development of evidence-based security policies and forcing entities into a reactive, rather than proactive, defensive posture.

1.1. Related Work

1.1.1. Cyber Attack Figures

According to reports from cybersecurity companies, cyberattacks have increased on average by over 150%, independent of the sector affected. The prevalence of credential theft has escalated exponentially, with a 442% increase in incidents leveraging sophisticated techniques such as social engineering and vishing. An analysis of security breaches reveals that 83% of such incidents are attributable to web applications compromised through stolen credentials. It is also worth noting the role of Generative AI (GenAI) techniques in accelerating the development of attacks and facilitating their automation [8].
Looking specifically at government entities, the same sources indicate that 9% of global attacks are suffered by these organizations [8], with an increase of almost 43% in recent years [9]. The results indicate that 85% of attacks come from external actors, exploiting vulnerabilities in systems, employing social engineering or stealing devices. Likewise, more than 60% of the assets attacked correspond to web systems, in an analysis carried out on a universe of 92 public sector organizations [9].

1.1.2. Cyber Threats

The primary threats facing these entities include ransomware, malware deployment, social engineering, and Distributed Denial-of-Service (DDoS) attacks, alongside misinformation campaigns aimed at disrupting civic processes. It is important to note that AI is being used to increase the severity of these threats. For example, Large Language Models (LLMs) can generate highly contextualized and persuasive narratives for social engineering attacks, making them almost impossible to distinguish from official communications [10,11].
In response to this challenge, regulatory authorities such as the European Union’s Network and Information Security Agency (ENISA) have proposed a range of conventional security controls, including backup strategies, multi-factor authentication (MFA), and information sharing [12]. While these measures are essential, they are regarded as a baseline defense. They often struggle to scale against automated, AI-driven threats and do not provide a framework for quantitatively prioritizing risks based on the specific context of an entity, thus highlighting the need for more advanced, data-driven assessment models.

1.1.3. Local Public Environment

Within the public sector, local governments are the entities where attacks have grown exponentially. However, there is no unified, comprehensive record that documents all attacks against the public sector [7]. Nevertheless, the media have documented several consequences of these incidents. In September 2023, the city council of Seville, Spain, suffered a cyberattack that led to the complete disruption of all public services [13]. Washington County, Pennsylvania (USA) was subjected to a data-encryption attack and was ultimately compelled to pay for the decryption of the affected data [14]. Local entities utilizing cloud infrastructures were also attacked, as evidenced by the incident that affected thousands of Italian public entities through their provider, Westpole-PA [15,16].
The literature suggests that these entities, due to their considerable number and their provision of essential public services, are vulnerable targets. The volume of confidential information stored from businesses and citizens is substantial; however, only a limited number of studies have examined the risks associated with this specific public sector [17]. The International City/County Management Association (ICMA) [18] brings together local government professionals with the aim of facilitating the prosperity of these entities on a global scale. The ICMA undertook an extensive analysis of the 71 most prevalent cybersecurity risks confronting local entities worldwide. Constrained financial resources, inadequate protection expertise, a lack of shared protections, and a fragmented leadership structure leave these entities particularly vulnerable [19,20]. These studies were conducted on a cohort of 14 Chief Information Security Officers (CISOs) across 78 local governments in the United States, providing a useful first approximation. However, these findings were based on qualitative analyses of a small cohort of US local governments. This highlights a clear gap for large-scale, quantitative research that can validate and expand upon these initial observations across an entire national ecosystem.

1.1.4. Cybersecurity Framework

As indicated by the ICMA, the sharing of risk information and strategies is fundamental to minimizing risks between local public entities. Furthermore, the establishment of shared legislative frameworks among multinational entities has the potential to ensure interoperability and uniform levels of protection. In Europe, the EU Cybersecurity Regulation [21] establishes a high level of cybersecurity, cyber resilience, and trust. This regulation establishes ENISA and a cybersecurity certification framework for products and services [22]. The NIS2 Directive 2022/2555 [23] aims to harmonize national strategies among member states, identifying competent authorities, crisis management protocols, and single points of contact in cybersecurity. This requirement is considered fundamental for certain types of entities; however, member states are not compelled to implement it in the context of local public administration (city councils and municipalities) or research centers. The utilization of divergent protection criteria constitutes a systemic risk.
Additionally, the risks linked to the use of Artificial Intelligence are not directly covered by NIS2, although the directive does address risk management, monitoring, detection, and incident response, together with technological innovation, training, and awareness.

1.1.5. Massive Data in Local Public Entities

Recent research has demonstrated the feasibility of collecting extensive cybersecurity metrics from thousands of local entities at a national scale [24]. This work confirmed that supra-local coordinating bodies play a significant role in promoting shared protections, validating earlier ICMA claims. While establishing the viability of large-scale data collection is a critical first step, the primary scientific opportunity lies in making use of these data. The vast quantity and complexity of these datasets make them ideal for the application of advanced Machine Learning (ML) techniques to move beyond simple monitoring towards predictive risk modeling.

1.2. Summary of Literature and Research Objectives

In summary, the extant literature establishes that local public entities are numerous, critical, and increasingly targeted by AI-enhanced threats. The primary attack surface of these entities is the external perimeter, yet they frequently fall outside mandatory cybersecurity regulations such as NIS2. Additionally, there is a significant lack of large-scale empirical research into their risk posture, despite the proven feasibility of massive data collection frameworks.
Addressing this critical research gap, the present paper proposes a large-scale, empirical methodology to quantitatively assess cyber risk on the external perimeter of local public administration. The objective of this study is to utilize supervised ML algorithms to analyze a dataset comprising 93 technological and competency variables from over 7000 municipalities, with the aim of identifying the key variables of external cyber risk. Specifically, the development and validation of a composite risk indicator, termed CIORank, is undertaken. The primary objective of this research is to build an optimized model that moves beyond traditional assumptions, allowing us to uncover non-obvious relationships between an entity’s intrinsic characteristics and its security posture. This study provides empirical evidence demonstrating that geographical location is a significant risk factor, a finding that challenges the common assumption about the role of IT budget allocation. Consequently, this work offers a novel application based on a data-driven framework that objectifies resource allocation and risk management in the public sector.

2. Materials and Methods

The study population comprises approximately 7000 municipalities in Spain, covering a wide range of population sizes, economic capacities, and geographical contexts. Data were collected on an annual basis over a three-year period, from 2022 to 2024, with the objective of enabling both cross-sectional analysis and temporal validation of the models.
As illustrated in Figure 1, the primary tasks are:
  • Acquisition and processing of the captured data;
  • Definition of the risk metric;
  • Empirical justification of the risk thresholds;
  • A correlation study between technological investments and the identified risk;
  • Generation of a qualitative (classification) model that indicates whether an entity is at risk and identifies the most representative risk variables;
  • Generation of an optimized quantitative (regression) model that establishes a numerical risk value for each entity;
  • Verification of the model’s suitability using data from subsequent years.
Therefore, this section details the study design, data sources, risk metric definition, and ML pipeline used to identify key cybersecurity predictors in local public administrations in Spain. Furthermore, the proposal produces two ML models that facilitate enhanced risk estimation for each entity.

2.1. Data Acquisition and Processing Framework

In order to manage the scale and heterogeneity of the data, an automated collection framework and a Medallion architecture for data processing were implemented. This architecture organizes the data into three layers of increasing quality (Bronze, Silver, and Gold), ensuring governance, traceability, and preparation of the data for analysis. Complete information on this framework is available in Appendix A.1.

2.1.1. Data Sources

A total of 93 variables were collected for each municipality and classified into two main categories:
  • Technological variables: obtained through active scans of each municipality’s exposed digital perimeter. These scans, performed by automated scripts, measure technical aspects of public web services. The metrics encompass SSL/TLS certificate configuration, the presence of security headers, response times, known vulnerabilities (CVEs) in exposed components, and technical SEO best practices;
  • Competential or Contextual variables: obtained from public and government data sources, including the National Institute of Statistics (INE) and the Ministry of Finance. These variables characterize each municipality and include demographic data (population, age distribution), socio-economic data (unemployment rate, average income), and budgetary data (specific investment in IT items, total budget).
Complete information related to the variables is available in Appendix A.2.

2.1.2. Definition of the Risk Metric: CIORank

The central dependent variable in this study is CIORank, a composite indicator designed to quantify the maturity and security posture of an entity’s external perimeter [24]. The index is calculated as the arithmetic mean of three normalized sub-indices (on a scale of 0 to 100), each representing a key dimension of online presence. To validate the robustness of this equal-weighting approach, a sensitivity analysis was conducted to assess the impact of alternative, strategically focused weighting schemes, as detailed in Appendix C.1.
CIORank = (1/3)·Security + (1/3)·Availability + (1/3)·SEO
  • Security: Aggregates defensive security metrics, such as the quality of cryptographic implementation, the absence of vulnerabilities, and the adoption of security headers;
  • Availability: Measures of the reliability and performance of web services, including loading speed, correct time synchronization, and the absence of domain blacklists;
  • SEO: Evaluates technical optimization, compliance with web standards, and accessibility, which act as a proxy for development quality and maintenance. This category is referred to as “Web Quality & Performance” to accurately describe its focus on technical audits. The underlying individual metrics retain their original SEO prefix (e.g., SEO3_performance) for consistency with the data collection tools used.
A higher CIORank is indicative of an advanced technical posture, which in turn reduces the inherent risk within the exposed perimeter. Complete information related to the indicators and the risk metric is available in Appendix A.3.
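To make the equal-weighting computation concrete, the following minimal sketch derives CIORank for a small table of entities. It assumes the three sub-indices have already been normalized to the 0-100 scale; the column names and values are illustrative and are not taken from the released dataset.

```python
# Minimal sketch of the CIORank computation (equal-weighted arithmetic mean).
# Assumes sub-indices already normalized to 0-100; all values illustrative.
import pandas as pd

def ciorank(df: pd.DataFrame) -> pd.Series:
    """Equal-weighted mean of the three normalized sub-indices."""
    return (df["security"] + df["availability"] + df["seo"]) / 3.0

entities = pd.DataFrame({
    "security":     [72.0, 41.5],
    "availability": [88.0, 55.0],
    "seo":          [64.0, 47.5],
})
entities["CIORank"] = ciorank(entities)  # 74.67 and 48.0, respectively
```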

2.1.3. Empirical Threshold of Risk

In the context of classification models, the designation of an entity as “at risk” is a binary decision. Instead of employing an arbitrary statistical threshold, an empirical approach is adopted, based on actual incidents reported in the media. Table 1 shows two independent cohorts of municipalities that suffered confirmed cyberattacks in 2022 and 2023. The mean CIORank score for these entities in the year of the attack was remarkably consistent. It should be acknowledged that the set of identified attacks does not constitute an exhaustive enumeration of all attacks that occurred; this limitation is addressed in the Discussion section, where recommendations for improvement are provided.
Considering the evidence presented and with a sensitivity analysis of the risk threshold described in Appendix C.2, a risk threshold of CIORank < 60 was established. This value is employed as a conservative upper limit, with any entity exhibiting a technical posture below the average observed in municipalities that were effectively compromised being designated “at risk”.
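As a hedged illustration, the snippet below applies this threshold to the hypothetical entities frame from the sketch in Section 2.1.2; only the threshold value itself comes from the analysis above.

```python
# Binary "at risk" label from the empirically derived threshold (CIORank < 60).
RISK_THRESHOLD = 60.0
entities["at_risk"] = (entities["CIORank"] < RISK_THRESHOLD).astype(int)
print(entities[["CIORank", "at_risk"]])  # the second entity is flagged
```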

2.2. Modeling with Machine Learning

The objective of modeling is twofold: to predict the level of risk and, crucially, to identify the contextual and technological variables that are key determinants of that risk.

2.2.1. Dataset Preparation

The data from 2022 and 2023 were consolidated and used to train and evaluate the model. To ensure robust performance estimates across multiple splits, stratified k-fold cross-validation (with k = 20) was employed. The 2024 data were designated as an out-of-time validation set to assess the model’s generalization capability on entirely unseen data.
Further detailed information can be found in Appendix B, which shows the ML model flow followed between phases 1 and 3, which correspond to the preparation of the dataset.
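The split described above can be sketched as follows. This is an illustrative reconstruction rather than the study’s exact code: it assumes a consolidated DataFrame `data` with a `year` column, a list `feature_cols` naming the 53 predictors, and the binary `at_risk` label.

```python
# Illustrative evaluation protocol: stratified 20-fold cross-validation on
# the pooled 2022-2023 data, with 2024 held out as an out-of-time set.
from sklearn.model_selection import StratifiedKFold

train_df = data[data["year"].isin([2022, 2023])]  # model development pool
oot_df = data[data["year"] == 2024]               # untouched until Section 3.5

X, y = train_df[feature_cols], train_df["at_risk"]
skf = StratifiedKFold(n_splits=20, shuffle=True, random_state=42)
```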

2.2.2. Model Training and Selection

Two types of supervised models were evaluated:
  • Classification Models: The objective is to predict the binary risk category, which is defined as either “at risk” or “not at risk”. A comprehensive comparison of a wide range of algorithms was conducted, including tree ensembles (such as Random Forest and CatBoost), logistic regression, and Support Vector Machines (SVMs). The Area Under the ROC Curve (AUC) was identified as the primary metric for selection, given its resilience to class imbalance;
  • Regression Models: The primary objective is to predict the numerical value of risk. Analogous algorithms were evaluated in their regression versions. The selection metric employed was the Mean Absolute Error (MAE), a metric that lends itself to straightforward interpretation.
The model that demonstrated the greatest degree of efficacy in each task underwent a process of hyperparameter optimization. This process entailed the implementation of grid search with 20-fold cross-validation.
Further detailed information can be found in Appendix B, which shows the ML model flow followed between phases 5 and 7, which correspond to the selection of algorithms, goodness metrics, and hyperparameter optimizations.
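A hedged sketch of the comparison and tuning step is shown below, reusing `X`, `y`, and `skf` from the previous sketch. The candidate set is abridged, the hyperparameter grid is an example rather than the grid used in the study, and categorical columns are assumed to be label-encoded for the scikit-learn models (CatBoost consumes them natively via `cat_features`).

```python
# Illustrative model comparison by AUC under the same 20-fold protocol,
# followed by a grid search on the strongest candidate.
from catboost import CatBoostClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=300, random_state=42),
    "catboost": CatBoostClassifier(cat_features=["province"], verbose=0),
}
aucs = {name: cross_val_score(est, X, y, cv=skf, scoring="roc_auc").mean()
        for name, est in candidates.items()}

grid = GridSearchCV(
    CatBoostClassifier(cat_features=["province"], verbose=0),
    {"depth": [4, 6, 8], "learning_rate": [0.03, 0.1]},  # example grid
    scoring="roc_auc", cv=skf,
)
grid.fit(X, y)
```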

2.2.3. Variable Importance Analysis

Once the final model was trained, the most critical step was to interpret it to identify risk factors. To achieve this objective, we employed the SHAP (Shapley Additive Explanations) technique. The SHAP values facilitate the quantification of the contribution of each variable (e.g., geographic location, investment in budget item X, number of inhabitants) to risk prediction for each individual municipality and for the population as a whole.
Comprehensive details concerning the execution of a ML project, organized by phases, and the interpretation of SHAP plot can be found in the Appendix B.
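The SHAP step can be sketched as follows, assuming the tuned CatBoost classifier from the previous sketch. It uses CatBoost’s native ShapValues output, one of several equivalent routes to the same values.

```python
# Illustrative SHAP analysis. CatBoost's ShapValues output has one extra
# trailing column (the expected value), which is dropped before plotting
# the global summary in the style of Figures 3 and 4.
import shap
from catboost import Pool

model = grid.best_estimator_
pool = Pool(X, y, cat_features=["province"])
shap_vals = model.get_feature_importance(data=pool, type="ShapValues")
shap.summary_plot(shap_vals[:, :-1], X)
```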

3. Results

3.1. Correlation Between Risk and IT Investment

A key issue to be resolved is whether a local public entity’s IT investments within its budget are aligned with the risks posed to its external perimeter. A basic statistical analysis can be carried out using the seven economic variables and the risk indicators. The Kolmogorov–Smirnov (KS) test determined that the variables are non-Gaussian; therefore, the non-parametric Spearman and Kendall correlation coefficients were used. Table 2 contains further information on Spearman’s coefficients.
The calculated correlation coefficients are all approximately equal to zero, suggesting the absence of a correlative relationship between the economic variables and the risk management variables.
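This check can be reproduced along the following lines; the sketch reuses the consolidated frame from the earlier sketches, with `budget_cols` standing in for the seven budget variables of Appendix A.2.

```python
# Illustrative normality check (KS test on standardized values) followed by
# Spearman rank correlation against the risk metric. Names are assumptions.
from scipy import stats

for col in budget_cols:  # the seven Budget_cnpt_* variables
    x = data[col].dropna()
    _, p_norm = stats.kstest((x - x.mean()) / x.std(), "norm")
    rho, p_rho = stats.spearmanr(data[col], data["CIORank"],
                                 nan_policy="omit")
    print(f"{col}: KS p={p_norm:.3g}, Spearman rho={rho:.3f} (p={p_rho:.3g})")
```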

3.2. Qualitative Supervised Learning Model Results

The final dataset contains a total of 13,833 entries, with 93 variables obtained for each monitored entity. Fifty-four variables were selected: 48 numerical and 5 categorical variables are used for training, and the remaining variable is the target to be estimated. The other 39 variables were discarded because they represent values unique to each entity, such as contact telephone numbers, web URLs, and identification codes in the different information sources.
The CatBoost Classifier demonstrated superior performance in comparison with the other models, as evidenced in Table 3. As shown in Figure 2, Accuracy achieved a score of 84.8%, while the AUC metric attained a remarkable 94.0%. Other performance metrics, such as Precision, Recall, and F1-Score, are also high.
The Discussion section provides a detailed explanation of why the CatBoost algorithm performs better in these types of situations.

3.3. Analysis of Variables in the Qualitative Model

It has been observed that the most significant variables in the model align with the technological variables, which is consistent with the expected behavior. However, incorporating competential variables has been shown to significantly enhance the model’s predictive capacity. Figure 3 illustrates the significance of the ten variables with the highest importance in the model.
Figure 4 shows the variables that dominate the ML model; technological variables are the most prominent. Province, a geographical variable, nevertheless appears at the top of the list in both Figure 3 and Figure 4. Other relevant competency variables, such as geo_area, outstanding_debt, female_unemployment, male_unemployment, and agriculture_unemployment, are less influential in the model’s risk predictions.

3.4. Optimized Quantitative Supervised Learning Model Results

The most significant variables from the qualitative supervised learning model (binary classification) were used to formulate the optimized quantitative model, restricted to the ten most representative variables:
  • S5_sslabs_scan;
  • SEO3_performance;
  • SEO4_cookies;
  • SEO3_seo;
  • SX_shodan;
  • A7_black_list;
  • province;
  • S3_openports;
  • A2_ntp;
  • SEO3_bestpractices.
A total of 13,833 entries were used with the 10 selected variables: 9 numerical and 1 categorical variable enter the training process, and the risk variable is the target to be estimated.
The CatBoost Regressor outperformed the other models, as illustrated in Table 4: an MAE of 2.43 (the smaller the better), an MSE of 10.22, an RMSE of 3.19, and an R2 of 0.8457 (the closer to 1 the better).
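A hedged sketch of this optimized model is given below, reusing the 2022-2023 frame from the Section 2.2.1 sketch. The feature names follow the list above, `province` is the single categorical feature, and the target column name is illustrative.

```python
# Illustrative CatBoostRegressor on the ten most representative variables.
from catboost import CatBoostRegressor

top10 = ["S5_sslabs_scan", "SEO3_performance", "SEO4_cookies", "SEO3_seo",
         "SX_shodan", "A7_black_list", "province", "S3_openports",
         "A2_ntp", "SEO3_bestpractices"]

reg = CatBoostRegressor(cat_features=["province"], verbose=0)
reg.fit(train_df[top10], train_df["CIORank"])  # numeric risk target
```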
The performance of the model with the predicted values and their residuals can be seen in Figure 5, and the error prediction using the R2 metric in Figure 6. The representation of the residuals demonstrates that the data conforms to a Gaussian bell-shaped distribution, thereby confirming that the error density is centered on the zero value.
The error prediction representation shows the identity line close to the model’s predictions; the closer these lines are, the better the model. Dispersion is observed at the extremes (lower and higher values), but the majority of points are concentrated along the identity line. With a coefficient of determination of 0.842, the model explains a significant amount of the variability in the data, allowing favorable conclusions about the performance of the regression model.

3.5. Goodness of the Quantitative Model in the 2024 Data

After training, the regression model was evaluated on the 2024 data, which had not been used previously, to check its goodness of fit. See Table 5 for model performance.
The results obtained demonstrate that there are no significant differences between the data for the years 2022 and 2023 and the data for 2024. This finding suggests that the model has generalized relatively well.
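This out-of-time check can be sketched by scoring the regressor from the previous sketch on the untouched 2024 frame; the metric helpers are standard scikit-learn functions.

```python
# Illustrative 2024 out-of-time evaluation (cf. Table 5).
from sklearn.metrics import mean_absolute_error, r2_score

pred_2024 = reg.predict(oot_df[top10])
print("MAE:", mean_absolute_error(oot_df["CIORank"], pred_2024))
print("R2 :", r2_score(oot_df["CIORank"], pred_2024))
```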

4. Discussion

Technological variables dominate the interpretation of the ML model as can be seen in Figure 3 and Figure 4. Security and SEO variables stand out, such as S5_sslabs_scan (the better the quality of the SSL certificate, the better the model score), SEO3_performance and SEO4_cookies (the higher the score, the better the model score), SX_shodan (the higher the variable score, the more negatively it affects the model score). The findings of this study indicate that the province variable is a significant factor in the estimation of risk as evidenced by its high ranking in the SHAP analysis.
However, it is improbable that geography will be a direct causal factor. Instead, the province variable is a powerful proxy for governance structures and shared service delivery. This connection is facilitated by supra-local entities known as Provincial Councils, which provide centralized IT and cybersecurity services. Municipalities affiliated with these models benefit from shared technological solutions, unified protection strategies, and expert knowledge, resulting in significantly higher and more homogeneous security postures. In contrast, entities not affiliated with this homogenization model often lack the resources or expertise necessary to achieve the same level of protection. Therefore, the machine learning model has not identified a new phenomenon but has quantitatively confirmed the critical impact of these governance and shared service delivery models. This finding aligns directly with the recommendations of organizations such as ICMA, which advocate for shared protection strategies to improve the resilience of local governments [19,20].
Intuitively, other competency or context variables may be posited as potential contributors to risk, including cultural differences and political management. However, the current study has not incorporated these variables, as their preliminary analysis is necessary to avoid the biases inherent to differences in political party management or cultural identifications specific to a region. While these variables are of interest, it is important to acknowledge that they require special treatment that may ultimately lead to future work focusing on the objectification of these variables.
Regarding the correlation between IT budgets and associated external perimeter risk in local public entities, previous studies on this relationship have been analyzed. Research has been conducted on the correlation between budget size and cyber losses, with a strong positive correlation being identified [25]. Despite the apparent discrepancy between the two findings, they are actually complementary for the following reasons:
  • Risk vs. cyber losses can be viewed as a cause-effect relationship. The presence of a risk to an entity does not directly imply an economic loss until the occurrence of a cyber security event;
  • The same research states that they have observed a weakening of the correlation between budget size and cyber losses. The authors of the study have provided a justification for this phenomenon, attributing it to an increase in attacks on smaller entities with limited financial resources.
Given that the principal providers of cybersecurity solutions have corroborated the almost exponential increase in attacks on all types of local public entities, and that the literature already indicates that these correlations have been weakening, the results obtained are aligned: an increase in economic allocations for the external perimeter does not guarantee an increase in perceived security. Entities with acceptable levels of perimeter security are in a position to focus their economic efforts on alternative activities where a greater impact can be achieved, such as the training of employees, the homogenization of systems with supra-local entities, or investment in the internal perimeter. Political and technical management is key when it comes to deciding the type of investment, its amount, and its distribution among employees and citizens. This is consistent with the findings indicated by ICMA in its studies of local entities in the USA.
Furthermore, the superior performance of certain machine learning (ML) algorithms in this study can be attributed to specific factors related to the dataset and the training process. The set of variables collected includes a significant number of numerical variables, in addition to a variety of categorical variables. Within the domain of training, two distinct categories of risks have been identified: “target leakage” and “prediction shift”. The “target leakage” risk materializes when the training process has access to the target variable that would not be available at the time of prediction. This phenomenon enables the development of models that exhibit high performance during the training phase, yet demonstrate suboptimal performance during the production phase. This results in a scenario of overfitting. The “prediction shift” risk materializes when one attempts to calculate statistics on categorical variables in the training dataset. It has been demonstrated that CatBoost performs optimally with this category of problem with heterogeneous data, as it permits complex interactions between categorical variables (province) to be captured and minimizes overfitting. However, it should be noted that this approach entails a higher time cost during the training phase [26,27].
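For readers who wish to reproduce this behavior, the snippet below shows how ordered boosting is enabled in CatBoost; the parameter values are examples, not the study’s final configuration.

```python
# Ordered boosting computes categorical target statistics on random
# permutations of the training data, mitigating target leakage and
# prediction shift at the cost of longer training times.
from catboost import CatBoostClassifier

clf = CatBoostClassifier(
    boosting_type="Ordered",   # explicit ordered boosting
    cat_features=["province"],
    depth=6,
    verbose=0,
)
```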
With regard to the implications of the study, it is evident that political leaders and policymakers are the primary users for the protection of national security [28,29,30]. This methodology would enable them to make investment and organizational decisions based on objective data. However, for such data-driven approaches to be adopted, it is vital that the algorithms used for risk estimation are not only accurate but also transparent, auditable, and interpretable [29,31,32]. This research deliberately prioritized machine learning models over black-box approaches like deep learning, whose lack of interpretability can be a significant barrier in a public policy context. This is crucial when decisions involve prioritizing some entities over others, where transparency is imperative to prevent perceived bias. The analysis highlights a key trade-off. The study found that while the CatBoost model provided the highest predictive accuracy (R2 = 0.8457), its complex nature limits its direct interpretability. However, the analysis also revealed that simpler, more transparent models performed robustly. Notably, the K-Neighbors Regressor (knn) model, which offers moderate interpretability, achieved a strong R2 score of 0.8106. This presents policymakers with a clear and powerful choice. For tasks requiring the most accurate identification of at-risk entities, CatBoost is the optimal tool. For tasks requiring explanation and policy simulation, the K-Neighbors Regressor is highly recommended, as it offers a robust predictive performance with a marginal trade-off in accuracy for a substantial gain in explainability. A detailed classification of all evaluated algorithms by their interpretability level can be found in Appendix D. It is recommended that policymakers begin with a PDCA continuous improvement or Deming cycle, identifying and validating mitigation actions in successive years by comparing local and regional security metrics.
Although this study covers more than 7000 municipalities in Spain, its direct applicability to other countries is inherently limited. However, this characteristic is also its main strength: it serves as a first test model at the national level, demonstrating the viability of a data-based approach. The present work provides a baseline and a modular methodology designed for adaptation to other countries. The technological variables that capture the infrastructure exposed to the Internet and its vulnerabilities are global and would not require adaptation. However, the competency or context variables (demographic, economic, geographical) require local adaptation. To achieve this, a preliminary study is necessary, with a twofold purpose: firstly, to identify reliable government sources, and secondly, to establish a mapping of variables between countries. As an illustrative example, in a federal state such as Germany, the variable province could be linked to the concept of Land. Conversely, in a centralized system like France, the relevance of province could be limited, with the predictive weight attributed to other variables, such as the allocation of specific state aid to municipalities. This need for contextual adaptation is evident even within Spain itself. Some autonomous communities, such as the Basque Country and Navarre, maintain historical special economic quotas within their constitutional legal frameworks that exclude them from the general economic criteria applicable to the rest of the provinces. The correlation between IT investments and risk could not be analyzed in these two communities, which together represent less than 6% of all municipalities studied. In subsequent studies, it would be advisable to undertake a more comprehensive analysis of these two autonomous communities by aligning their public information sources.
The absence of a public registry of cyber incidents is a key limitation. This limitation was partially mitigated in this study by using more than 250 incidents reported by the media. This directly impacts the definition of the risk threshold. The following courses of action are proposed to overcome this situation:
  • At the National level: Leverage structures such as CCN-CERT in Spain (or its counterparts) to centralize incident reporting by local administrations in a standardized, anonymized manner that is accessible to the research sector;
  • At the European level: It is hereby proposed that ENISA assume the leadership role in the development of a unified platform and a standardized taxonomy of incidents, in accordance with the directives outlined in the NIS2 Directive. The establishment of a European-level risk observatory would facilitate the validation and refinement of predictive models such as ours on an unprecedented scale.
It is important to acknowledge that the CIORank metric is designed to address external perimeter vulnerabilities, covering the web-facing and digital service perimeter. Consequently, it does not directly measure high-impact vectors such as phishing campaigns or internal on-premise or cloud misconfigurations [33]. However, the security flaws identified by CIORank frequently function as initial entry points for more complex attacks. For instance, an unpatched vulnerability on a public web server or the exposure of private ports (factors that lower the CIORank score) could serve as the initial entry point for an attacker to collect credentials, which are then used to breach internal systems or launch phishing attacks from a trusted domain. The fundamental purpose of this framework is therefore to facilitate the identification of critical weak links in the external perimeter, which in turn enables the prioritization of mitigation measures.
The following are related to future work that could minimize the impact of the above limitations:
  • Increase the number of local public entities in nearby countries. This could establish a supranational geographical trend among similar countries;
  • Develop models that relate risk metric to expected economic loss if risks are not mitigated, using the FAIR framework [34];
  • Perform attack simulations based on the vulnerabilities detected in the external perimeter. Since this study conducts a strategic analysis at the country level, answering “what” and “why”, the use of techniques similar to Monte Carlo would allow tactical actions to be prioritized by defining “how”;
  • A perception study could be conducted among key C-level executives, such as Chief Operating Officers (COOs), Chief Technology Officers (CTOs), Chief Information Officers (CIOs), and Chief Information Security Officers (CISOs) in public administrations, in order to base the weighting in line with the country’s overall strategies.
  • Explore and correlate the study results with publicly available datasets on regional differences, if accessible. This could include national statistics on digital infrastructure (e.g., broadband penetration) or official reports from national/European agencies, which would help further contextualize the impact of institutional versus infrastructural factors;
  • The temporal dynamics of risk should be investigated by means of causal inference methodologies appropriate for mixed data types. Although standard Granger causality is constrained to numerical variables, future research could adapt advanced techniques to explore time-lagged relationships. Such analysis has the potential to yield valuable leading indicators of risk, for example, by determining whether certain technical changes systematically precede a change in an entity’s security posture.

5. Conclusions

The present study proposes a methodological approach to the assessment of external perimeter exposure, integrating technological and competence-related variables. The results indicate that, while technological factors remain relatively consistent across various contexts, competence-related variables, particularly geographical and organizational distribution, have the potential to substantially influence the external risk profile. With more than 250 incidents reported by the media, the importance of identifying weaknesses before an attack occurs is underscored. By prioritizing preventive measures based on this exposure assessment, organizations can optimize resource allocation and strengthen their cyber resilience.
The analysis enabled the empirical calculation of a risk threshold for the risk metric, providing a practical benchmark for classifying exposure levels. Findings also revealed no correlation between IT investment levels and associated external perimeter cyber risk, challenging the assumption that higher expenditure guarantees lower vulnerability. Among the competence-related variables, province emerged as a key non-technological factor in external risk determination, highlighting the relevance of contextual and organizational elements in cyber risk management.
Regression models demonstrated strong performance in estimating both past and future risk levels, offering policymakers robust, data-driven tools for strategic decision-making. The ML integration techniques ensure that not only is external risk quantified, but the underlying drivers are made transparent, thus enhancing the trust and usability of the results.
Future research should expand the geographical scope, incorporate structured incident datasets beyond media reports, and explore additional explainable AI approaches to further strengthen the reliability and generalizability of the proposed framework.

Author Contributions

Conceptualization, J.S.-Z. and J.S.-M.; methodology, J.S.-Z. and J.S.-M.; software, J.S.-Z.; investigation, J.S.-Z. and J.S.-M.; data curation, J.S.-Z.; writing—original draft preparation, J.S.-Z.; writing—review and editing, J.S.-M.; visualization, J.S.-Z. and J.S.-M.; supervision, J.S.-M.; project administration, J.S.-M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original data presented in the study are openly available in Zenodo at https://doi.org/10.5281/zenodo.16914357.

Conflicts of Interest

Author Javier Sanchez-Zurdo was employed by the company Heroux-Devtek SPAIN. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AI: Artificial Intelligence
AUC: Area Under the ROC Curve
CERT: Computer Emergency Response Team
CCN-CERT: Centro Criptológico Nacional
CVE: Common Vulnerabilities and Exposures
DDoS: Distributed Denial-of-Service
ENISA: European Union’s Network and Information Security Agency
GenAI: Generative AI
ICMA: International City/County Management Association
LLMs: Large Language Models
MAE: Mean Absolute Error
MFA: Multi-Factor Authentication
ML: Machine Learning
MSE: Mean Square Error
NIS2: Network and Information Systems directive 2
PII: Personally Identifiable Information
RMSE: Root Mean Square Error
SHAP: SHapley Additive Explanations
SEO: Search Engine Optimization

Appendix A. Data Acquisition Framework

Appendix A.1. Medallion Architecture for Data Processing

Medallion architecture has also been employed as a data design pattern, facilitating data capture, cleaning, and preparation for use. The implementation of this pattern varies depending on the layer in which the data resides, with Bronze, Silver, and Gold representing the different layers, as illustrated in Figure A1. This layered data organization overcomes the limitations of traditional data lakes and warehouses, including technological heterogeneity, data replication, and complexity in data management [35].
Figure A1. Medallion architecture.

Appendix A.2. Data Sources and Variables

This research uses the main institutional websites of the Spanish state as reliable sources of information. The following sources and their references are indicated:
  • INE—National Statistics Institute [36];
  • Seguridad Social—Spanish Ministry of Social Security [37];
  • SEPE—Spanish Ministry of Employment [38];
  • AEAT—Tax Office [39];
  • MINHAP—Spanish Ministry of Treasury [40];
  • CNIG—National Center for Geographic Information [41];
  • Datos.gov.es—Ministry for Digital Transformation [42].
Related to the sources of information closest to technical metrics, these are their sources:
  • SSL Server Test [43];
  • Mozilla Observatory [44];
  • Google Safe Browsing [45];
  • Shodan Search Engine [46];
  • Network Time Foundation NTP [47];
  • MXToolBox for blacklists [48];
  • W3C Web Content Accessibility Guidelines [49].
The technological variables extracted from the previous data sources are the following:
  • S1—The remaining time for HTTPS/SSL/TLS certificates to expire (in days);
  • S2—The detection of obsolete cryptographic digests, such as SHA1, in encrypted communications;
  • S3—The number of open ports in the main domain, in addition to those associated with web traffic;
  • S4—Number of documents indexed by Google that contain entity metadata, such as user account locations;
  • S5—Evaluation of the SSL/TLS digital certificate, where A is indicative of excellent quality and F is indicative of insecure quality;
  • S6—The presence of a Robots.txt file serves to prevent indexing by search engines and impede the reconnaissance phase of the cyber kill chain;
  • SX_Safe_Browsing—The domain reputation databases are consulted to confirm if the entity’s domain is listed as unsafe;
  • SX_Shodan—A check is made to identify whether the entity’s domain has any known vulnerabilities that have not yet been conveniently patched. The vulnerabilities are identified with the CVE code or Common Vulnerabilities and Exposures. The Shodan tool has been used for this purpose, facilitating the cataloging and identification of each monitored entity;
  • A1—Average download speed of web content;
  • A2—The synchronization of the online service with the Network Time Protocol (NTP) time provider;
  • A3—The number of servers that provide the online service. This metric is associated with the resilience, fault tolerance and high availability of the published service;
  • A7—The number of instances in which the domain in question has been included on lists of malicious domains. To illustrate, there are commercial services such as MXToolBox;
  • SEO1—The optimization of the service when using a desktop or laptop device;
  • SEO2—The optimization of the service when using mobile devices, including tablets, smartphones, Chromebooks, and wearables;
  • SEO3_accessibility—The service has been optimized in accordance with the accessibility criteria set out in the WCAG Web Content Accessibility Guidelines;
  • SEO3_best_practices—The service was optimized in accordance with the standards of good web programming practices, which encompass HTML, CSS and JavaScript code;
  • SEO3_performance—This metric concerns the optimization of the service from the moment a request is made until the end user receives the complete content that has been requested. This metric is employed in order to ascertain the quality of the user experience;
  • SEO3_pwa—The service is to be optimized to become a progressive application. This type of application has the capability to be executed on any platform, operating system, or device that is web-based, which facilitates portability and simplifies the management of push notifications and operations that are performed offline;
  • SEO3_seo—The service has been optimized in order to improve its ranking in web search engines;
  • SEO4—The number of cookies that the online service requires the end user to accept. This metric gauges the user experience and perception of privacy in relation to the General Data Protection Regulation (GDPR);
  • SEO5_Bing—The number of results displayed by the Microsoft Bing search engine in response to a query about the public entity;
  • SEO6_Google—The number of results displayed by the Google search engine after a query about the public entity;
  • SEO7_links—The number of web links displayed on the main domain page of the monitored entity.
Related to the competency or context variables of the local public entity, a total of 70 variables are proposed, which have been defined and grouped into the following categories.
  • Total population—Total population and differentiated by gender (×3 variables);
  • Social security—Number of citizens working in a common or special scheme, sector category (×7 variables);
  • Municipal taxes—Taxes exclusive to the municipality (×41 variables);
  • Registered unemployment—Registered unemployment by sector and age bracket (×12 variables);
  • Land data—Surface, area, perimeter and GPS position of the entity (×6 variables);
  • Public finance data—Debt registered with the Treasury (×1 variable).
Related to the economic budgets managed by an entity, these are the variables collected:
  • Budget_cnpt_1: Chapter 1, personnel expenses;
  • Budget_cnpt_206: Chapter 2 of the current goods and services expenses includes “renting computer equipment, office automation applications, data transmission, operating systems, database management applications, and any other computer equipment and software”;
  • Budget_cnpt_216: Chapter 2 of the current goods and services expenses includes “Maintenance of web services, intranet, Internet, data network, voice network, antivirus software, document management, warranties, repairs of computer equipment, and maintenance of computer programs”;
  • Budget_cnpt_220.02: Chapter 2 of the current goods and services expenses includes “non-inventoriable computer equipment for the normal functioning of computer, office automation, transmission and other equipment such as diskettes, continuous paper, standard software packages”;
  • Budget_cnpt_222.03: Chapter 2 of the current goods and services expenses includes “telephone services and computer communications”;
  • Budget_cnpt_636: Chapter 6 of Real Investments includes “purchases of equipment for information processes”;
  • Budget_cnpt_641: Chapter 6 of Real Investments includes “purchases of computer applications”.

Appendix A.3. Risk Metric

Three technological indicators are available to facilitate a risk assessment of the external perimeter of a local public entity. The following indicators are to be considered:
  • Security: This indicator uses metrics such as the quality of digital certificates, the presence of common vulnerabilities and exposures (CVEs), and open ports. All of the previously mentioned technological variables beginning with SX are used in this indicator;
  • Availability: The availability indicator is composed of various technical metrics, including download speed, server time synchronization and the blacklisting of domains;
  • SEO: The SEO indicator includes several metrics such as optimization, compliance and accessibility of the monitored systems.
The weighting assigned to each variable within a specific indicator (Security, Availability or SEO) is proportional to the total number of metrics contained within that indicator. This weighting is based on the results of a survey of public sector online service users’ perceptions of cybersecurity [24].
Consequently, CIORank serves as a metric that comprehensively assesses the technical characteristics of an entity exposed to the Internet. This evaluation incorporates critical factors such as security, availability, and the adoption of best practices in the exposed services.

Appendix A.4. CIORank as a Standardized Penetration Testing Approach

Any additional metrics provided by other cybersecurity tools can be easily integrated into CIORank, provided they are assigned to one of the typical phases of a penetration test. For informational purposes, the metrics chosen have been aligned with the phases of the NIST SP 800-115 standard framework [50], “Information Security Testing and Assessment”:
  • Planning & Preparation: The target is the perimeter exposed to the Internet by entities. The scope is defined by the full list of local public entities provided by the National Institute of Statistics. A black-box testing approach is defined, in which the infrastructures of each entity are unknown a priori. The Rules of Engagement specify that only actions equivalent to those of a regular user are performed, ensuring no disruption to the monitored entity;
  • Information Gathering/Reconnaissance: Public information on each of the entities to be monitored is collected from government data sources. This information is supplemented with passive enumeration tools such as Whois, Google, Bing, and Shodan.
  • Vulnerability Analysis/Threat Modeling: Ports, services, software and hardware versions are identified, and potential attack vectors are found in relation to the identified technologies and known vulnerabilities (CVE).
  • Exploitation (Scanning & Enumeration): In this phase, the objective is to exploit the detected vulnerabilities to attempt code execution, privilege escalation, or lateral movement. It should be noted that none of these actions were executed during this research.
  • Post-Testing Activities: In this phase, the impact to which an entity is exposed is assessed, preparing reports and recommendations to maximize resilience and user experience in the face of cybersecurity events.
As illustrated in Table A1, the CIORank metrics are assigned according to the different phases of the NIST SP 800-115 framework. Some metrics could be assigned to multiple phases.
Table A1. Phases of the NIST SP 800-115 standard framework and related CIORank metrics.

Planning & Preparation
  Competency metrics: government sources (INE)
  Technological metrics: N/A

Information Gathering
  Competency metrics: government sources (INE, Seguridad Social, SEPE, AEAT, MINHAP, CNIG, Datos.gov.es)
  Technological metrics: S4 (sensitive metadata indexed by Google); S6 (presence of a Robots.txt file); SX_Safe_Browsing (domain reputation); SX_Shodan (Shodan’s known CVEs); SEO5_Bing (number of results in Bing); SEO6_Google (number of results in Google); SEO7_links (links on the home page)

Vulnerability Analysis
  Competency metrics: N/A
  Technological metrics: S1 (time to SSL/TLS expiration); S2 (obsolete cryptographic algorithms); S3 (number of open ports); S5 (SSL/TLS quality assessment); A2 (synchronization with NTP); A3 (servers providing HA service); A7 (domain inclusion in blacklists); SEO4 (cookies required)

Exploitation
  Competency metrics: N/A
  Technological metrics: S2 (weak cipher confirmation); S3 (exploitation of exposed ports); S5 (SSL/TLS flaw validation); SX_Shodan (testing for detected CVEs)

Post-Testing Activities
  Competency metrics: N/A
  Technological metrics: A1 (content download speed); SEO1 (optimization for desktop devices); SEO2 (optimization for mobile devices); SEO3_accessibility (compliance with WCAG criteria); SEO3_best_practices (compliance with good programming practices); SEO3_performance (web application performance); SEO3_pwa (support for progressive applications); SEO3_seo (SEO optimization of the service)

Appendix B. Machine Learning Model Flow

In any Artificial Intelligence project that utilizes Machine Learning (ML) techniques, it is imperative to follow standard steps. As illustrated in Figure A2, eight phases are identified, from the capture of essential data to the deployment of the model in production.
Figure A2. Machine Learning model flow.
In this study, these phases have been used to collect the required data and to create various ML models that help to estimate the global risk in the external perimeter of a local public entity.

Appendix B.1. Phase 1 Data Capture

From 2022 to 2024, an extensive collection of data was obtained from local public entities. Several robots, implemented in Python 3.9 and Bash 5.1 shell scripts, collect the specific metrics, allowing a single collection technique for any local public entity.
Robots have the capacity to collect both technological metrics and competency metrics. The first category is concerned with the architecture, services, or features available on the Internet by the entities in question. Examples of technological metrics include response times, digital certificates and security headers.
Competency or Contextual metrics are defined as those that characterize a local public entity. Automated robots have been utilized to collect this type of data from government and public sources. Examples of competitive metrics include the number of inhabitants, population distribution, taxation, unemployment, and geographic location.
For more information, see Appendix A.1 and Appendix A.2.
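As a concrete illustration, the following minimal Python sketch shows how a robot of this kind could capture two such technological metrics: home-page download time (cf. metric A1) and days until TLS certificate expiry (cf. metric S1). This is an assumption-laden example, not the authors' actual collector, and the municipality domain is hypothetical.

```python
# Minimal sketch of a metric-collection robot. Hypothetical: the paper does
# not publish its robots' code; the domain below is an example only.
import socket
import ssl
import time
from datetime import datetime, timezone
from urllib.request import urlopen

def response_time_ms(url: str) -> float:
    """Time taken to download the entity's home page, in milliseconds."""
    start = time.perf_counter()
    urlopen(url, timeout=10).read()
    return (time.perf_counter() - start) * 1000.0

def days_to_tls_expiry(host: str, port: int = 443) -> int:
    """Days remaining before the site's TLS certificate expires."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    not_after = datetime.strptime(cert["notAfter"], "%b %d %H:%M:%S %Y %Z")
    return (not_after.replace(tzinfo=timezone.utc) - datetime.now(timezone.utc)).days

if __name__ == "__main__":
    host = "www.example-municipality.es"  # hypothetical entity domain
    url = f"https://{host}/"
    print(f"A1 response time: {response_time_ms(url):.1f} ms")
    print(f"S1 days to TLS expiry: {days_to_tls_expiry(host)}")
```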

Appendix B.2. Phase 2 Data Collection

Robots capture all the raw information for the technological and competency metrics and store it in files for future use. Each capture is taken at a particular point in time for a specific local public entity and stored in its own file, which makes it possible to observe the evolution of each entity simply by navigating its data over time. Within the Medallion architecture, this phase corresponds to the Bronze layer.
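A minimal sketch of this Bronze-layer step follows; the file layout and field names are illustrative, not the authors' actual schema.

```python
# Each robot run writes its raw output to a timestamped file per entity,
# so the entity's evolution can be replayed over time.
import json
from datetime import datetime, timezone
from pathlib import Path

def store_raw(entity_id: str, metrics: dict, root: str = "bronze") -> Path:
    """Persist one point-in-time capture for one local public entity."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = Path(root) / entity_id / f"{stamp}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(metrics, ensure_ascii=False, indent=2))
    return path

# Example capture for a hypothetical municipality
store_raw("28079-example", {"S1_days_to_tls_expiry": 142, "A1_response_ms": 830.5})
```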

Appendix B.3. Phase 3 Data Cleaning

The preceding stage provides unprocessed (raw) files. The data collected in 2022 and 2023 serve as the primary foundation for training the machine learning models, while the year 2024 has been reserved for the final verification of the models.
The numerical indicator referred to as CIORank is used as the target variable. Entities whose indicator value is below 60% are assumed to be at risk; entities whose value is equal to or greater than 60% are deemed not at risk. The 60% threshold was selected based on the average of all values obtained in 2022 and 2023 for entities that were reported to have suffered an attack. The selection and limitations of this design decision are evaluated in the Discussion section.
The source data are normalized to avoid biases resulting from variables with larger orders of magnitude. Outliers judged to be of negligible value to the model were removed, and categorical variables were encoded.
According to the Medallion architecture, this phase is designated as the Silver layer. Within this layer, critical information is extracted, data are cleaned and sorted, and formats are applied to facilitate later use. The result is well-structured JSON files with a clearly defined logical structure and pre-filtered, refined data, ready for use.
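The following sketch illustrates this Silver-layer preparation with pandas and scikit-learn. The column names (`ciorank`, the categorical columns) and the percentile-based outlier rule are assumptions for the example; the paper does not publish its exact code.

```python
# Illustrative Silver-layer preparation, assuming a DataFrame `df` with one
# row per entity-year and a `ciorank` column in [0, 100].
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def prepare_silver(df: pd.DataFrame, numeric_cols, categorical_cols) -> pd.DataFrame:
    out = df.copy()
    # Binary target: CIORank below 60% => at risk (1), otherwise not at risk (0)
    out["at_risk"] = (out["ciorank"] < 60.0).astype(int)
    # Drop extreme outliers of negligible value to the model (a simple
    # 1st/99th-percentile clip here; the paper does not specify its rule)
    for col in numeric_cols:
        lo, hi = out[col].quantile([0.01, 0.99])
        out = out[out[col].between(lo, hi)]
    # Normalize to avoid bias from variables with larger orders of magnitude
    out[numeric_cols] = MinMaxScaler().fit_transform(out[numeric_cols])
    # Encode categorical variables
    out = pd.get_dummies(out, columns=list(categorical_cols))
    return out
```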

Appendix B.4. Phase 4 Feature Selection

In this phase, the existing JSON files are consolidated into a unified data structure, and the relevant information is exported in order to select the features with the greatest potential impact on the Machine Learning model. The goal is to identify the most significant variables for risk management.
It is essential to know whether IT investments have improved security within the external perimeter. To determine whether there is a relationship between investments and security posture, seven variables from the entity's annual budget were selected [28,29]. The economic budget variables can be found in Appendix A.2, and the risk metrics in Appendix A.3.
Given the seven economic variables and the four technological risk variables, a correlation between them can be computed; the aim is to test whether economic investment correlates directly with risk mitigation within the institution (see the sketch below). The findings of this feature selection in the Machine Learning models are presented in the Results section.
The objective of this phase is to identify the most representative variables for managing risk, discarding those that do not provide relevant information to the model. This phase corresponds to the Gold layer of the Medallion architecture, and this data layer is subsequently consumed by the machine learning models.
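The budget-versus-risk correlation check can be sketched as follows; the column names for the seven budget chapters and the four risk metrics are illustrative placeholders (cf. Table 2 for the budget concepts).

```python
# Spearman correlation between each budget variable and each risk metric.
import pandas as pd
from scipy.stats import spearmanr

budget_cols = ["b1", "b206", "b216", "b220_02", "b222_03", "b636", "b641"]
risk_cols = ["security", "availability", "seo", "ciorank"]

def budget_risk_correlations(df: pd.DataFrame) -> pd.DataFrame:
    rows = []
    for b in budget_cols:
        for r in risk_cols:
            rho, p = spearmanr(df[b], df[r])
            rows.append({"budget": b, "risk": r, "spearman_rho": rho, "p_value": p})
    return pd.DataFrame(rows)
```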

Appendix B.5. Phase 5 Model Selection & Training

In this phase, all data from 2022 and 2023 are used for training and testing. K-fold cross-validation (k = 20) is applied to obtain robust performance estimates across multiple splits. The 2024 data were designated as an out-of-time validation set to assess the model's generalization capability on entirely unseen data.
For the qualitative predictor model, the objective is to obtain a binary classification of whether an entity is at risk or not. To this end, 15 different ML algorithms have been selected using the Scikit-learn library [30]. The models are:
  • CatBoost Classifier (catboost) [26,27];
  • Light Gradient Boosting Machine (lightgbm) [51];
  • Extra Trees Classifier (et) [52];
  • Random Forest Classifier (rf) [53];
  • Gradient Boosting Classifier (gbc) [54];
  • Logistic Regression (lr) [55];
  • Ridge Classifier (ridge) [56];
  • Linear Discriminant Analysis (lda) [57];
  • K Neighbors Classifier (knn) [58];
  • Ada Boost Classifier (ada) [59];
  • SVM—Linear Kernel (svm) [60];
  • Decision Tree Classifier (dt) [61];
  • Quadratic Discriminant Analysis (qda) [62];
  • Naive Bayes (nb) [63];
  • Dummy Classifier (dummy).
For the quantitative predictor model, the objective is to obtain a regression indicating the risk value. To this end, the following 18 algorithms have been selected:
  • CatBoost Regressor (catboost) [26,27];
  • Light Gradient Boosting Machine (lightgbm) [51];
  • Random Forest Regressor (rf) [53];
  • Gradient Boosting Regressor (gbr) [54];
  • Extra Trees Regressor (et) [52];
  • K Neighbors Regressor (knn) [58];
  • Least Angle Regression (lar) [64];
  • Bayesian Ridge (br) [65];
  • Ridge Regression (ridge) [56];
  • Linear Regression (lr) [66];
  • Huber Regressor (huber) [67];
  • Ada Boost Regressor (ada) [59];
  • Decision Tree Regressor (dt) [52];
  • Passive Aggressive Regressor (par) [68];
  • Orthogonal Matching Pursuit (omp) [69];
  • Elastic Net (en) [70];
  • Lasso Regression (lasso) [71];
  • Lasso Least Angle Regression (llar) [64].
The final outcome of this phase is the selection of the most suitable algorithm for each desired predictor model and its corresponding performance evaluation metrics.
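A minimal sketch of this bake-off using scikit-learn is shown below for a subset of the fifteen classifiers; CatBoost and LightGBM would be added through their own packages (catboost, lightgbm). Hyperparameters and names are illustrative, not the study's configuration.

```python
# Rank several classifiers by mean AUC under 20-fold cross-validation.
from sklearn.ensemble import ExtraTreesClassifier, GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

candidates = {
    "et": ExtraTreesClassifier(random_state=0),
    "rf": RandomForestClassifier(random_state=0),
    "gbc": GradientBoostingClassifier(random_state=0),
    "lr": LogisticRegression(max_iter=1000),
}

def rank_by_auc(X, y):
    cv = StratifiedKFold(n_splits=20, shuffle=True, random_state=0)
    scores = {name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
              for name, model in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```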

Appendix B.6. Phase 6 Model Evaluation

Each predictive model has performance metrics that allow the quality of each algorithm to be evaluated. The models with the best performance metrics are the candidates for implementation.
In the specific case of the qualitative predictive model, the performance metrics are as follows:
  • Accuracy: The proportion of correct predictions relative to the total number of observations evaluated. When the dataset is imbalanced, this metric may appear highly favorable despite the model's potential inadequacies;
  • AUC: The Area Under the ROC (Receiver Operating Characteristic) Curve measures how well the model discriminates. It summarizes the true positive rate (recall) and the false positive rate (FPR) at different thresholds;
  • Recall: The Sensitivity or True Positive Rate. It is a key metric for selecting models when missing positive cases (false negatives) carries a significant cost;
  • Precision: The proportion of cases classified as positive that actually belong to the positive class. A model with high precision produces few false positives;
  • F1-Score: The harmonic mean of Recall and Precision. This metric is particularly useful when the objective is a model balanced between the two metrics from which it is composed, and it is especially valuable in scenarios involving imbalanced data.
In accordance with these five metrics, priority will be given to models with the highest AUC in order to select those which most effectively discriminate between positive and negative classes across a range of thresholds and regardless of possible imbalance between classes.
For the specific case of the quantitative predictor model, the performance metrics are different:
  • MAE—Mean Absolute Error: The average of the absolute differences between predicted and actual values. It does not penalize large and small errors differently;
  • MSE—Mean Squared Error: A widely used metric that averages the squared errors. It penalizes large errors and outliers most heavily, but is harder to interpret because it is expressed in squared units;
  • RMSE—Root Mean Squared Error: The square root of the MSE, expressed in the same units as the target variable (like MAE) for ease of interpretation, while remaining sensitive to large errors;
  • R2—Coefficient of determination: The proportion of the variability in the dependent variable that is explained by the model. The closer this coefficient is to 1, the better the model.
Among these four metrics, models with the lowest MAE will be prioritized, although the results of all the metrics above are reported. MAE allows direct interpretation in the same units as the target variable and is robust to moderate outliers, avoiding the over-penalization of extreme individual errors that could distort the overall assessment of the model. The metrics are stated compactly below.
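For reference, the standard definitions of the four metrics, with $y_i$ the observed CIORank of entity $i$, $\hat{y}_i$ the model's prediction, $\bar{y}$ the sample mean, and $n$ the number of entities:

$$\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right|,\qquad \mathrm{MSE}=\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^{2},$$
$$\mathrm{RMSE}=\sqrt{\mathrm{MSE}},\qquad R^{2}=1-\frac{\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^{2}}{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^{2}}.$$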
In addition to these performance metrics, it is essential to understand the factors that drive the model's predictions. This allows us to move beyond simple correlations and identify the complex, non-linear dynamics of cyber risk. To achieve this objective, a SHAP (SHapley Additive exPlanations) diagram has been employed to illustrate the influence of each variable on the predictions. To facilitate the interpretation of a SHAP plot, the following elements should be taken into account:
  • Y axis—The vertical axis contains all the variables of the model ordered by importance, with the most important variables being those located at the top;
  • X axis—The horizontal axis represents the SHAP values, showing the importance of the impact of each variable in the model. Any positive value indicates that the characteristic increases the probability of the predicted class. A negative value decreases that probability;
  • Colors—Each dot in the graph represents a row of the dataset. Red indicates high values of the variable and blue indicates low values;
  • Dot scatter—The horizontal spread of the dots along each variable's row shows the distribution of its effects across the dataset.
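Producing such a plot for a fitted tree-based model takes a few lines with the shap package; the sketch below assumes a trained CatBoost classifier `model` and a feature DataFrame `X`, and that shap is installed.

```python
# Generate the SHAP summary ("beeswarm") plot described above.
import shap

explainer = shap.TreeExplainer(model)   # CatBoost is tree-based
shap_values = explainer.shap_values(X)  # one SHAP value per feature per row
shap.summary_plot(shap_values, X)       # importance (Y), impact (X), value (color)
```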

Appendix B.7. Phase 7 Optimization & Fine Tuning

After the machine learning model has been selected, an optimization phase begins. The objective is to adjust hyperparameters, minimize error, enhance generalization, and ensure that the model is neither overfitted nor underfitted.
The hyperparameters were assessed through a series of combinations involving 10 folds and 10 candidates, for a total of 100 fits. In some instances, none of these combinations improved on the performance of the original model, and the search therefore recommended retaining the original model; this is standard behavior in automated model selection systems.
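This 10 × 10 search can be sketched with scikit-learn's RandomizedSearchCV; the CatBoost search space shown is illustrative, not the study's actual grid.

```python
# 10 candidate hyperparameter combinations x 10-fold CV = 100 fits.
from catboost import CatBoostRegressor
from sklearn.model_selection import RandomizedSearchCV

param_space = {
    "depth": [4, 6, 8, 10],
    "learning_rate": [0.01, 0.03, 0.1],
    "l2_leaf_reg": [1, 3, 5, 9],
}

search = RandomizedSearchCV(
    CatBoostRegressor(verbose=0, random_state=0),
    param_distributions=param_space,
    n_iter=10,  # 10 candidates
    cv=10,      # 10 folds
    scoring="neg_mean_absolute_error",
    random_state=0,
)
# search.fit(X_train, y_train); tuned = search.best_estimator_
```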

Appendix B.8. Phase 8 Deploying the Model

Once the model has been optimized, it becomes available to external entities via web services or APIs: the model's prediction is obtained by submitting values for the key variables. The data collected in 2024, which were excluded from the preceding process, are used to double-check the goodness of fit of the deployed models.
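A minimal sketch of such a prediction API, here with FastAPI; the endpoint name, payload schema, and model file are hypothetical, not the authors' published interface.

```python
# Serve the trained regressor behind a simple HTTP endpoint.
from catboost import CatBoostRegressor
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = CatBoostRegressor()
model.load_model("ciorank_regressor.cbm")  # hypothetical serialized model

class EntityFeatures(BaseModel):
    features: list[float]  # key-variable values, in training order

@app.post("/predict")
def predict(payload: EntityFeatures) -> dict:
    risk = float(model.predict([payload.features])[0])
    return {"predicted_ciorank": risk}
```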

Appendix C. Risk Threshold Sensitivity Analysis

Appendix C.1. Sensitivity in the Subcomponents of the Risk Metric

This section details the sensitivity analysis performed to assess the robustness of the study’s conclusions against variations in the weighting of the subcomponents of the CIORank metric (Security, Availability, and SEO).
The primary objective is to confirm that the key findings are not an artifact of the initial equal weighting process, but rather reflect stable relationships inherent within the data. To this end, two key indicators were evaluated in each alternative weighting scenario:
  • The overall predictive performance of the model, measured by its coefficient of determination (R2);
  • The stability of the predictor hierarchy, verifying whether the province variable maintained its predominance as a key risk factor.
Table A2 presents both the proposed scenarios and the weights of each of the components.
Table A2. Weighting Scenarios for the CIORank Subcomponent Sensitivity Analysis.

| Scenario | Strategic | Security | Availability | SEO |
|---|---|---|---|---|
| A | Baseline. Balanced | 33% | 33% | 33% |
| B | Focus on Security (CISO Vision) | 50% | 25% | 25% |
| C | Focus on User Experience (COO Vision) | 25% | 50% | 25% |
| D | Focus on Technical Quality (CTO Vision) | 25% | 25% | 50% |
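Recomputing the composite under these scenarios amounts to a weighted sum of the three subcomponents, as in the sketch below; the per-entity subcomponent scores are illustrative values.

```python
# Weighted CIORank under the four scenarios of Table A2.
scenarios = {
    "A_balanced":      {"security": 1/3, "availability": 1/3, "seo": 1/3},
    "B_ciso_security": {"security": 0.50, "availability": 0.25, "seo": 0.25},
    "C_coo_ux":        {"security": 0.25, "availability": 0.50, "seo": 0.25},
    "D_cto_quality":   {"security": 0.25, "availability": 0.25, "seo": 0.50},
}

def ciorank(scores: dict, weights: dict) -> float:
    """Weighted sum of the Security, Availability and SEO subcomponents."""
    return sum(weights[k] * scores[k] for k in weights)

entity = {"security": 0.72, "availability": 0.55, "seo": 0.61}  # example values
for name, w in scenarios.items():
    print(name, round(ciorank(entity, w), 4))
```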
Table A3 presents the results obtained for the CatBoost algorithm with the different strategies established.
Table A3. Evaluating Model Robustness Across Weighting Scenarios.

| Scenario | Strategic | R2 | Province Position |
|---|---|---|---|
| A | Baseline. Balanced | 0.8457 | 4th |
| B | Focus on Security (CISO Vision) | 0.8508 | 6th–7th |
| C | Focus on User Experience (COO Vision) | 0.8536 | 6th–7th |
| D | Focus on Technical Quality (CTO Vision) | 0.8438 | 6th–7th |
The results presented in Table A3 provide significant evidence validating the study's primary conclusions. The first and most substantial finding of the sensitivity study is the remarkable stability of the model's predictive performance across all tested scenarios. The baseline R2 is 0.8457, and in the three alternative scenarios the R2 score shows minimal variation, ranging from 0.8438 to 0.8536, a fluctuation of less than 1%. This suggests that, regardless of the strategic priority (security, user experience, or technical quality and SEO), the model consistently explains approximately 85% of the risk variability, providing strong evidence that its high predictive capability is inherent to its characteristics and not contingent on a particular weighting of the CIORank components.
The variable province remains a predominant predictor in all scenarios. In the baseline scenario, province is the fourth most important predictor; in the alternative scenarios, its ranking fluctuates slightly to sixth or seventh place. Crucially, it remains consistently among the ten most important variables; it neither disappears nor loses relevance. This confirms that geographical location, as an indicator of common governance structures and shared services, is a fundamental and stable risk factor.

Appendix C.2. Risk Threshold Sensitivity Analysis

A sensitivity analysis was conducted to validate the robustness of the empirically established risk threshold of CIORank < 60%. This analysis aims to determine whether the classification model’s performance and its primary conclusions remain stable under variations of this threshold.
The methodology involved retraining and evaluating the full suite of classification algorithms under two alternative scenarios:
  • Conservative Threshold (55%): A stricter threshold that restricts the at-risk label to the most severely exposed entities;
  • Liberal Threshold (65%): A more lenient threshold that flags a broader spectrum of potentially at-risk entities.
The results from these scenarios were then compared against those obtained from the original 60% threshold.
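A compact sketch of this sweep: relabel the target at each candidate threshold, retrain the classifier, and collect the metrics of Table A4. Variable and column names are illustrative of the procedure, not the paper's exact pipeline.

```python
# Evaluate the CatBoost classifier at alternative risk thresholds.
from catboost import CatBoostClassifier
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split

def evaluate_threshold(X, ciorank, threshold: float) -> dict:
    y = (ciorank < threshold).astype(int)  # at risk below the threshold
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
    clf = CatBoostClassifier(verbose=0, random_state=0).fit(X_tr, y_tr)
    proba = clf.predict_proba(X_te)[:, 1]
    pred = clf.predict(X_te)
    return {
        "threshold": threshold,
        "accuracy": accuracy_score(y_te, pred),
        "auc": roc_auc_score(y_te, proba),
        "recall": recall_score(y_te, pred, average="weighted"),
        "precision": precision_score(y_te, pred, average="weighted"),
        "f1": f1_score(y_te, pred, average="weighted"),
    }

# results = [evaluate_threshold(X, df["ciorank"], t) for t in (55, 60, 65)]
```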
Table A4. Performance comparison of the CatBoost model at different risk thresholds.

| Threshold | Accuracy | AUC | Recall | Precision | F1 |
|---|---|---|---|---|---|
| 55% | 0.9225 | 0.9671 | 0.9225 | 0.9214 | 0.9213 |
| 60% | 0.8475 | 0.9303 | 0.8475 | 0.8480 | 0.8474 |
| 65% | 0.8772 | 0.9253 | 0.8772 | 0.8730 | 0.8722 |
The sensitivity analysis reveals a key interplay between the risk threshold and the model’s performance metrics. On one hand, the Recall metric exhibits the expected behavior: it is highest at the lowest threshold (0.9225 at 55%), as a stricter definition of risk compels the model to be more sensitive, thereby minimizing the omission of positive cases (False Negatives).
On the other hand, Precision displays a non-linear dynamic, also peaking at the 55% threshold (0.9214). This finding suggests that while the model is highly precise when identifying the most vulnerable entities, its precision decreases when attempting to classify a broader spectrum of risk (up to 65%). This decline is attributed to a disproportionate increase in False Positives when a subpopulation of "ambiguous" entities at the decision boundary is included. Taken together, these results indicate that the 60% threshold represents the most pragmatic and robust trade-off: it offers a higher AUC than the 65% threshold while avoiding the overly narrow risk definition of the 55% threshold, and it maintains a high Precision (0.8480), validating its suitability for strategic risk management.
Therefore, the 60% threshold represents the most appropriate balance between sensitivity and specificity, which is also aligned with the empirical evidence from real-world reported attacks.

Appendix D. Interpretability of ML Algorithms

Machine learning algorithms exhibit a wide spectrum of interpretability, ranging from transparent 'white-box' models to complex 'black-box' ones. In decision-making contexts, simpler and easier-to-explain models are generally favored, even at the cost of a slight reduction in predictive performance.

Appendix D.1. Maximum Interpretability

  • Linear/Logistic Regression (lr, ridge, lasso, lar): Their operation is based on a simple mathematical equation. Each feature has a coefficient (a weight) that indicates precisely how much it influences the final outcome, and in which direction (positive or negative). This is the gold standard of explanation for a policymaker (see the toy sketch after this list);
  • Decision Tree (dt): A single decision tree is a flowchart of "if/then" rules. It is highly intuitive, allowing users to visualize and follow the path that leads to the conclusion for any given municipality (depending on the depth of the tree).
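As a toy illustration of this "read the coefficients directly" property (synthetic data, not the study's):

```python
# Each feature gets a single signed coefficient that can be read off directly.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)
for i, coef in enumerate(clf.coef_[0]):
    print(f"feature_{i}: {coef:+.3f}")  # sign shows the direction of influence
```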

Appendix D.2. Moderate Interpretability

  • K-Nearest Neighbors (knn): These algorithms predict risk by evaluating the k most similar municipalities in the training data. A municipality could be at risk because its closest neighbors are also at risk. This does not provide a global rule for interpretation, but it does offer a very clear local explanation;
  • Linear Discriminant Analysis (lda) & SVM with a Linear Kernel (svm): Given the two classes, 'at risk' versus 'protected', these methods identify the optimal line (or hyperplane) that separates them (geometric logic). Their linear nature keeps the idea relatively straightforward to understand, although concepts such as maximizing the distance between classes are more abstract than a simple regression equation.

Appendix D.3. Low Interpretability

  • Random Forest (rf) & Extra Trees (et): These algorithms are constructed with hundreds or thousands of decision trees, not just one. Each tree has one vote, and the final decision is made by majority vote. It is challenging to follow a single logical path through all the trees, although it is possible to determine which characteristics are, in general, the most important for the forest.

Appendix D.4. Minimal Interpretability: Advanced Boosting Models

  • AdaBoost (ada), Gradient Boosting (gbc), LightGBM (lightgbm), and CatBoost (catboost): CatBoost and the other boosting models at this level build many trees (like Random Forest) but sequentially: each new tree attempts to correct the errors of the previous one using techniques such as gradient descent. It is almost impossible for a human to trace the logic of a single prediction, which is why external tools such as SHAP are used to explain their internal workings and the impact of each characteristic on their results.
Table A5. Classification of Algorithms by Interpretability Level.

| Interpretability | Algorithms |
|---|---|
| Maximum | lr, ridge, lasso, lar, dt |
| Moderate | knn, lda, svm |
| Low | rf, et |
| Minimal | ada, gbc, lightgbm, catboost |
According to this classification, selecting CatBoost delivers the best predictive outcomes at the cost of minimal interpretability: the key characteristics of the model can only be identified with tools such as SHAP, as explained in this section.

References

  1. Nurmi, J.; Niemelä, M.; Brumley, B.B. Malware Finances and Operations: A Data-Driven Study of the Value Chain for Infections and Compromised Access. In Proceedings of the 18th International Conference on Availability, Reliability and Security, Benevento, Italy, 29 August–1 September 2023; pp. 1–12. [Google Scholar]
  2. Zoltan, M. Dark Web Price Index. 2023. Available online: https://www.privacyaffairs.com/dark-web-price-index-2023/ (accessed on 22 March 2025).
  3. Cherian, S. Healthcare Data: The Perfect Storm. Available online: https://www.forbes.com/councils/forbestechcouncil/2022/01/14/healthcare-data-the-perfect-storm/ (accessed on 22 March 2025).
  4. Brundage, M.; Avin, S.; Clark, J.; Toner, H.; Eckersley, P.; Garfinkel, B.; Dafoe, A.; Scharre, P.; Zeitzoff, T.; Filar, B.; et al. The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation. arXiv 2018, arXiv:1802.07228. [Google Scholar] [CrossRef]
  5. Mirsky, Y.; Demontis, A.; Kotak, J.; Shankar, R.; Gelei, D.; Yang, L.; Zhang, X.; Lee, W.; Elovici, Y.; Biggio, B. The Threat of Offensive AI to Organizations. Comput. Secur. 2023, 124, 103006. [Google Scholar] [CrossRef]
  6. Papadopoulos, P.; Katsikas, S.; Pitropakis, N. Editorial: Cybersecurity and Artificial Intelligence: Advances, Challenges, Opportunities, Threats. Front. Big Data 2025, 7, 1537878. [Google Scholar] [CrossRef] [PubMed]
  7. Hossain, S.T.; Yigitcanlar, T.; Nguyen, K.; Xu, Y. Local Government Cybersecurity Landscape: A Systematic Review and Conceptual Framework. Appl. Sci. 2024, 14, 5501. [Google Scholar] [CrossRef]
  8. CrowdStrike. CrowdStrike 2024 Global Threat Report. Available online: https://www.crowdstrike.com/en-us/resources/reports/crowdstrike-2024-global-threat-report/ (accessed on 20 May 2024).
  9. Verizon. 2024 Data Breach Investigations Report. Available online: https://www.verizon.com/business/resources/Te3/reports/2024-dbir-data-breach-investigations-report.pdf (accessed on 18 March 2025).
  10. Al-Hawawreh, M.; Aljuhani, A.; Jararweh, Y. Chatgpt for Cybersecurity: Practical Applications, Challenges, and Future Directions. Clust. Comput. 2023, 26, 3421–3436. [Google Scholar] [CrossRef]
  11. Alawida, M.; Abu Shawar, B.; Abiodun, O.I.; Mehmood, A.; Omolara, A.E.; Al Hwaitat, A.K. Unveiling the Dark Side of ChatGPT: Exploring Cyberattacks and Enhancing User Awareness. Information 2024, 15, 27. [Google Scholar] [CrossRef]
  12. European Union Agency for Cybersecurity. ENISA Threat Landscape 2024: July 2023 to June 2024; Publications Office: Luxembourg, 2024; ISBN 978-92-9204-675-0. [Google Scholar]
  13. Perez, E. Un Ciberataque Paraliza el Ayuntamiento de Sevilla: Piden un Rescate de Cinco MILLONES de euros para Recuperarlo. Available online: https://www.xataka.com/seguridad/ciberataque-paraliza-ayuntamiento-sevilla-piden-rescate-cinco-millones-euros-para-recuperarlo (accessed on 20 May 2024).
  14. Hoffman, C. Washington County Pays $350,000 Ransom After Cyberattack. Available online: https://www.cbsnews.com/pittsburgh/news/washington-county-pays-ransom-cyberattack/ (accessed on 20 May 2024).
  15. Longo, A. Westpole-PA Digitale, il vero Conto del Disastro: Enorme. Available online: https://www.cybersecurity360.it/nuove-minacce/westpole-pa-digitale-il-vero-conto-del-disastro-enorme/ (accessed on 20 May 2024).
  16. Paganini, P. The Ransomware Attack on Westpole Is Disrupting Digital Services for Italian Public Administration. Available online: https://securityaffairs.com/156090/cyber-crime/westpole-ransomware-attack.html (accessed on 20 May 2024).
  17. Norris, D.F.; Mateczun, L.; Forno, R. Cybersecurity and Local Government; John Wiley & Sons: Hoboken, NJ, USA, 2022; ISBN 978-1-119-78831-7. [Google Scholar]
  18. ICMA. Available online: https://icma.org/ (accessed on 20 May 2024).
  19. Chourabi, H.; Nam, T.; Walker, S.; Gil-Garcia, J.R.; Mellouli, S.; Nahon, K.; Pardo, T.A.; Scholl, H.J. Understanding Smart Cities: An Integrative Framework. In Proceedings of the 2012 45th Hawaii International Conference on System Sciences (HICSS), Maui, HI, USA, 4–7 January 2012; pp. 2289–2297. [Google Scholar]
  20. Norris, D. A Look at Local Government Cybersecurity in 2020|Icma.Org. Available online: https://icma.org/articles/pm-magazine/look-local-government-cybersecurity-2020 (accessed on 20 May 2024).
  21. European Parliament 2019/881 EU Regulation 2019/881 on ENISA and on Information and Communications Technology Cybersecurity Certification. Available online: http://data.europa.eu/eli/reg/2019/881/oj (accessed on 19 May 2024).
  22. European Commission The EU Cybersecurity Act. Available online: https://digital-strategy.ec.europa.eu/en/policies/cybersecurity-act (accessed on 26 May 2024).
  23. European Parliament 2022/2555 EU Directive 2022/2555 on Measures for a High Common Level of Cybersecurity Across the Union. Available online: https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32022L2555 (accessed on 19 May 2024).
  24. Sanchez-Zurdo, J.; San-Martín, J. A Country Risk Assessment from the Perspective of Cybersecurity in Local Entities. Appl. Sci. 2024, 14, 12036. [Google Scholar] [CrossRef]
  25. Kesan, J.P.; Zhang, L. An Empirical Investigation of the Relationship between Local Government Budgets, IT Expenditures and Cyber Losses. IEEE Trans. Emerg. Top. Comput. 2021, 9, 582–596. [Google Scholar] [CrossRef]
  26. Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased Boosting with Categorical Features. arXiv 2017. arXiv:1706.09516. [Google Scholar]
  27. Hancock, J.T.; Khoshgoftaar, T.M. CatBoost for Big Data: An Interdisciplinary Review. J. Big Data 2020, 7, 94. [Google Scholar] [CrossRef]
  28. Baral, A.; Reynolds, T.; Susskind, L.; Weitzner, D.J.; Wu, A. Municipal Cyber Risk Modeling Using Cryptographic Computing to Inform Cyber Policymaking. arXiv 2024, arXiv:2402.01007. [Google Scholar] [CrossRef]
  29. Yigitcanlar, T.; Senadheera, S.; Marasinghe, R.; Bibri, S.E.; Sanchez, T.; Cugurullo, F.; Sieber, R. Artificial Intelligence and the Local Government: A Five-Decade Scientometric Analysis on the Evolution, State-of-the-Art, and Emerging Trends. Cities 2024, 152, 105151. [Google Scholar] [CrossRef]
  30. Jha, R.K.; Jha, M. Optimizing E-Government Cybersecurity through Artificial Intelligence Integration. J. Trends Comput. Sci. Smart Technol. 2024, 6, 67–87. [Google Scholar] [CrossRef]
  31. Criado, J.I.; O.de Zarate-Alcarazo, L. Technological Frames, CIOs, and Artificial Intelligence in Public Administration: A Socio-Cognitive Exploratory Study in Spanish Local Governments. Gov. Inf. Q. 2022, 39, 101688. [Google Scholar] [CrossRef]
  32. European Parliament 2024/1689. Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 Laying down Harmonised Rules on Artificial Intelligence (Artificial Intelligence Act); European Union: Maastricht, The Netherlands, 2024. [Google Scholar]
  33. Dong, F.; Wang, L.; Nie, X.; Shao, F.; Wang, H.; Li, D.; Luo, X.; Xiao, X. DISTDET: A Cost-Effective Distributed Cyber Threat Detection System. In Proceedings of the 32nd USENIX Security Symposium (USENIX Security 23), Anaheim, CA, USA, 9–11 August 2023. [Google Scholar]
  34. FAIR Institute. Available online: https://www.fairinstitute.org/ (accessed on 11 August 2025).
  35. Schneider, J.; Gröger, C.; Lutsch, A.; Schwarz, H.; Mitschang, B. The Lakehouse: State of the Art on Concepts and Technologies. SN Comput. Sci. 2024, 5, 449. [Google Scholar] [CrossRef]
  36. INE INE—National Statistics Institute. Available online: https://www.ine.es/ (accessed on 26 May 2024).
  37. Spanish Ministry of Social Security Seguridad Social—Spanish Ministry of Social Security. Available online: https://www.seg-social.es/wps/portal/wss/internet/Inicio (accessed on 26 May 2024).
  38. Spanish Ministry of Employment SEPE—Servicio Público de Empleo Estatal—State Public Employment Service. Available online: https://www.sepe.es/HomeSepe (accessed on 26 May 2024).
  39. AEAT AEAT—Tax Office. Available online: https://sede.agenciatributaria.gob.es/ (accessed on 26 May 2024).
  40. MINHAP Hacienda—Contabilidad Pública y Control. Available online: https://www.hacienda.gob.es/es-ES/Paginas/Home.aspx (accessed on 26 May 2024).
  41. CNIG CNIG—Centro Nacional de Información Geográfica. Available online: http://www.ign.es/web/ign/portal/qsm-cnig (accessed on 26 May 2024).
  42. Ministry for Digital Transformation Datos.Gob.Es. Available online: https://datos.gob.es/es/ (accessed on 26 May 2024).
  43. Qualys, S.L. SSL Server Test. Available online: https://www.ssllabs.com/ssltest/ (accessed on 26 May 2024).
  44. Mozilla Mozilla Observatory. Available online: https://observatory.mozilla.org/ (accessed on 26 May 2024).
  45. Google. Google Safe Browsing; Google: Mountain View, CA, USA, 2024. [Google Scholar]
  46. Shodan Search Engine for the Internet of Everything. Available online: https://www.shodan.io/ (accessed on 26 May 2024).
  47. Network Time Foundation NTP Pool Project. Available online: https://www.ntppool.org/en/ (accessed on 26 May 2024).
  48. MXToolBox, Inc. MXToolbox Supertool Blacklists. Available online: https://mxtoolbox.com/blacklists.aspx (accessed on 26 May 2024).
  49. W3C Web Content Accessibility Guidelines (WCAG) 2.1. Available online: https://www.w3.org/TR/WCAG21/ (accessed on 26 May 2024).
  50. Scarfone, K.A.; Souppaya, M.P.; Cody, A.; Orebaugh, A.D. Technical Guide to Information Security Testing and Assessment. NIST SP 800-115; National Institute of Standards and Technology: Gaithersburg, MD, USA, 2008; p. NIST SP 800-115. [Google Scholar] [CrossRef]
  51. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Proceedings of the 31st Annual Conference on Neural Information Processing Systems (NIPS), Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
  52. Breiman, L.; Friedman, J.; Olshen, R.A.; Stone, C.J. Classification And Regression Trees, 1st ed.; Chapman and Hall: New York, NY, USA; CRC: New York, NY, USA, 1984; ISBN 978-1-315-13947-0. [Google Scholar]
  53. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  54. Friedman, J.H. Greedy Function Approximation: A Gradient Boosting Machine. Ann. Statist. 2001, 29, 1189–1232. [Google Scholar] [CrossRef]
  55. Cox, D.R. The Regression Analysis of Binary Sequences. J. R. Stat. Soc. B Stat. Methodol. 1958, 20, 215–232. [Google Scholar] [CrossRef]
  56. Hoerl, A.E.; Kennard, R.W. Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics 1970, 12, 55–67. [Google Scholar] [CrossRef]
  57. Fisher, R.A. The Use of Multiple Measurements in Taxonomic Problems. Ann. Eugen. 1936, 7, 179–188. [Google Scholar] [CrossRef]
  58. Cover, T.; Hart, P. Nearest Neighbor Pattern Classification. IEEE Trans. Inform. Theory. 1967, 13, 21–27. [Google Scholar] [CrossRef]
  59. Freund, Y.; Schapire, R.E. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. J. Comput. Syst. Sci. 1997, 55, 119–139. [Google Scholar] [CrossRef]
  60. Cortes, C.; Vapnik, V. Support-Vector Networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
  61. Quinlan, J.R. Induction of Decision Trees. Mach. Learn. 1986, 1, 81–106. [Google Scholar] [CrossRef]
  62. Nembhard, H.B. Statistical Process Adjustment Methods for Quality Control. J. Am. Stat. Assoc. 2004, 99, 567–568. [Google Scholar] [CrossRef]
  63. Maron, M.E. Automatic Indexing: An Experimental Inquiry. J. ACM 1961, 8, 404–417. [Google Scholar] [CrossRef]
  64. Efron, B.; Hastie, T.; Johnstone, I.; Tibshirani, R. Least Angle Regression. Ann. Statist. 2004, 32, 407–451. [Google Scholar] [CrossRef]
  65. Tipping, M.E. Sparse Bayesian Learning and the Relevance Vector Machine. J. Mach. Learn. Res. 2001, 1, 211–244. [Google Scholar] [CrossRef]
  66. Galton, F. Regression Towards Mediocrity in Hereditary Stature. J. Anthropol. Inst. Great Br. Irel. 1886, 15, 246–263. [Google Scholar] [CrossRef]
  67. Huber, P.J. Robust Estimation of a Location Parameter. Ann. Math. Statist. 1964, 35, 73–101. [Google Scholar] [CrossRef]
  68. Crammer, K.; Dekel, O.; Keshet, J.; Shalev-Shwartz, S.; Singer, Y. Online Passive-Aggressive Algorithms. J. Mach. Learn. Res. 2006, 7, 551–585. [Google Scholar]
  69. Pati, Y.C.; Rezaiifar, R.; Krishnaprasad, P.S. Orthogonal Matching Pursuit: Recursive Function Approximation with Applications to Wavelet Decomposition. In Proceedings of the 27th Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, 1–3 November 1993. [Google Scholar]
  70. Zou, H.; Hastie, T. Regularization and Variable Selection Via the Elastic Net. J. R. Stat. Soc. B Stat. Methodol. 2005, 67, 301–320. [Google Scholar] [CrossRef]
  71. Tibshirani, R. Regression Shrinkage and Selection Via the Lasso. J. R. Stat. Soc. Ser. B Methodol. 1996, 58, 267–288. [Google Scholar] [CrossRef]
Figure 1. Tasks and expected outcomes.
Figure 2. Qualitative model—ROC curve for CatBoost classifier.
Figure 3. Qualitative model—Importance of model variables for CatBoost classifier.
Figure 4. Qualitative model—SHAP summary plot for CatBoost Classifier.
Figure 5. Quantitative model—Residuals for CatBoostRegressor Model.
Figure 6. Quantitative model—Prediction Error for CatBoostRegressor Model.
Table 1. Attacks received by public entities and average CIORank indicator.

| Year | n | CIORank Average |
|---|---|---|
| 2022 | 125 | 57.3% |
| 2023 | 129 | 56.8% |
Table 2. Spearman test: Correlation between budget and risk variables.

| Budget Concept | Security | Availability | SEO | CIORank |
|---|---|---|---|---|
| 1 | −0.0363 | 0.0827 | 0.0336 | 0.0459 |
| 206 | 0.0123 | 0.0525 | −0.0069 | 0.0266 |
| 216 | −0.0736 | 0.0173 | −0.0136 | −0.031 |
| 220.02 | −0.0701 | 0.0147 | 0.0058 | −0.022 |
| 222.03 | −0.0494 | 0.0841 | 0.0093 | 0.0272 |
| 636 | −0.0087 | 0.029 | −0.0115 | −0.0135 |
| 641 | −0.0061 | 0.0012 | −0.0025 | −0.0091 |
Table 3. Testing ML models for qualitative classification. Supervised classification.

| Model | Accuracy | AUC | Recall | Precision | F1 |
|---|---|---|---|---|---|
| catboost | 0.848 | 0.940 | 0.848 | 0.848 | 0.847 |
| lightgbm | 0.845 | 0.929 | 0.845 | 0.845 | 0.844 |
| et | 0.842 | 0.919 | 0.842 | 0.842 | 0.842 |
| rf | 0.834 | 0.914 | 0.834 | 0.834 | 0.834 |
| gbc | 0.832 | 0.920 | 0.832 | 0.833 | 0.832 |
| lr | 0.821 | 0.903 | 0.821 | 0.822 | 0.821 |
| ridge | 0.819 | 0.901 | 0.819 | 0.820 | 0.818 |
| lda | 0.819 | 0.901 | 0.819 | 0.820 | 0.818 |
| knn | 0.819 | 0.895 | 0.819 | 0.819 | 0.819 |
| ada | 0.816 | 0.903 | 0.816 | 0.817 | 0.816 |
| svm | 0.807 | 0.903 | 0.807 | 0.816 | 0.806 |
| dt | 0.774 | 0.774 | 0.774 | 0.775 | 0.774 |
| qda | 0.604 | 0.730 | 0.604 | 0.655 | 0.561 |
| dummy | 0.506 | 0.500 | 0.506 | 0.256 | 0.340 |
| nb | 0.496 | 0.804 | 0.496 | 0.516 | 0.351 |
Table 4. Testing ML models for quantitative regression.

| Model | MAE | MSE | RMSE | R2 |
|---|---|---|---|---|
| catboost | 2.4319 | 10.2184 | 3.1947 | 0.8457 |
| lightgbm | 2.4780 | 10.5687 | 3.2491 | 0.8403 |
| rf | 2.5249 | 11.2499 | 3.3521 | 0.8301 |
| gbr | 2.6563 | 11.9368 | 3.4539 | 0.8197 |
| et | 2.6058 | 12.1019 | 3.4770 | 0.8173 |
| knn | 2.6554 | 12.5221 | 3.5366 | 0.8106 |
| lar | 3.2710 | 16.9024 | 4.1100 | 0.7453 |
| br | 3.2709 | 16.9024 | 4.1100 | 0.7453 |
| ridge | 3.2709 | 16.9023 | 4.1100 | 0.7453 |
| lr | 3.2710 | 16.9024 | 4.1100 | 0.7453 |
| huber | 3.2476 | 17.1601 | 4.1407 | 0.7417 |
| ada | 3.4064 | 17.9231 | 4.2325 | 0.7293 |
| dt | 3.2027 | 19.3444 | 4.3949 | 0.7080 |
| par | 3.7511 | 22.8557 | 4.7635 | 0.6526 |
| omp | 5.5543 | 49.3850 | 7.0232 | 0.2581 |
| en | 5.6148 | 56.1674 | 7.4853 | 0.1603 |
| lasso | 5.8728 | 60.7659 | 7.7857 | 0.0915 |
| llar | 2.4780 | 10.5687 | 3.2491 | 0.8403 |
Table 5. Post-deployment results.

| Year | MAE | MSE | RMSE | R2 |
|---|---|---|---|---|
| 2022 and 2023 | 2.43 | 10.22 | 3.19 | 0.8457 |
| 2024 | 2.62 | 10.57 | 3.25 | 0.8403 |