Fiscal Management and Artificial Intelligence as Strategies to Combat Corruption in Colombia

Ana E. Monsalvo; Carlos M. Zuluaga-Pardo; Jaime A. Restrepo-Carmona; Lilibeth Aguilera-Pua; Juan C. Castaño; Edison F. Borda; Rosse M. Villamil; Hernán Felipe García; Luis Fletscher

doi:10.3390/info16110998

¹ Faculty of Natural Sciences and Engineering, Universidad de Bogotá Jorge Tadeo Lozano, Carrera 4 # 22-61, Bogotá 111321, Colombia

² Contraloría General de la República, Carrera 69 No 44-35, Bogotá 111071, Colombia

³ Rotorr—Motor de Innovación, Av. El Dorado #44a-40, Bogotá 111071, Colombia

⁴ Department of Electronic Engineering, Faculty of Engineering, Universidad de Antioquia, Cll. 67 # 53-108, Medellín 050010, Colombia

Information 2025, 16(11), 998; https://doi.org/10.3390/info16110998

Abstract

Corruption in Colombia remains a critical barrier to development, institutional trust, and equitable access to public services, despite legislative efforts such as the Anti-Corruption Statute. This article explores the intersection between fiscal management and artificial intelligence (AI) as integrated strategies for enhancing transparency, accountability, and risk assessment in public administration. Drawing on theoretical frameworks and empirical data from 2020 to 2022, this study analyzes the scale and impact of corruption and the effectiveness of oversight mechanisms led by the Comptroller General of the Republic (CGR). A key innovation examined is the implementation of a GPT-based scoring model that automates the evaluation of internal accounting controls in 219 public entities. By leveraging AI to support fiscal audits, Colombia demonstrates a scalable approach to modernizing anti-corruption practices. The study concludes with policy recommendations that emphasize digital transformation, institutional strengthening, citizen engagement, and capacity building to improve fiscal governance and reduce corruption.

Keywords: corruption; fiscal control; public administration; transparency; digital governance

1. Introduction

Colombia has faced persistent challenges in tackling corruption despite adopting the Anti-Corruption Statute under Law 1474 in 2011 [1]. According to Monitor Ciudadano [2], 967 corruption cases were documented nationally between 2016 and 2020. Additionally, Colombia’s score on Transparency International’s Corruption Perceptions Index (a 0–100 scale) stagnated between 36 and 39 from 2012 to 2021, underscoring limited progress in anti-corruption outcomes [3].

Corruption constitutes a multidimensional threat. Economically, it reduces productivity and deters both domestic and foreign investment [4]. Institutionally, it weakens public trust and undermines the legitimacy of state institutions. Socially, it exacerbates inequality by diverting public resources away from the most vulnerable populations [5]. The diversion of public funds from essential services like health, education, and infrastructure compromises fundamental rights and impedes social mobility.

Empirical studies show that the economic cost of corruption is considerable. For instance, Gómez and Gallón [6] estimated that aligning Colombia’s corruption control mechanisms with those of Chile during the 1990s could have increased the per capita GDP growth rate by 2.73 percentage points. Moreover, the presence of widespread administrative and political corruption continues to affect the effectiveness of public policies and the delivery of essential goods and services [7].

In the Ibero-American context, corruption remains a systemic challenge deeply intertwined with public management and accountability structures. Comparative research in Spain, Brazil, Mexico, and Chile reveals that the persistence of corruption is linked not only to legal gaps or weak sanctions but also to fragmented administrative coordination and insufficient civic oversight. Jimenez-Sanchez [8] highlights that integrity systems in Spain advanced only when linked to institutionalized accountability and merit-based bureaucratic reforms. In Brazil, Morgan Fullin Saldanha et al. [9] demonstrate that digital transparency and open-data platforms have strengthened the monitoring of fiscal decisions and public procurement, thereby improving trust in public management. Similarly, Teixeira et al. [10] find that in Mexico anti-corruption reforms often falter due to limited inter-agency collaboration and political interference, despite progress in regulatory frameworks.

Beyond Latin America, comparative analyses such as those by Grindle [11] and Mungiu-Pippidi [12] emphasize that durable corruption control depends on long-term institutional investment and the creation of impartial, rule-based administrative systems. These findings reinforce the argument that Colombia’s ongoing fiscal and administrative modernization should not be seen in isolation but as part of a broader regional evolution toward integrated, evidence-based governance. By positioning fiscal management and artificial intelligence as complementary tools for accountability, this study contributes to the emerging debate on how technological innovation can strengthen institutional integrity and transparency across Ibero-American public sectors.

Although many of the control frameworks discussed (e.g., COSO [13], INTOSAI [14]) have Western origins, it is essential to incorporate governance models from the Global South to ensure relevance. For example, Brazil’s Tribunal de Contas da União (TCU) has pioneered transparency platforms and analytics-based oversight in a context marked by decentralized governance and complex local political economies. In China, the National Audit Office has integrated large-scale data-driven auditing within a centrally controlled system, highlighting trade-offs between capacity and rights protection [15]. By including these cases, our study situates Colombia’s experience within a broader spectrum of contexts and recognizes that institutional capacity, technological infrastructure and democratic accountability vary markedly across Global South jurisdictions.

In this context, fiscal control plays a pivotal role in Colombia’s public management. Its primary objective is to promote transparency, efficiency, and accountability in the use of public resources. Through institutions such as the Comptroller General of the Republic and other audit bodies, fiscal oversight enables the prevention and detection of corruption, ensuring that public funds are not misappropriated [16].

Efficient resource management also reinforces financial sustainability and maximizes the impact of investments in strategic sectors such as healthcare, education, and infrastructure. Robust fiscal oversight contributes to the effectiveness of social programs by ensuring that benefits reach the most vulnerable populations [16].

Ultimately, fiscal control is a cornerstone for strengthening democracy and sustainable development in Colombia. By curbing corruption and promoting responsible resource use, it helps build a more transparent, accountable, and trustworthy state for its citizens [16].

Recently, technological innovation, particularly the use of artificial intelligence (AI), has emerged as a powerful complementary tool to strengthen fiscal oversight. A notable example is the implementation of a GPT-based scoring model developed using the Azure OpenAI Service. This model automates the evaluation of internal accounting control risks by assigning risk scores from 1 (high risk) to 10 (low risk) based on established audit criteria. The tool is currently applied to responses from 219 public entities and incorporates criteria such as completeness, supporting evidence, and adherence to procedures. By standardizing and accelerating the audit process, AI enhances the precision, consistency, and scalability of risk assessment practices.

Despite significant institutional reforms and oversight mechanisms, Colombia’s public sector presents deep heterogeneity in administrative capacity, regulatory compliance, and oversight practices, particularly at the subnational level. According to [16] and empirical analyses by the Contraloría General de la República, many territorial entities show persistent deficiencies in internal control frameworks, audit readiness, and evidence-based decision-making [17]. These challenges are exacerbated by limited technical infrastructure, low professionalization of audit staff, and fragmented information systems, creating vulnerabilities in the fiscal oversight ecosystem.

The persistence of corruption indicators and stagnation in performance indices reveal a critical gap in the operational capacity of audit systems to address complex and large-scale irregularities in real time. While the role of fiscal oversight bodies such as the Comptroller General of the Republic is essential, the integration of advanced technological tools remains underexplored and underutilized. Recent developments in artificial intelligence (AI), particularly transformer-based models like GPT, offer a novel opportunity to enhance transparency, consistency, and predictive capacity in fiscal audits. However, academic literature and policy reports have yet to systematically examine the concrete implementation and institutional impacts of AI-assisted auditing mechanisms in the Latin American context. This paper addresses this gap by analyzing the design and deployment of an AI-based scoring model for risk assessment in Colombia, offering both a conceptual and empirical contribution to the discourse on fiscal innovation.

While artificial intelligence tools, especially Large Language Models, offer significant potential to augment fiscal oversight by enabling scalable analysis of unstructured data, their application also entails substantial risks. These include issues of algorithmic opacity, potential bias replication from training data, variation in interpretive performance across writing styles and institutional contexts and limited legal frameworks governing AI accountability in the public sector. Recognizing these risks, this study adopts a human-in-the-loop validation strategy and proposes safeguards for transparency, documentation, and expert feedback to ensure ethical and responsible use of AI in the audit domain.

The objective of this study is to assess how artificial intelligence can strengthen fiscal management frameworks to combat corruption more effectively in Colombia. Specifically, this research aims to:

• Analyze the conceptual foundations linking fiscal oversight and technological innovation;
• Describe the development and implementation of a GPT-based scoring model for internal control audits;
• Evaluate the practical implications of AI integration for institutional efficiency, transparency, and accountability in public management.

To guide this inquiry, three research questions are posed: (1) How can AI-based tools improve the objectivity, consistency, and efficiency of fiscal risk assessments? (2) What institutional conditions are necessary for the successful deployment of AI in government audit processes? (3) What lessons can be drawn from the Colombian case for broader applications of AI in public oversight and anti-corruption strategies?

This integration of AI and fiscal control signals a paradigm shift in Colombia’s anti-corruption strategy, moving from reactive to predictive and preventive oversight. The article therefore shows how emerging technologies, when applied to fiscal management, can become a powerful tool in the fight against corruption. It highlights successful initiatives implemented by the Comptroller General of the Republic, particularly the use of AI-based models for risk assessment and audit automation, as illustrative cases of institutional innovation and technological integration.

This paper is structured as follows: Section 2 presents the theoretical framework that contextualizes the relationship between corruption, fiscal oversight, and technological innovation. Section 3 describes the methodology used for the design and implementation of the AI-based scoring model. Section 4 analyzes the results obtained from the application of this model across public entities. Finally, Section 5 offers the conclusions and outlines directions for future research and institutional strengthening.

2. Theoretical Framework

2.1. Defining Corruption

Corruption is a complex and multidimensional phenomenon that undermines governance, economic performance, and social equity. According to Tanzi [18], corruption involves the deliberate violation of the principle of impartiality to obtain personal or related benefits. It generally requires that the individual possess some form of advantage or discretionary power, allowing them to deviate from established rules and ethical norms [19].

Corruption arises from structural deficiencies within public institutions, where conditions such as regulatory complexity, low public sector wages, and scarcity in service provision increase the likelihood of unethical behavior. As Tanzi explains [18], the combination of excessive bureaucratic discretion and weak institutional controls creates incentives for actors to seek private gain at the expense of the public interest. This perspective aligns with rent-seeking theory, which views corruption as a rational strategy used by individuals to extract benefits in environments marked by regulatory inefficiencies and limited oversight.

From another angle, Begovic [19] emphasizes the importance of institutional quality in understanding corruption. When legal systems are fragile, enforcement mechanisms are inconsistent, and there is limited risk of detection or punishment, corruption becomes a viable and recurring strategy. In such environments, unequal access to public services may also reinforce patronage networks and informal arrangements, particularly in societies marked by economic disparity. These dynamics highlight the need for systemic reforms aimed at promoting meritocracy, reducing discretion in administrative procedures, and addressing the social conditions that perpetuate corrupt practices.

2.2. Typologies and Theoretical Approaches

Corruption can be categorized through various theoretical lenses. The principal-agent theory views corruption as a result of asymmetric information and misaligned incentives between public officials and oversight institutions [20]. However, Begovic [19] notes that this theory does not adequately address political corruption. Alternative perspectives based on conflict economies and organized crime depict corruption as a mechanism of internal cohesion in authoritarian regimes, where rents are distributed to secure loyalty and prevent dissent.

Three main types of corruption are widely recognized:

• Petty corruption (without theft): Bribes paid to expedite legally entitled services.
• Administrative corruption: Bribes used to bypass rules, undermining public policy effectiveness.
• State capture: Private interests influencing the formulation of laws and regulations through illicit means, differing from legitimate lobbying.

Additionally, corruption may be centralized (monopolized by elites) or decentralized (involving multiple actors under weak state control).

The distinction between “state capture” and “legitimate lobbying” is not always clear-cut in practice. In Latin America, elite groups often exploit legal lobbying channels to shape regulation in their favor, effectively narrowing the public interest in ways functionally equivalent to capture [21]. As such, our conceptualization treats lobbying and capture as points on a continuum of regulatory influence, recognizing that both may erode institutional integrity even when formal legality is maintained. This nuance is especially relevant when designing AI-assisted oversight mechanisms: they must be sensitive not only to overt corrupt acts, but also to structural patterns of influence and regulatory subversion.

2.3. Empirical Evidence in Colombia

Corruption in Colombia exhibits both systemic and institutionalized characteristics. According to Ayala García et al. [3], disciplinary sanctions by the Inspector General’s Office (PGN) have increased since 2011, with approximately 65% targeting members of the security forces. Departments created after the 1991 Constitution, such as Arauca, Casanare, and Putumayo, show the highest number of sanctions per capita. Additionally, convictions for corruption in the public sector, which account for 96% of the total, peaked in 2016, with Caldas leading per capita. The Comptroller General of the Republic reported its highest fiscal responsibility figure in 2017, primarily due to the Reficar case [22].

Between 2016 and 2022, corruption-related irregularities in Colombia amounted to COP 137.65 billion, with COP 21.28 billion lost and COP 9.08 billion recovered [23]. To contextualize these figures economically, Colombia’s Gross Domestic Product (GDP) in 2022 was approximately COP 1401 billion [24]. Relative to that single-year GDP figure:

• The total amount compromised by corruption during the 2016–2022 period represents approximately 9.8% of Colombia’s GDP in 2022;
• The amount effectively lost due to corruption is equivalent to 1.5% of the national GDP;
• The amount recovered corresponds to approximately 0.65% of GDP.
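
The three shares follow from dividing each amount by the 2022 GDP figure. A minimal Python check, using only the numbers quoted in the text (the variable and function names are illustrative, not taken from the source), could look like this:

```python
# Figures quoted in the text, all in the same COP units as the source.
GDP_2022 = 1401.0
amounts = {
    "compromised": 137.65,  # total exposed to corruption, 2016-2022
    "lost": 21.28,          # effectively lost
    "recovered": 9.08,      # recovered
}


def share_of_gdp(amount: float, gdp: float = GDP_2022) -> float:
    """Return `amount` as a percentage of `gdp`."""
    return 100.0 * amount / gdp


# Round to two decimals for comparison with the shares reported in the text.
shares = {label: round(share_of_gdp(value), 2) for label, value in amounts.items()}
print(shares)
```

Because the numerators accumulate seven years of cases while the denominator is a single year of GDP, the resulting percentages are best read as an order-of-magnitude comparison rather than an annual rate.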

Although 1.5% may appear modest in relative terms, it reflects a substantial loss of public resources, comparable to the annual budget of major national social or infrastructure programs. The fact that an amount equivalent to nearly 10% of one year’s national product was exposed to corruption risk over the period highlights the systemic nature and economic impact of governance vulnerabilities.

These actions impacted over 14.53 million citizens, with children and adolescents comprising 24.65% of those affected [23]. In this context, the most affected institutions include national government entities, mayoral offices, and regional governments. Administrative corruption represented 49.4% of cases, political corruption 23.7%, procurement-related corruption 31.9%, and corruption in security sectors 20.5%. Additionally, 53 cases were linked to environmental harm.

In 2021 and 2022 alone, 59 corruption cases endangered COP 18 billion in public resources, with COP 3.6 billion lost and only COP 1.2 billion recovered. Road infrastructure and subsidized housing projects were among the most affected sectors.

2.4. Fiscal Management as a Governance Instrument

According to Article 3 of Law 610 of 2000 [25], fiscal management refers to the economic, legal, and technological activities carried out by public and private actors to administer public resources, ensuring their proper use, investment, and disposal in accordance with the principles of legality, efficiency, equity, transparency, and environmental sustainability. These principles are not only normative guidelines but also essential criteria for evaluating public sector performance and aligning financial practices with democratic values and sustainable development objectives.

Effective fiscal management promotes responsible budgeting, curtails excessive deficits, and optimizes resource allocation. It ensures that public investment aligns with strategic national goals and supports inclusive economic growth. The efficient use of fiscal tools enables governments to stabilize macroeconomic conditions, expand infrastructure, and improve essential services such as health, education, and social protection [26]. Moreover, fiscal discipline helps mitigate risks associated with public debt and inflation, contributing to long-term financial sustainability.

In the Colombian context, fiscal management has gained renewed attention as a strategic pillar in the fight against corruption. The Comptroller General of the Republic, as the country’s highest fiscal oversight authority, plays a central role by conducting audits, verifying compliance with legal norms, and evaluating the efficiency and effectiveness of public spending [16]. The CGR’s efforts extend beyond financial oversight to include performance audits and citizen engagement mechanisms, thereby reinforcing accountability and transparency. This multi-dimensional approach recognizes that good fiscal governance requires not only sound financial controls but also mechanisms that detect irregularities and ensure redress.

The integration of fiscal management with digital technologies and data-driven tools represents a significant advancement in improving oversight and transparency. Automated auditing systems, public expenditure tracking platforms, and open budget initiatives enhance real-time monitoring and reduce opportunities for manipulation and fraud [4]. As demonstrated by recent innovations implemented by the CGR, such as the deployment of artificial intelligence to assess internal control risks, technology can complement traditional oversight mechanisms, increase institutional efficiency, and strengthen the rule of law. Ultimately, fiscal management serves not only as a technical instrument of public finance but also as a foundational component of democratic governance and institutional integrity.

2.5. International Frameworks in Public Management and Internal Control

To strengthen public sector performance, multiple international frameworks have been developed to guide governments in achieving efficient, transparent, and accountable management. These models, while varying in structure, share common goals: improving institutional effectiveness, promoting ethical governance, and ensuring that public policies deliver measurable value to citizens. Colombia’s public management models, such as the Integrated Planning and Management Model (Modelo Integrado de Planeación y Gestión: MIPG) and the Standard Model of Internal Control (Modelo Estándar de Control Interno: MECI), align with many of these global standards. This section introduces key international frameworks that provide relevant comparative references for the Colombian case.

2.5.1. The Common Assessment Framework (CAF)

The Common Assessment Framework (CAF) is a European tool designed to promote quality management in the public sector. Based on the European Foundation for Quality Management (EFQM) excellence model, the CAF facilitates organizational improvement through self-assessment, results orientation, leadership development, and stakeholder engagement [27]. It advocates the Plan-Do-Check-Act (PDCA) cycle and supports organizations in identifying weaknesses and sharing best practices across institutions. Unlike sector-specific approaches, the CAF offers an integral evaluation of public institutions, making it a suitable benchmark for comprehensive models like MIPG.

2.5.2. The Whole-of-Government Approach (WOG)

Emerging as a response to the fragmentation produced by the New Public Management (NPM) reforms, the Whole-of-Government approach emphasizes inter-agency collaboration, horizontal coordination, and integrated policy-making. It fosters stronger central governance, regulatory flexibility, and shared service structures [28]. Countries such as Australia, New Zealand, and the UK have adopted WOG principles to improve coherence across government sectors. The Colombian MIPG, with its focus on articulation between dimensions and public value, shares conceptual synergies with the WOG strategy, especially in addressing fragmentation and promoting cross-functional accountability.

2.5.3. The EFQM Excellence Model

The EFQM model views organizations as dynamic systems that create sustainable value through purposeful actions. It is structured around three key questions: Why does the organization exist? How does it intend to fulfill its mission? What results has it achieved, and what results does it aim to achieve? [29]. Unlike prescriptive models, EFQM encourages leadership, innovation, and future-readiness in navigating complexity and delivering outcomes. Its emphasis on adaptive capacity and stakeholder inclusion complements MIPG’s focus on cultural transformation and citizen-centered services.

2.5.4. COSO—Internal Control–Integrated Framework

Developed in the United States by the Committee of Sponsoring Organizations of the Treadway Commission (COSO), this framework provides a structured method for implementing and evaluating internal controls. The 2013 update expanded its application to include sustainability and non-financial reporting, promoting ethical governance and risk mitigation [13]. Its five core components—control environment, risk assessment, control activities, information and communication, and monitoring—align closely with MECI’s architecture, reinforcing the relevance of COSO as a foundational reference for public internal control systems.

2.5.5. INTOSAI Guidelines for Internal Control

The International Organization of Supreme Audit Institutions (INTOSAI) offers guidelines tailored to the public sector, based on COSO principles but adapted to the specific requirements of government auditing and control. The INTOSAI GOV 9100 framework incorporates ethical standards, fraud prevention, and ICT controls, promoting internal audit systems that are proactive, comprehensive, and accountable [14]. MECI’s three lines of defense and institutional risk-based design reflect the spirit of INTOSAI’s recommendations, strengthening Colombia’s capacity for oversight and continuous improvement.

2.5.6. European Commission Internal Control Framework

Adopted in 2017, this framework integrates COSO principles into the European governance context. It ensures that EU institutions operate with transparency, efficiency, and integrity, and it emphasizes ethical culture, accountability, and control over digital systems [30]. Key elements include structured risk assessments, robust IT security protocols, and continuity planning, all increasingly vital for modern public administration and well aligned with MECI’s evolution toward proactive control.

2.5.7. The Orange Book (UK)

The Orange Book, issued by HM Treasury in the UK, offers principles and operational guidance for embedding risk management in the public sector. It promotes an adaptive, evidence-based approach to institutional planning, aligning risk control with broader governance standards. The “comply or explain” principle ensures transparency in risk responses and justifications for deviations [31]. This guidance enriches the discourse on institutional integrity and accountability, complementing Colombia’s emerging efforts to modernize fiscal and administrative oversight.

2.6. Integrated Public Management and Internal Control Models in Colombia

In Colombia, public management is guided by structured frameworks designed to enhance transparency, efficiency, and accountability across government institutions. Two foundational models in this context are the Integrated Planning and Management Model (MIPG) and the Standard Model of Internal Control (MECI).

The MIPG provides an overarching model to integrate strategic planning, quality management, and internal control. It is structured around seven interdependent dimensions: human talent, strategic direction and planning, results-based management, evaluation, information and communication, knowledge and innovation, and internal control. In this way, the model promotes coordination between public institutions and a culture focused on integrity, public value, and evidence-based decision-making [32]. By aligning institutional efforts with national development plans and citizen needs, MIPG enhances performance monitoring and fosters participatory governance. Its emphasis on quality service delivery and institutional transparency is directly aligned with anti-corruption goals.

Complementarily, MECI serves as the internal control framework that ensures the legality, efficiency, and effectiveness of public management. It is built upon principles such as self-control, self-regulation, and self-management, which promote ethical and proactive behavior among public servants [33]. MECI is structured around five components—control environment, risk assessment, control activities, information and communication, and monitoring—that support risk prevention and continuous improvement. These mechanisms are critical for identifying institutional vulnerabilities and implementing timely corrective actions.

Together, MIPG and MECI represent a holistic governance ecosystem that enables entities like the Comptroller General’s Office to carry out more effective fiscal oversight. Their articulation ensures that planning, execution, and evaluation processes are not isolated, but interconnected, providing a strong institutional basis for integrating AI-based audit technologies. For example, the risk assessment protocols embedded in MECI can inform the design of AI scoring models, helping define criteria and thresholds for fiscal risk. Similarly, MIPG’s emphasis on evaluation and public value aligns with the goals of automating and scaling performance monitoring across decentralized institutions.

By embedding technologies such as GPT-based models within the institutional logic of MIPG and MECI, Colombia advances toward a whole-of-government approach, where technological innovation strengthens rather than displaces traditional mechanisms of democratic accountability and public service ethics [28,34].

To better understand the similarities and differences between Colombia’s public management and control models and internationally recognized frameworks, Table 1 presents a comparative overview of key elements from both national (MIPG and MECI) and international (COSO, CAF, INTOSAI, and others) models. This comparison highlights shared principles such as risk-based decision-making, accountability, continuous improvement, and citizen-centered governance, while also identifying distinctive features in structure, implementation scope, and alignment with national development plans. The table provides a conceptual foundation for evaluating how these models support transparency, efficiency, and integrity in public administration, core goals in the use of AI-based tools for fiscal oversight.

Table 1. Comparative Overview of National and International Public Management Models.

2.7. AI-Powered Fiscal Control

Technological innovation, particularly artificial intelligence, has become a powerful ally in enhancing fiscal control. International experiences show that AI can strengthen oversight mechanisms by introducing automation, predictive analytics, and pattern recognition into audit processes [4,35].

In Colombia, the CGR has implemented a GPT-based scoring model using the Azure OpenAI Service to automate the assessment of internal accounting controls across 219 public entities. This model evaluates qualitative responses to nine control-related questions, assigning a risk score from 1 (high risk) to 10 (low risk). The evaluation is based on criteria such as the completeness of responses, supporting documentation, and the use of proper control procedures. By integrating AI, the CGR has enhanced audit consistency, coverage, and efficiency, supporting a proactive approach to identifying and mitigating risks in public management.
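
The scoring scheme described above (nine questions per entity, each scored from 1 for high risk to 10 for low risk) can be illustrated with a short Python sketch. The text does not state how per-question scores are combined or where risk bands begin, so the unweighted mean, the cut-offs, and the names `EntityAssessment` and `risk_band` are assumptions made purely for illustration. The sketch does not call the Azure OpenAI Service; it only shows how model-assigned scores might be validated and summarized.

```python
from dataclasses import dataclass
from statistics import mean

N_QUESTIONS = 9                # nine control-related questions (from the text)
MIN_SCORE, MAX_SCORE = 1, 10   # 1 = high risk, 10 = low risk (from the text)


@dataclass
class EntityAssessment:
    """Per-question scores for one entity (hypothetical container)."""
    entity: str
    scores: list[int]

    def __post_init__(self) -> None:
        # Reject incomplete or out-of-range score sets before aggregating.
        if len(self.scores) != N_QUESTIONS:
            raise ValueError(f"expected {N_QUESTIONS} scores, got {len(self.scores)}")
        for s in self.scores:
            if not MIN_SCORE <= s <= MAX_SCORE:
                raise ValueError(f"score {s} outside {MIN_SCORE}-{MAX_SCORE}")

    def overall(self) -> float:
        # ASSUMPTION: unweighted mean; the source does not specify the aggregation.
        return mean(self.scores)


def risk_band(overall: float) -> str:
    # ASSUMPTION: illustrative cut-offs, not taken from the source.
    if overall <= 4:
        return "high"
    if overall <= 7:
        return "medium"
    return "low"


example = EntityAssessment("Entity A (hypothetical)", [8, 7, 9, 6, 8, 7, 9, 8, 7])
print(round(example.overall(), 2), risk_band(example.overall()))
```

Validating the score count and range at construction time keeps malformed model outputs from silently entering an aggregate, which matters when the same procedure is applied across hundreds of entities.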

2.8. AI Governance and Algorithmic Ethics

The integration of traditional governance and control frameworks, such as COSO, INTOSAI, CAF, and the Orange Book, with emerging approaches to digital and algorithmic governance, represents a new phase in public-sector management. These international models have historically emphasized accountability, transparency, and continuous improvement; however, the increasing use of artificial intelligence (AI) and data-driven decision-making requires extending their principles to encompass algorithmic transparency, explainability, and fairness.

Recent empirical research highlights that the adoption of AI systems in public administration is reshaping accountability architectures and introducing new governance challenges. Vephkhvia [36] argues that algorithmic decision-making must remain subject to democratic oversight and human review to prevent bias and opacity. Similarly, the European Commission [37] and Papagiannidis [38], in their guidance on trustworthy AI, emphasize that public institutions should ensure traceability, risk management, and ethical compliance throughout the AI lifecycle. Kokina [39] further points out that “algorithmic accountability” requires embedding auditability mechanisms into digital government systems, aligning them with the same control standards defined in the COSO and INTOSAI frameworks.

Empirical evidence from Ibero-American contexts reinforces this need. Morgan et al. [9] show that digital public services in Brazil have improved transparency but also demand stronger algorithmic accountability policies to safeguard citizens’ rights. Collectively, these findings suggest that the principles of internal controls, such as those established by COSO’s control environment, INTOSAI’s ethical standards, and CAF’s continuous improvement, must now evolve into what can be described as algorithmic governance systems, integrating ethical AI principles into traditional accountability structures.

This theoretical extension positions fiscal management as not only a financial oversight mechanism but also a digital integrity system that ensures that AI-based audit processes remain transparent, auditable, and consistent with democratic values.

2.9. AI Adoption Challenges in Public-Sector Auditing

While artificial intelligence (AI) holds considerable promise to enhance the efficiency and coverage of fiscal oversight, deploying AI in public-sector auditing entails significant limitations and risks. Research has demonstrated that citizens experiencing algorithmic decision-making in governance contexts often report a sense of loss of control and agency, so-called “digital anxiety” [40]. Moreover, the field of AI auditing is still emergent, with questions around transparency, contestability, and accountability remaining largely unresolved [41]. In the context of auditing, reliance on AI systems without robust human-in-the-loop governance may give rise to biases embedded in training data, reduce the scope for explanation or appeal, and erode institutional trust [42]. Accordingly, our study adopts a cautious stance: AI is positioned as a complementary tool for risk identification rather than a substitute for professional judgment or legal audit authority.

3. Methodological Approach

The methodological design adopted in this study centers on the implementation of an AI-assisted scoring system to evaluate internal accounting control risks across Colombian public entities. The model was developed using the Azure OpenAI GPT service and applied to qualitative audit observations submitted by 219 entities under the jurisdiction of the Comptroller General of the Republic (CGR).

The process begins with the collection of input data structured in an Excel file, where each entity provides narrative responses to nine standardized audit questions. These observations, written in free text, serve as the raw material for evaluation. These questions correspond to a self-assessment of the internal accounting control system established by the General Accounting Office of Colombia (Contaduría General de la Nación), in which public entities respond under two components: a quantitative and a qualitative one. Currently, only the quantitative component is used to determine the internal control rating issued by the General Accounting Office. However, the Comptroller General of the Republic has observed and documented that the quantitative responses provided in these self-assessments often diverge significantly from the findings of institutional audits of internal control systems.

To ensure consistency and contextual relevance in the AI-based evaluation, a systematized prompt was designed. This prompt includes the original question, evaluation criteria (such as completeness, supporting documentation, and methodological rigor), and illustrative examples for each score level (1 to 10).

Once the prompts are constructed, they are processed through the GPT model, which analyzes the text and assigns a risk score to each observation. The output consists of a numerical score, where 1 represents high risk and 10 represents low risk. The process is fully automated using a Python 3.10 script and the Azure OpenAI API, which enables scalability and replicability across entities and audit cycles. To ensure the reliability and accuracy of the scoring model, the system was evaluated by expert auditors from the Comptroller General’s Office through multiple blind testing rounds. In these tests, auditors independently assessed randomly assigned observation texts, without knowledge of their institutional origin, using the same 1 to 10 risk scale. The results were then compared with the model’s output, and after several iterations, the model was calibrated to ensure consistency and alignment with expert judgments.

The results are exported into an Excel spreadsheet for post-processing validation. Each score is reviewed against institutional benchmarks and anomalies are flagged for further human inspection. This hybrid approach, combining AI evaluation with traditional auditing methods, enhances the speed, accuracy, and objectivity of fiscal oversight.

Figure 1 illustrates the five key stages of this AI-powered evaluation pipeline: (1) input data collection; (2) prompt construction; (3) GPT model evaluation; (4) scoring output; and (5) results validation.

Figure 1. AI-assisted fiscal risk scoring pipeline. The workflow comprises five sequential stages: (1) Input data collection—ingestion of entities’ narrative responses in Excel format; (2) Prompt construction—assembly of a standardized template combining the audit question, scoring rubric, and response text; (3) GPT evaluation—model inference via Azure OpenAI to generate a provisional score; (4) Scoring output—export of numeric risk scores (1–10) for each observation; and (5) Validation and quality assurance—automated checks and expert review to detect anomalies and calibrate the model.

3.1. Study Design and Rationale

This study employs a mixed-methods, explanatory design to investigate how artificial intelligence (AI) can enhance fiscal management and internal control in Colombian public entities. The implementation combines (i) documentary and survey-based evidence on internal control practices and (ii) an AI-assisted scoring pipeline that transforms qualitative audit observations into standardized risk scores. The goal is to operationalize fiscal management, as mandated by the Comptroller General of the Republic (CGR), through scalable, data-driven risk detection that supports preventive, rather than purely reactive, oversight.

3.2. Dataset Description

The dataset analyzed in this study was compiled by the Contraloría General de la República (CGR) as part of the “Estudio de Efectividad del Control Fiscal Interno” [17]. It integrates data from three primary components:

(1) Institutional Surveys. A structured questionnaire was distributed to 219 public entities, including ministries, departmental comptroller offices, and municipal audit units. Respondents included heads of internal control, planning, and financial oversight offices. The survey gathered quantitative and qualitative information on control practices, operational risk management, and perceived challenges in fiscal oversight.
(2) Self-Assessments of Internal Accounting Control. Narrative responses were submitted by public entities to the Contaduría General de la Nación (CGN) via the Sistema Consolidador de Hacienda e Información Pública (CHIP). Each entity provided textual descriptions addressing nine standardized questions about their internal control and fiscal risk management processes. These qualitative data were the basis for the AI-driven scoring model, which automatically assigned risk scores on a 1–10 scale.
(3) Fiscal Audit Findings and Sanctions. Aggregated data on fiscal responsibility proceedings, recovery of misused public funds, and the territorial distribution of audit results between 2020 and 2022 were collected. These records were used to triangulate and validate the AI-generated scores against real-world audit outcomes.

The full dataset includes 1971 narrative entries (219 entities × nine questions), complemented with quantitative indicators of fiscal responsibility processes and categorical metadata such as institutional type, geographical level, and sectoral classification. The AI model (based on the GPT transformer architecture via Azure OpenAI Service) analyzed these texts through a structured prompt containing the original question, evaluation criteria (e.g., completeness, supporting evidence, procedural rigor), and example scoring anchors.

Expert auditors from the CGR independently reviewed a stratified random subset of 300 entries through blind validation—they did not know which entities the responses belonged to. Inter-rater consistency between human auditors and the AI model was evaluated using Pearson’s correlation coefficient (r = 0.71) and Mean Absolute Error (MAE = 1.46), confirming adequate reliability for interpretive analysis.

The 2020–2022 timeframe corresponds to the most recent cycle of fiscal control evaluations under the CGR’s Plan Estratégico 2022–2026, ensuring that the dataset reflects current governance conditions and digital audit practices in Colombia.

3.3. Population, Sample, and Data Sources

As stated previously, the dataset used for model validation comprised qualitative responses from 219 Colombian public entities participating in the Internal Accounting Control System Self-Assessment Survey coordinated by the General Accounting Office. These entities voluntarily submitted complete qualitative responses to the open-ended section of the form, which complement the quantitative indicators currently used for official internal control evaluation. The 219 analyzed cases were selected as part of a pilot application to assess the feasibility and reliability of an AI-driven scoring mechanism [17]. The sample spans entities from the national, departmental, and municipal levels.

It is important to note that this pilot dataset does not achieve full territorial representativeness, particularly for small and rural municipalities where self-assessment reporting remains inconsistent. However, this limitation is consistent with the exploratory purpose of the study, which seeks to validate methodological soundness, scalability, and alignment between AI-generated scores and expert auditor evaluations. Future research will extend the model’s implementation to subsequent reporting cycles, enabling broader territorial coverage and longitudinal analysis across Colombia’s fiscal control ecosystem.

In summary, the empirical corpus includes qualitative observations submitted by 219 public entities under the jurisdiction of the CGR. Each entity provided responses to nine standardized audit questions focused on internal accounting control. These responses were submitted in free-text format and compiled into Microsoft Excel files. Additional information, such as traditional audit outcomes and institutional benchmarks, was utilized for post hoc validation and triangulation. Importantly, no personally identifiable information was processed.

3.4. Data Ingestion and Pre-Processing

Excel workbooks were merged into a single relational dataset, with each row representing a distinct (entity, question) pair. To preserve semantic content while enhancing model robustness, only minimal pre-processing was undertaken:

• Normalization to UTF-8 and elimination of stray control characters;
• Standardization of empty or “no answer” cells, which were flagged as missing and assigned a default risk score (see Section 3.5);
• Tagging of metadata, encompassing entity type, level (national or territorial), and sector.
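The three pre-processing steps can be illustrated with a minimal Python sketch. It is not the CGR's actual script: the function names, the record layout, and the list of "no answer" markers are hypothetical, and only the standard library is used.

```python
import re
import unicodedata

# Hypothetical markers treated as "no answer"; the study does not list them.
NO_ANSWER_MARKERS = {"", "n/a", "na", "no aplica", "sin respuesta", "no answer"}

# Control characters other than tab, newline, and carriage return.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def clean_text(raw):
    """Return the normalized text, or None when the cell counts as missing."""
    if raw is None:
        return None
    text = unicodedata.normalize("NFC", str(raw))  # consistent Unicode form
    text = _CONTROL_CHARS.sub("", text).strip()    # drop stray control characters
    if text.lower() in NO_ANSWER_MARKERS:
        return None
    return text


def build_row(entity, question, raw_answer, entity_type, level, sector):
    """Build one record per (entity, question) pair with its metadata tags."""
    text = clean_text(raw_answer)
    return {
        "entity": entity,
        "question": question,
        "text": text,
        "missing": text is None,  # flagged so a default score can be assigned later
        "entity_type": entity_type,
        "level": level,
        "sector": sector,
    }
```

Keeping the missing flag separate from the text lets the later scoring step apply the default score explicitly instead of sending an empty string to the model.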

3.5. AI-Assisted Scoring System

The methodological design focuses on implementing an AI-assisted scoring system to assess internal accounting control risks in Colombian public entities. This model was developed using the Azure OpenAI GPT service and applied to the qualitative audit observations previously described. The goal is to enhance consistency, coverage, and timeliness in fiscal risk assessments, which are essential components of effective fiscal management.

Prompt Engineering and Scoring Rubric

A structured prompt template was developed to ensure contextual relevance and reproducibility. Each prompt concatenated the original audit question with a rubric that specified clear evaluation criteria—namely, the completeness of the response, the presence of supporting documentation, the rigor of procedures and methodologies, and the clarity of any corrective actions described. To anchor the ordinal scoring, the template also included illustrative benchmarks for every value on the 1–10 scale: a score of 1 signaled high risk (insufficient or no evidence of control), whereas a score of 10 indicated low risk (robust, well-documented, and audited control).

Formally, let y_{ij} denote the narrative answer of entity i to question j. The prompt p_{ij} is a deterministic function of y_{ij} and the rubric R:

p_{ij} = p(y_{ij}, R)

The GPT model returns a scalar score \hat{s}_{ij} \in \{1, \dots, 10\}.
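A minimal Python sketch of such a deterministic prompt function follows. The template wording, the names `build_prompt` and `score_answer`, and the `model_call` argument are illustrative assumptions. Only the four criteria and the two endpoint anchors are taken from the description above, and the actual call to the Azure OpenAI service is deliberately left outside the sketch.

```python
RUBRIC_CRITERIA = [
    "completeness of the response",
    "presence of supporting documentation",
    "rigor of procedures and methodologies",
    "clarity of any corrective actions described",
]

# Only the endpoint anchors are described in the text; a full template
# would carry one illustrative benchmark for every score from 1 to 10.
ANCHORS = {
    1: "high risk: insufficient or no evidence of control",
    10: "low risk: robust, well-documented, and audited control",
}


def build_prompt(question, answer, criteria=RUBRIC_CRITERIA, anchors=ANCHORS):
    """Deterministic p(y, R): identical inputs always yield the same prompt."""
    lines = ["Audit question:", question, "", "Evaluation criteria:"]
    lines += [f"- {criterion}" for criterion in criteria]
    lines += ["", "Scoring anchors (1 = high risk, 10 = low risk):"]
    lines += [f"{score}: {text}" for score, text in sorted(anchors.items())]
    lines += ["", "Entity response:", answer, "",
              "Return a single integer from 1 to 10."]
    return "\n".join(lines)


def score_answer(question, answer, model_call):
    """Send the prompt through `model_call` (any str -> str callable).

    The raw reply is returned unparsed so that validation can run separately.
    """
    return model_call(build_prompt(question, answer))
```

Passing the model as a callable keeps prompt construction testable without network access; in a test, `model_call` can be a stub such as `lambda prompt: "7"`.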

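All prompts were generated and submitted programmatically using Python 3.x to the Azure OpenAI API. The pipeline consists of five stages (Figure 1): (1) input data collection; (2) prompt construction; (3) GPT model evaluation; (4) scoring output; and (5) results validation.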

Model outputs were exported to Excel/CSV to preserve an auditable trail and facilitate analyst review. Validation followed a three-tier procedure. First, rule-based checks coerced or flagged non-integer or out-of-range values, while empty responses were systematically assigned a score of 1 (maximum risk) to prevent silent data gaps. Second, expert comparison was conducted on a subsample scored independently by internal control specialists blinded to entity identity; concordance between GPT and human scores was quantified using Mean Absolute Error (MAE) and Pearson’s correlation, and systematic deviations (e.g., persistent underestimation in specific question types) were corrected through least-squares calibration. Third, institutional benchmarking contrasted AI-derived scores with established audit outcomes, responsibility rulings, and compliance indicators to assess criterion validity. Any discrepancies or anomalous patterns detected across these layers triggered targeted human review, thereby ensuring consistency with CGR standards and legal mandates.
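The first validation tier can be sketched as a small Python function. The flag names and the choice of which cases to coerce (non-integer values inside the range) and which to flag without a score (non-numeric or out-of-range values) are assumptions made for illustration; the text states only that such values were coerced or flagged and that empty responses received a score of 1.

```python
import math

MIN_SCORE, MAX_SCORE = 1, 10


def check_score(raw_output, answer_text):
    """Return (score, flag); flag is None when no rule was triggered.

    A score of None means the observation needs re-scoring or human review.
    """
    # Empty responses are systematically assigned maximum risk.
    if answer_text is None or not str(answer_text).strip():
        return MIN_SCORE, "empty_response"
    try:
        value = float(str(raw_output).strip())
    except ValueError:
        return None, "non_numeric"
    if not math.isfinite(value):
        return None, "non_numeric"
    if value < MIN_SCORE or value > MAX_SCORE:
        return None, "out_of_range"
    if value != int(value):
        # Assumed policy: round half up to the nearest integer and keep a flag.
        return int(math.floor(value + 0.5)), "coerced_non_integer"
    return int(value), None
```

For example, `check_score("7.5", "text")` yields a coerced score with a flag, whereas `check_score("11", "text")` returns no score at all, so the observation cannot slip into the results unreviewed.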

3.6. Evaluation Metrics

Primary accuracy was quantified using the mean absolute error (MAE) between GPT-generated scores and expert ratings:

MAE = \frac{1}{N} \sum_{k = 1}^{N} |\hat{s}_{k} - s_{k}|,

where \hat{s}_{k} and s_{k} denote the AI and expert scores, respectively, for observation k, and N is the number of expert-rated observations. Linear agreement was assessed with Pearson’s correlation coefficient,

r = \frac{\sum_{k = 1}^{N} (\hat{s}_{k} - \bar{\hat{s}}) (s_{k} - \bar{s})}{\sqrt{\sum_{k = 1}^{N} (\hat{s}_{k} - \bar{\hat{s}})^{2}} \sqrt{\sum_{k = 1}^{N} (s_{k} - \bar{s})^{2}}},

providing both effect size and direction of association. To characterize distributional properties and heterogeneity across sectors and government levels, descriptive statistics (mean, median, and interquartile range, IQR) of the score vectors were reported. Additionally, the empirical percentiles of absolute discrepancies, P_{q}(|\hat{s}_{k} - s_{k}|) (e.g., the 90th percentile), were examined as a proxy for worst-case deviation. Finally, high-risk narratives (low scores) were subjected to qualitative thematic analysis to elicit recurrent fiscal-risk patterns, such as procurement weaknesses, budget-execution gaps, and deficiencies in asset management, thereby contextualizing quantitative divergences within substantive governance concerns.
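The agreement metrics defined in this section can be computed with a few lines of standard-library Python. This is a sketch rather than the study's code: the function names are hypothetical, and the percentile uses the nearest-rank definition, which is an assumption because the text does not state which percentile convention was applied.

```python
import math


def mean_absolute_error(ai, expert):
    """MAE between paired AI and expert scores."""
    return sum(abs(a - e) for a, e in zip(ai, expert)) / len(ai)


def pearson_r(ai, expert):
    """Pearson correlation between paired AI and expert scores."""
    n = len(ai)
    mean_ai = sum(ai) / n
    mean_ex = sum(expert) / n
    cov = sum((a - mean_ai) * (e - mean_ex) for a, e in zip(ai, expert))
    ss_ai = sum((a - mean_ai) ** 2 for a in ai)
    ss_ex = sum((e - mean_ex) ** 2 for e in expert)
    return cov / (math.sqrt(ss_ai) * math.sqrt(ss_ex))


def abs_error_percentile(ai, expert, q):
    """Nearest-rank q-th percentile (0 < q <= 100) of |AI - expert|."""
    errors = sorted(abs(a - e) for a, e in zip(ai, expert))
    rank = max(math.ceil(q / 100 * len(errors)), 1)
    return errors[rank - 1]
```

With toy inputs such as `ai = [8, 6, 9, 3, 7]` and `expert = [7, 6, 10, 5, 7]`, the absolute errors are 1, 0, 1, 2, and 0, so the MAE is 0.8 and the 90th-percentile discrepancy is 2.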

3.7. Ethical, Legal, and Governance Considerations

All data are administrative and fall under CGR’s legal oversight mandate. No personal or sensitive data were processed. The use of AI adhered to principles of transparency and accountability: prompts, scoring rules, and code are documented to enable auditability. The hybrid (AI + human) approach mitigates risks of algorithmic opacity by retaining expert oversight at critical checkpoints.

4. Results

4.1. Corpus and Score Distributions

A total of 219 entities, represented through nine questions, were analyzed, resulting in 1971 observations. The GPT scores exhibited a range from 1 to 10, demonstrating a right-skewed distribution. This indicates a predominantly moderate to low risk profile, alongside a subset of high-risk cases. National entities generally reported higher median scores in comparison to territorial entities, while the interquartile range (IQR) displayed a greater dispersion among the latter, suggesting a heterogeneous maturity level in their internal controls.
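The group comparison described here (central tendency and IQR by government level) can be reproduced with a short Python sketch. The function names and the `(level, score)` input layout are hypothetical, and the "inclusive" quartile method is an assumption, since the text does not specify how quartiles were computed.

```python
import statistics


def summarize(scores):
    """Mean, median, and interquartile range for one group of scores."""
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "iqr": q3 - q1,
    }


def summarize_by_level(rows):
    """Group (level, score) pairs and summarize each government level."""
    groups = {}
    for level, score in rows:
        groups.setdefault(level, []).append(score)
    return {level: summarize(values) for level, values in groups.items()}
```

A wider IQR for one level than for another is the numerical counterpart of the greater dispersion reported for territorial entities.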

4.2. Human–AI Agreement and Calibration

In the blind expert subsample, the mean absolute error (MAE) between the scores generated by GPT and those assessed by human experts was relatively low (MAE = 1.814), indicating a close alignment in absolute terms. Moreover, Pearson’s correlation confirmed a strong and statistically significant linear concordance between both evaluations (r = 0.7052, p < 2 × 10⁻¹⁰), reinforcing the consistency of the model’s relative scoring patterns. To further refine the agreement, a least-squares post hoc calibration was applied, which successfully reduced systematic deviations, particularly in items requiring extensive evidence, while preserving the ordinal ranking of responses. This suggests that GPT’s scoring not only correlates strongly with expert judgments but can also be systematically adjusted to achieve higher fidelity without compromising the comparative structure of the assessments.
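One simple way to realize a least-squares calibration is a linear map fitted on the expert-rated subsample, sketched below in Python. The single global slope and intercept are an assumption made for illustration; the study may have calibrated separately by question type, and the function names are hypothetical.

```python
def fit_calibration(ai_scores, expert_scores):
    """Ordinary least squares for expert ~ slope * ai + intercept."""
    n = len(ai_scores)
    mean_x = sum(ai_scores) / n
    mean_y = sum(expert_scores) / n
    sxx = sum((x - mean_x) ** 2 for x in ai_scores)
    if sxx == 0:
        raise ValueError("AI scores are constant; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(ai_scores, expert_scores))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def calibrate(score, slope, intercept, lo=1.0, hi=10.0):
    """Apply the fitted map and clip the result to the 1-10 scale."""
    return min(hi, max(lo, slope * score + intercept))
```

Because a linear map with a positive slope is monotone, it cannot reverse the order of two responses; ties can appear only where clipping at the ends of the scale takes effect, which is consistent with the ranking being preserved.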

4.3. Sectoral Patterns and Recurring Risk Themes

Administrative departments and ministries manifested the highest central scores, while smaller territorial entities and decentralized agencies were predominantly classified within the lower quartiles. Qualitative analysis of the narratives associated with lower scores identified recurring deficiencies in procurement planning, budget execution monitoring, and asset management. These deficiencies were frequently characterized by the absence of established procedures, insufficient documentation, and ineffective follow-up mechanisms.

4.4. Anomalies, Composite Index, and Robustness Checks

Rule-based screening identified a limited number of anomalies, including non-integer outputs and rubric inconsistencies. The majority of these anomalies pertained to genuinely inadequate controls rather than errors within the model; however, some necessitated prompt refinement and re-scoring. The integration of AI scores with institutional indicators facilitated the development of a composite Internal Control Effectiveness Index (ICEI), which consistently ranked entities across various robustness tests (including alternative prompts, outlier trims, and percentile loss functions), thereby underscoring the stability and reliability of this methodological approach.

The mixed-methods design combines qualitative and quantitative evidence through an AI-driven transformation process. Each narrative audit observation, initially a qualitative text, was semantically evaluated and assigned a numerical score (1–10) by the GPT-based model, converting descriptive assessments into structured quantitative data. Figure 2 presents a boxplot summarizing the dispersion of these scores across government levels, revealing both the central tendency and variability of internal control strength, while Figure 3 displays a heatmap that visually represents thematic diversity by linking individual question narratives to recurring risk dimensions such as asset management, reconciliation, and documentation.

Figure 2. Boxplot of three-year average AI-derived internal control scores by level of government (National vs. Territorial). Boxes show the interquartile range (IQR), horizontal lines denote medians, whiskers extend to 1.5 × IQR, and points beyond are outliers. Higher values indicate stronger internal accounting control.

Figure 3. Heatmap of AI-derived question scores (2020–2022). Rows (Q1–Q9) correspond to the self-evaluation items, columns to fiscal years (2020–2022). Cell colors encode the score magnitude (1–10), with higher values indicating stronger internal accounting control. Numeric labels inside each cell show the exact score to aid comparison across questions and years.

To further confirm these findings, Figure 4 depicts the overall distribution of average internal control scores for the 219 entities, distinguishing national and territorial levels. The histogram confirms a right-skewed pattern: approximately 37% of territorial entities scored below 7.0, indicating moderate-to-high control risk, whereas national agencies clustered around a mean of 8.4. These results align with official data from the Comptroller General of the Republic, which report over 3200 fiscal findings and total public losses exceeding COP 1.2 billion between 2019 and 2022. Together, these quantitative indicators and qualitative themes highlight the persistent challenges of corruption, limited audit capacity, and uneven institutional performance across subnational governments, while exemplifying how qualitative text interpretation and quantitative pattern analysis converge in the study’s mixed-methods framework.

The distribution illustrated in Figure 4 provides further evidence of the differences in internal control performance among Colombian public entities. The histogram shows a clear distinction between national and territorial administrations. National entities consistently achieved higher internal control scores (mean = 8.9) and exhibited lower variability, indicating well-developed oversight frameworks and standardized audit procedures. In contrast, territorial entities clustered around lower mean values (mean = 6.9), with a significant number scoring below the threshold of 7.0, approximately 37% of the sample. This suggests greater exposure to fiscal risks and deficiencies in local audit capacity. These quantitative results support the findings from the qualitative analysis of audit narratives, which indicated that territorial entities frequently displayed weaknesses in documentation, reconciliation, and asset management practices. The alignment of numerical data with textual observations highlights the benefits of a mixed-methods approach, exposing both the structural disparities in administrative capacity and the ongoing vulnerabilities within subnational fiscal control mechanisms.

Figure 4. Distribution of AI-derived internal control scores across 219 public entities (2020–2022). National entities are represented in blue, while territorial entities are shown in orange. Dashed vertical lines indicate the average scores for each group (National = 8.9; Territorial = 6.9). The distribution is right-skewed, suggesting that most entities have medium to high control strength. However, there is significant concentration at the lower end, with approximately 37% of territorial entities scoring below 7.0, highlighting persistent weaknesses in the internal control systems at the subnational level.

5. Discussion

The findings indicate that an AI-assisted scoring pipeline can standardize and scale the assessment of internal accounting controls across a diverse range of public entities. This approach operationalizes fiscal management as a preventive, risk-oriented practice. By converting narrative self-evaluations into comparable numerical indicators, the method addresses a longstanding gap in Latin American governance research: the lack of systematic, cross-entity measures of internal control quality.

The strong agreement with expert judgments, along with stable rankings in sensitivity analyses, suggests that large language models, when integrated into a transparent rubric and reviewed by humans, can enhance institutional expertise rather than replace it. The observed sectoral and territorial disparities in scores align with previous findings on uneven administrative capacity, reinforcing the need for differentiated support strategies rather than one-size-fits-all compliance mandates.

However, the study also reveals structural and methodological limitations that affect its generalizability. First, the reliance on self-reported narratives introduces selection and reporting biases that no scoring model, whether human or AI, can completely eliminate. Second, the design and calibration of prompts inevitably embed normative assumptions about what constitutes “good control.” While least-squares adjustments can reduce systematic deviations, they may also obscure context-specific practices that fall outside the rubric. Third, although the composite Internal Control Effectiveness Index incorporates external audit outcomes, the causal relationship remains unresolved: higher or lower AI scores correlate with fiscal findings but do not confirm that stronger internal controls prevent corrupt practices, especially in environments with fluctuating political incentives and enforcement capacities.

5.1. Territorial Disparities and Subnational Administrative Capacity

The results of the AI-based scoring reveal notable territorial heterogeneity across Colombia’s departments. Departments such as Quindío, Huila, and Santander exhibit consistently high audit scores, suggesting the presence of robust internal control practices. In contrast, entities located in regions such as Amazonas, Vaupés, and Chocó tend to score significantly lower, indicating elevated fiscal risk and weaker institutional oversight.

This pattern aligns with broader evidence from the CGR’s study on internal control effectiveness, which shows that in 2021 only 31.2% of departmental entities and 36.9% of municipal entities were classified as having “satisfactory” performance in internal control systems. The same study highlights that a significant number of subnational governments continue to operate without certified internal control staff and face persistent issues in ensuring the quality, completeness, and traceability of accounting records [17].

From a policy standpoint, these insights support the need for differentiated oversight strategies tailored to the specific challenges of each department. For example, departments such as Chocó, Vaupés, and Vichada—identified in the CGR’s analysis as having elevated fiscal risk—could benefit from capacity-building programs, streamlined digital reporting mechanisms, and dedicated technical assistance. Conversely, regions with consistently higher scores could serve as reference models or even be empowered to support peer-to-peer learning exchanges within their region.

These findings echo conclusions from the literature on subnational administrative capacity in Latin America. For instance, Falleti [43,44] has emphasized how decentralization reforms in Colombia granted political and fiscal autonomy to local governments without ensuring a proportional strengthening of their technical and administrative capacities. This asymmetry often results in fragmented accountability systems and a variable capacity to implement national control standards [45].

Moreover, the territorial differences observed in our study may also be influenced by structural inequalities in public investment and human capital. Less developed regions tend to lack skilled personnel, have greater turnover among public officials, and operate with limited access to digital infrastructure, all of which hinder the implementation of internal control frameworks.

The AI model’s ability to capture these latent institutional factors through textual analysis reinforces its value as a diagnostic tool. By systematically scoring free-text responses from self-assessments, it reveals gaps not only in compliance but in the very articulation of internal control practices. This insight is critical for tailoring audit strategies to the specific needs and capabilities of subnational entities.

Additionally, the AI-driven scoring methodology provides a scalable and replicable mechanism for risk stratification across thousands of narrative responses, supporting proactive interventions before audit findings escalate. This opens opportunities to enhance early warning systems within Colombia’s national fiscal control ecosystem.

5.2. Linking AI-Assisted Oversight to Corruption Mitigation

This study does not claim that AI alone will eradicate corruption. Instead, we argue that AI-assisted auditing can reduce the window of discretion, increase transparency, and improve early detection of risk patterns, thereby contributing to a narrower opportunity space for corrupt behavior [46]. Effective impact requires alignment with broader reforms—such as strengthened public integrity systems, empowered Supreme Audit Institutions, and institutionalized follow-up mechanisms [47]. The evidence presented suggests that narrative-based scoring via AI can bring greater consistency and coverage to internal control self-evaluations but must be embedded within a systemic governance architecture in order to contribute to meaningful anti-corruption outcomes.

5.3. Limitations

This study is not without limitations. First, the sample of 219 evaluated entities, while diverse, does not fully represent the entire landscape of Colombia’s public entities. The focus was primarily on those that participated in the CGR pilot evaluation and had complete narrative responses to the accounting internal control self-assessment, which introduces a degree of self-selection bias. Consequently, the results should be interpreted as indicative rather than generalizable to all territorial entities.

Second, although the text-based analysis captures qualitative nuances often lost in structured formats, it may still be influenced by variations in reporting style, verbosity, and institutional culture. While the prompt design sought to standardize evaluation criteria (e.g., methodological rigor, supporting evidence), natural language generation remains sensitive to context, and some noise in the outputs may persist despite calibration.

While the model performed consistently during validation, its performance may vary across institutional contexts not included in the validation sample. Moreover, the reliance on narrative data makes the model sensitive to linguistic quality, potentially biasing results against entities with less technical writing capabilities. Future work should explore hybrid approaches combining LLMs with structured fiscal indicators and extend the validation to a broader institutional sample.

Finally, the territorial disaggregation was constrained by the availability of departmental identifiers in the dataset. While aggregate differences are evident, more granular institutional characteristics—such as budget size, staff capacity, or audit history—were not directly included in the analysis.

5.4. Future Research Directions

Future work should expand the dataset to include all territorial entities across the country and develop weighted stratification models that account for institutional complexity, sector, and region. Additionally, integrating geospatial data and fiscal indicators (e.g., execution rates, sanction history) could enrich the predictive power of the risk scoring model.

Future research should also focus on causal evaluation and cross-national benchmarking, going beyond descriptive concordance. Embedding the scoring pipeline in longitudinal designs would allow for difference-in-differences or synthetic control strategies to determine whether improved scores lead to measurable reductions in fiscal losses. Likewise, efforts should examine algorithmic fairness, such as differential error rates across different types of entities, test interpretable model variants, and make prompt templates and code available for external replication.
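As an illustration of what such a causal design could look like, a standard two-way fixed-effects difference-in-differences specification is sketched below. It is one possible formulation, not a model estimated in this study, and all symbols are assumptions introduced for the example: L_{it} denotes fiscal losses of entity i in period t, \alpha_i and \gamma_t are entity and period fixed effects, Treated_i marks entities exposed to the intervention of interest (for instance, AI-assisted oversight), and Post_t marks periods after its introduction.

```latex
L_{it} = \alpha_i + \gamma_t
       + \beta \, (\mathrm{Treated}_i \times \mathrm{Post}_t)
       + \varepsilon_{it}
```

Under the usual parallel-trends assumption, \beta would capture the average change in fiscal losses attributable to the intervention, with a negative estimate indicating a reduction.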

Another promising avenue is the development of adaptive prompting strategies that refine model responses based on context-specific training data from particular administrative regions or sectors (e.g., health, education, infrastructure). This could allow the system to dynamically adjust evaluation criteria based on evolving audit patterns or regional policy priorities. Incorporating citizen oversight data and transactional audit trails could strengthen the model’s evidentiary foundation, promoting a multi-source, polycentric approach to combating corruption through fiscal management and artificial intelligence.

Finally, it would be valuable to compare the AI-derived scores with subsequent audit results longitudinally to validate predictive performance and refine risk thresholds over time. Establishing this correlation empirically would strengthen trust in AI-assisted oversight mechanisms and facilitate their integration into the broader accountability ecosystem, in line with international public auditing standards such as INTOSAI’s ISSAIs.
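One simple way to begin such a comparison is a rank correlation between AI-derived scores and the number of findings in later audits. The Python sketch below implements Spearman's rank correlation with average ranks for ties, using only the standard library. The variable names and all values are hypothetical and serve only to show the calculation; they are not results from the study.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values: AI-derived control scores for five entities and
# the number of fiscal findings recorded in a later audit cycle.
ai_scores = [9.1, 8.4, 7.0, 6.2, 5.5]
later_findings = [0, 1, 2, 4, 6]
rho = spearman(ai_scores, later_findings)
```

A strongly negative coefficient would be consistent with higher scores preceding fewer findings, although correlation alone would not establish predictive validity or causation.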

6. Conclusions

This study demonstrates that an AI-assisted scoring pipeline can transform qualitative self-evaluations of internal accounting controls into standardized and comparable indicators. This transformation strengthens fiscal management as a preventive and evidence-based strategy against corruption. By operationalizing a transparent rubric within a large language model (specifically Azure OpenAI GPT) and validating the outputs against expert judgments and institutional benchmarks, the study shows that AI can enhance, rather than replace, professional oversight while expanding coverage and improving timeliness.

Empirically, the results reveal persistent disparities: national administrative bodies generally exhibit more robust internal controls compared to many territorial entities, and specific control dimensions (such as inventories and reconciliations) tend to lag behind documentation-related practices. These patterns highlight the need for differentiated capacity-building policies and targeted technical assistance rather than imposing uniform compliance requirements. The composite Internal Control Effectiveness Index (ICEI) further illustrates how AI-derived scores can be integrated with fiscal findings and responsibility rulings to prioritize supervisory resources effectively.

Methodologically, the pipeline strikes a balance between automation and accountability through rule-based checks, expert calibration, and iterative prompt refinement. However, some limitations remain: self-reported narratives may contain reporting bias; the rubric design can reflect normative assumptions; and correlations with fiscal outcomes do not necessarily imply causation. Addressing these constraints will require longitudinal studies, fairness audits, and broader data integration, such as including transactional records and citizen oversight.

From an ethical and governance perspective, the incorporation of AI into fiscal oversight must ensure transparency, explainability, and fairness. The hybrid (AI + human) evaluation model adopted in this study mitigates bias by preserving expert validation at critical checkpoints. These practices align with the principles presented in [48] on responsible AI and strengthen institutional trust in automated decision-support systems.

Future work should explore causal links between improved internal control scores and reductions in fiscal loss, investigate explainable AI variants for greater interpretability, and replicate this approach in other governance contexts. By advancing a replicable, auditable, and scalable framework, this study provides a practical tool for the Comptroller General of the Republic and contributes to the literature on digital governance and anti-corruption strategies.

Author Contributions

Conceptualization, A.E.M., C.M.Z.-P., J.A.R.-C., J.C.C. and E.F.B.; methodology, A.E.M., C.M.Z.-P., J.C.C. and E.F.B.; software, A.E.M., J.C.C. and E.F.B.; validation, A.E.M., C.M.Z.-P., J.C.C., J.A.R.-C., E.F.B., L.A.-P. and R.M.V.; formal analysis, H.F.G. and L.F.; investigation, A.E.M., J.A.R.-C., H.F.G. and L.F.; resources, A.E.M., C.M.Z.-P., L.A.-P., J.A.R.-C. and R.M.V.; data curation, H.F.G. and L.F.; writing—original draft preparation, A.E.M., H.F.G. and L.F.; writing—review and editing, A.E.M., C.M.Z.-P., J.A.R.-C., J.C.C., E.F.B. and L.F.; visualization, H.F.G. and L.F.; supervision, A.E.M., C.M.Z.-P., L.A.-P., J.A.R.-C. and R.M.V.; project administration, J.A.R.-C., R.M.V. and L.F.; funding acquisition, C.M.Z.-P., L.A.-P., J.A.R.-C. and R.M.V. All authors have read and agreed to the published version of the manuscript.

Funding

This work was developed with the funding of the Contraloría General de la República (CGR) and the Universidad de Antioquia (UdeA) under contract CGR-407-2024.

Institutional Review Board Statement

Based on the nature of the data used in this paper, such approval was not required. We confirm that all data employed in this research fully comply with the guidelines established in the System for the Governance of Data and Information of the Contraloría General de la República, as defined in the Internal Resolution OGZ-0768-2020.

Informed Consent Statement

Informed consent was waived because, given the nature of the data used in this paper, it was not required.

Data Availability Statement

Due to privacy restrictions related to entity data, the dataset used in this study is not publicly available.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Republic of Colombia. Ley 1474 de 2011. Diario Oficial, Bogotá D.C., Colombia, 12 July 2011. Available online: https://www.funcionpublica.gov.co/eva/gestornormativo/norma.php?i=43292 (accessed on 18 June 2025).
Monitor Ciudadano. Portal de Monitoreo de Casos de Corrupción en Colombia, 2025. Available online: https://www.monitorciudadano.co/ (accessed on 18 June 2025).
Ayala García, J.; Bonet-Morón, J.; Pérez-Valbuena, G.J.; Heilbron-Fernández, E.J.; Suret-Leguizamón, J.D. La Corrupción en Colombia: Un Análisis Integral. Documentos de Trabajo sobre Economía Regional y Urbana 2022, 307. Available online: https://repositorio.banrep.gov.co/server/api/core/bitstreams/2c32665f-f630-4f1e-8a27-9bef95ae8d72/content (accessed on 18 June 2025).
World Bank. Enhancing Government Effectiveness and Transparency: The Fight Against Corruption; World Bank: Washington, DC, USA, 2020; Available online: https://documents.worldbank.org/en/publication/documents-reports/documentdetail/435121593132063030 (accessed on 18 June 2025).
United Nations Office on Drugs and Crime (UNODC). The Negative Impact of Corruption on the Enjoyment of Human Rights; UNODC: Vienna, Austria, 2021; Available online: https://www.unodc.org/unodc/en/corruption/publications.html (accessed on 18 June 2025).
Gómez Portilla, K.; Gallón Gómez, S. El Impacto de la Corrupción sobre el Crecimiento Económico Colombiano, 1990–1999. Lect. Econ. 2002, 57, 49–86. Available online: https://ideas.repec.org/a/lde/journl/y2002i57p49-86.html (accessed on 18 June 2025).
Transparencia por Colombia. Radiografía de los Hechos de Corrupción en Colombia 2021–2022; Transparencia por Colombia: Bogotá, Colombia, 2023; Available online: https://transparenciacolombia.org.co/wp-content/uploads/2024/06/Radiografia-Corrupcion-VF-VF.pdf (accessed on 18 June 2025).
Jimenez-Sanchez, F. Crisis and Corruption in Spain: Improving the Quality of Governance to Fight Corruption. Siyasal J. Political Sci. 2023, 32, 1–14.
Morgan Fullin Saldanha, D.; Nogueira Dias, C.; Guillaumon, S. Transparency and Accountability in Digital Public Services: Learning from the Brazilian Cases. Gov. Inf. Q. 2022, 39, 101680.
Teixeira, M.A.C.; Zuccolotto, R.; Spinelli, M.V.C.; Rodrigues, R.V. Transparency in Latin America: Argentina, Brazil, Colombia, and Mexico. In Handbook of Public Policy in Latin America; Edward Elgar Publishing: Cheltenham, UK, 2025.
Grindle, M. Jobs for the Boys: Patronage and the State in Comparative Perspective; Harvard University Press: Cambridge, MA, USA, 2012.
Mungiu-Pippidi, A. The Quest for Good Governance: How Societies Develop Control of Corruption; Cambridge University Press: Cambridge, UK, 2017.
COSO. Internal Control—Integrated Framework; Committee of Sponsoring Organizations of the Treadway Commission: New York, NY, USA, 2013; Available online: https://www.coso.org (accessed on 18 June 2025).
INTOSAI. INTOSAI GOV 9100—Guidelines for Internal Control Standards for the Public Sector; INTOSAI: Vienna, Austria, 2004; Available online: https://www.issai.org (accessed on 18 June 2025).
Darmawati, D.; Jaafar, N.I.; HS, R.; Baja, H.K.; Purisamya, A.J.; Yolanda, A.M.W.; Amir, B.; Juanda, M.R.P. The Role of Artificial Intelligence in Improving the Efficiency and Accuracy of Local Government Financial Reporting: A Systematic Literature Review. J. Risk Financial Manag. 2025, 18, 601.
Cubillos, G.A.R. Impacto del Control Fiscal en Colombia. Rev. Colomb. Contab. 2024, 12, 1.
Contraloría General de la República. Estudio Sobre la Efectividad del Control Fiscal Interno en Colombia: Estudio Intersectorial; CGR: Bogotá, Colombia, 2023.
Tanzi, V. Corruption, Arm’s-Length Relationships and Markets. In The Economics of Organised Crime; Cambridge University Press: Cambridge, UK, 1997.
Begovic, B. Corruption: Concepts, Types, Causes and Consequences; Center for Liberal-Democratic Studies: Belgrade, Serbia, 2005.
Klitgaard, R. Controlling Corruption; University of California Press: Berkeley, CA, USA, 1988.
Salamanca, L.J.G.; Salcedo-Albarán, E.; de León-Beltrán, I.; Guerrero, B. Conceptualizing State Capture in Latin America; Poder LatAm: Mexico City, Mexico, 2022.
Castellanos, D. Pertinencia de la Implementación de un Sistema de Control Interno y Administración de Riesgos en las Entidades Públicas: Estudio de Caso del Proyecto de Remodelación de la Refinería de Cartagena (Reficar). Control Visible 2024, 3, 26–44.
Buitrago, F.; Ortíz, A. Monitor Ciudadano Report: 2016–2022; Monitor Ciudadano: Bogotá, Colombia, 2024; Available online: https://www.monitorciudadano.co/ (accessed on 18 June 2025).
Departamento Administrativo Nacional de Estadística (DANE). Producto Interno Bruto—PIB 2022 (Cifras Preliminares); DANE: Bogotá, Colombia, 2023. Available online: https://www.dane.gov.co/index.php/estadisticas-por-tema/cuentas-nacionales/cuentas-nacionales-anuales/pib (accessed on 18 June 2025).
Congreso de Colombia. Ley 610 de 2000: Por la Cual se Establece el Trámite de los Procesos de Responsabilidad Fiscal de Competencia de las Contralorías; Congreso de la República: Bogotá, Colombia, 2000.
OECD. Budgeting and Public Expenditures in OECD Countries 2019; OECD Publishing: Paris, France, 2019.
EUPAN. Common Assessment Framework (CAF); European Public Administration Network: Brussels, Belgium, 2020; Available online: https://www.eupan.eu (accessed on 18 June 2025).
Christensen, T.; Lægreid, P. Reformas Post Nueva Gestión Pública. Gestión Política Pública 2007, 16, 539–564. Available online: https://www.scielo.org.mx/pdf/gpp/v16n2/1405-1079-gpp-16-02-539.pdf (accessed on 18 June 2025).
EFQM. EFQM Model 2025; EFQM: Brussels, Belgium, 2025.
European Commission. Revision of the Internal Control Framework—Communication C(2017) 2373 Final; European Commission: Brussels, Belgium, 2017.
HM Treasury. The Orange Book: Management of Risk—Principles and Concepts; Government of the United Kingdom: London, UK, 2023.
Departamento Administrativo de la Función Pública. Modelo Integrado de Planeación y Gestión (MIPG); DAFP: Bogotá, Colombia, 2024. Available online: https://www1.funcionpublica.gov.co/web/mipg (accessed on 18 June 2025).
MinCiencias. Modelo Estándar de Control Interno (MECI); Ministerio de Ciencia, Tecnología e Innovación: Bogotá, Colombia, 2024. Available online: https://minciencias.gov.co/quienes_somos/control/control_modelo (accessed on 18 June 2025).
Hernández-Royett, J.; Hernández, Y.F.; Gil, M.D.; Cárdenas Barbosa, E. Evaluación del Modelo Integrado de Planeación y Gestión (MIPG) en las Entidades Territoriales del Estado Colombiano. AGLALA 2018, 9, 444–463. Available online: https://dialnet.unirioja.es/servlet/articulo?codigo=6832778 (accessed on 18 June 2025).
OECD. Auditing and AI: Improving Risk Detection and Control; OECD Publishing: Paris, France, 2022; Available online: https://www.oecd.org/gov/auditing-and-ai.htm (accessed on 18 June 2025).
Grigalashvili, V. Artificial Intelligence in Public Administration: An Ethical Dilemma. Int. J. Innov. Technol. Soc. Sci. 2025, 2, 46.
European Commission. Ethics Guidelines for Trustworthy AI; High-Level Expert Group on Artificial Intelligence: Brussels, Belgium, 2021; Available online: https://digital-strategy.ec.europa.eu/en/library/ethics-guidelines-trustworthy-ai (accessed on 18 June 2025).
Papagiannidis, E.; Mikalef, P.; Conboy, K. Responsible Artificial Intelligence Governance: A Review and Research Framework. J. Strateg. Inf. Syst. 2025, 34, 101885.
Kokina, J.; Blanchette, S.; Davenport, T.H.; Pachamanova, D. Challenges and Opportunities for Artificial Intelligence in Auditing: Evidence from the Field. Int. J. Account. Inf. Syst. 2025, 56, 100734.
Alon-Barkat, S.; Busuioc, M. Human–AI Interactions in Public Sector Decision Making: “Automation Bias” and “Selective Adherence” to Algorithmic Advice. J. Public Adm. Res. Theory 2023, 33, 153–176.
Mökander, J. Auditing of AI: Legal, Ethical and Technical Approaches. Digit. Soc. 2023, 9, 49.
Wuttke, A.; Rauchfleisch, A.; Jungherr, A. Artificial Intelligence in Government: Why People Feel They Lose Control. arXiv 2025, arXiv:2505.01085.
Falleti, T.G. Decentralization and Subnational Politics in Latin America; Cambridge University Press: Cambridge, UK, 2010.
Dickovick, J.T. Decentralization and Recentralization in the Developing World: Comparative Studies from Africa and Latin America; Penn State Press: University Park, PA, USA, 2011.
Pinilla-Rodríguez, D.E.; Hernández-Medina, P. Governance and Fiscal Decentralisation in Latin America: An Empirical Approach. Economies 2024, 12, 207.
Dávid-Barrett, E. State capture and development: A conceptual framework. J. Int. Relat. Dev. 2023, 26, 224–244.
OECD. Integrity for Good Governance in Latin America and the Caribbean; OECD Publishing: Paris, France, 2018.
OECD. Advancing Accountability in AI; OECD Publishing: Paris, France, 2023; Available online: https://www.oecd.org/content/dam/oecd/en/publications/reports/2023/02/advancing-accountability-in-ai_753bf8c8/2448f04b-en.pdf (accessed on 18 June 2025).

Figure 1. AI-assisted fiscal risk scoring pipeline. The workflow comprises five sequential stages: (1) Input data collection—ingestion of entities’ narrative responses in Excel format; (2) Prompt construction—assembly of a standardized template combining the audit question, scoring rubric, and response text; (3) GPT evaluation—model inference via Azure OpenAI to generate a provisional score; (4) Scoring output—export of numeric risk scores (1–10) for each observation; and (5) Validation and quality assurance—automated checks and expert review to detect anomalies and calibrate the model.
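The stages described in Figure 1 can be sketched in a few lines of Python. The sketch below covers prompt construction (stage 2), a pluggable scoring call (stage 3), and a rule-based range check on the output (stage 5). It is an illustrative assumption rather than the authors' implementation: the template wording, the function names, the row fields, and the `stub_model` function are all hypothetical, and the stub stands in for the Azure OpenAI call so that the example runs without credentials.

```python
import re

# Hypothetical template; the study's actual prompt wording is not reproduced.
PROMPT_TEMPLATE = (
    "Audit question:\n{question}\n\n"
    "Scoring rubric:\n{rubric}\n\n"
    "Entity response:\n{response}\n\n"
    "Return a single score from 1 to 10."
)


def build_prompt(question, rubric, response):
    """Assemble the audit question, rubric, and response into one prompt."""
    return PROMPT_TEMPLATE.format(
        question=question, rubric=rubric, response=response.strip()
    )


def parse_score(raw_output):
    """Take the first number in the model output; reject values outside 1-10."""
    match = re.search(r"\d+(?:\.\d+)?", raw_output)
    if match is None:
        return None
    score = float(match.group())
    return score if 1 <= score <= 10 else None


def score_responses(rows, score_fn):
    """Score each row; entities with unusable output are flagged for review."""
    scored, flagged = [], []
    for row in rows:
        prompt = build_prompt(row["question"], row["rubric"], row["response"])
        score = parse_score(score_fn(prompt))
        if score is None:
            flagged.append(row["entity"])
        else:
            scored.append((row["entity"], score))
    return scored, flagged


def stub_model(prompt):
    """Placeholder for the model call; NOT a real API client."""
    return "Score: 8" if "reconciliation" in prompt else "No score available"


# Hypothetical input rows standing in for the Excel ingestion stage.
rows = [
    {"entity": "Entity A",
     "question": "Are accounting balances verified periodically?",
     "rubric": "1 = no evidence; 10 = documented and verified.",
     "response": "Monthly reconciliation is documented."},
    {"entity": "Entity B",
     "question": "Are accounting balances verified periodically?",
     "rubric": "1 = no evidence; 10 = documented and verified.",
     "response": ""},
]
scored, flagged = score_responses(rows, stub_model)
```

Passing the scoring function as an argument keeps the range check and the expert-review queue independent of any particular model provider, so the same checks apply if the model or prompt changes.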

Figure 2. Boxplot of three-year average AI-derived internal control scores by level of government (National vs. Territorial). Boxes show the interquartile range (IQR), horizontal lines denote medians, whiskers extend to 1.5 × IQR, and points beyond are outliers. Higher values indicate stronger internal accounting control.
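For readers who want to reproduce the boxplot statistics described in the Figure 2 caption, the Python sketch below computes the quartiles, the interquartile range, the 1.5 × IQR fences, and the points beyond them. The score values and the function name are hypothetical, and the quartile convention (`method="inclusive"` in the standard library) is an assumption; plotting libraries may use a slightly different interpolation rule.

```python
from statistics import quantiles


def boxplot_stats(scores):
    """Quartiles, IQR, 1.5 x IQR fences, and outliers for a list of scores."""
    q1, median, q3 = quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [s for s in scores if s < lower_fence or s > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers,
    }


# Hypothetical three-year average scores for seven territorial entities.
territorial_scores = [2.0, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5]
stats = boxplot_stats(territorial_scores)
```

In this toy sample the single score of 2.0 falls below the lower fence and would be drawn as an outlier point.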

Figure 3. Heatmap of AI-derived question scores (2020–2022). Rows (Q1–Q9) correspond to the self-evaluation items, columns to fiscal years (2020–2022). Cell colors encode the score magnitude (1–10), with higher values indicating stronger internal accounting control. Numeric labels inside each cell show the exact score to aid comparison across questions and years.

Figure 4. Distribution of AI-derived internal control scores across 219 public entities (2020–2022). National entities are represented in blue, while territorial entities are shown in orange. Dashed vertical lines indicate the average scores for each group (National = 8.9; Territorial = 6.9). The distribution is right-skewed, suggesting that most entities have medium to high control strength. However, there is significant concentration at the lower end, with approximately 37% of territorial entities scoring below 7.0, highlighting persistent weaknesses in the internal control systems at the subnational level.

Table 1. Comparative Overview of National and International Public Management Models.

Model	Origin	Focus	Key Principles	Structure
MIPG	Colombia	Integrated planning and management	Public value, accountability, interinstitutional articulation	7 interdependent dimensions
MECI	Colombia	Internal control and risk management	Self-control, self-regulation, self-management	5 components + 3 lines of defense
COSO	USA	Internal control and corporate governance	Control environment, risk assessment, monitoring	5 components + sustainability controls
CAF	European Union	Public sector quality management	Leadership, partnerships, people, continuous improvement	Based on European excellence model
INTOSAI	International	Audit standards and internal control	Ethics, efficiency, protection of public resources	Adapted COSO framework
Orange Book	United Kingdom	Public sector risk management	Evidence-based governance, transparency, flexibility	Principle-based + risk control framework

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
