1. Introduction
Corruption is not a new phenomenon. Ancient Egypt faced a wide range of corruption forms, from offering and accepting bribes to embezzlement, stealing or misusing money [1]. Corruption was also a prevalent crime in the Roman Empire, known as ambitus, according to ancient Roman law. More specifically, ambitus was a crime of political corruption, mainly a candidate’s attempt to influence the outcome (or direction) of an election through bribery or other forms of soft power [2]. On the other hand, anti-corruption efforts are not new either. In ancient Greece, the various and multiple anti-corruption measures of Athens sought to bring ‘hidden’ knowledge into the open and thereby remove information from the realm of individual judgment, placing it instead into the realm of collective judgment. The Athenian experience suggests that participatory democracy and a civic culture that fosters political equality rather than reliance on individual expertise provide a key bulwark against corruption [3].
In the modern world, corruption is still present and has evolved in terms of its forms, but without altering its purpose. According to Transparency International, corruption can still be defined as the abuse of entrusted power for private gain [4]. Corruption thus poses one of the biggest obstacles to social justice, institutional trust, and worldwide economic stability. The amount of documentation, from company and financial records to procurement contracts, has increased dramatically as the public and private sectors move toward digital governance [5]. Signals of misconduct are frequently hidden in unstructured text in the ever-evolving digital and big data landscape, making manual oversight not only time-consuming but also inadequate [6]. Tackling corruption efficiently nowadays relies heavily on the ability to automatically detect corruption risk indicators.
Until recently, corruption detection relied heavily on structured data analysis, such as identifying irregularities in financial transactions or flagging statistical anomalies in procurement records [7,8]. While these structured approaches provided a necessary foundation for oversight, they remained fundamentally limited to numerical discrepancies, leaving the vast, qualitative landscape of available unstructured linguistic data unmonitored. Current research is shifting towards Natural Language Processing (NLP) and Large Language Models (LLMs) to address the complexities of fraud concealed within unstructured text, ranging from email correspondence to intricate contract narratives [6,9]. Rule-based systems and keyword matching, which are highly interpretable and computationally efficient, were the mainstays of early textual methodologies. However, they often struggled with the linguistic ambiguity that characterizes corrupt exchanges [10]. The “coded” language that malicious actors frequently use to conceal their illegal intent was not captured by these static models, which treated language as a bag of isolated terms. Therefore, when confronted with sophisticated evasion strategies that bypassed explicit risk terminology, conventional techniques frequently generated high false-negative rates [11].
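The limitation of keyword matching described above can be illustrated with a minimal sketch. The lexicon `RISK_TERMS`, the function `keyword_flag`, and both example sentences are hypothetical and chosen only for illustration; they are not taken from any system cited here.

```python
import re

# Hypothetical risk lexicon, chosen only for illustration.
RISK_TERMS = {"bribe", "kickback", "payoff"}


def keyword_flag(text: str) -> bool:
    """Return True if any lexicon term appears as an isolated token."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return bool(tokens & RISK_TERMS)


# An explicit statement and a "coded" paraphrase of a similar intent.
explicit = "The supplier offered a kickback to the evaluation committee."
coded = "The supplier will show its appreciation once the committee decides."

print(keyword_flag(explicit))  # True: an explicit risk term is present
print(keyword_flag(coded))     # False: no lexicon term, so the text is missed
```

Because the second sentence contains no term from the lexicon, a bag-of-terms matcher cannot flag it, which is the kind of false negative that contextual models aim to reduce.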
The emergence of transformer-based architectures has revolutionized this landscape by moving beyond simple lexical analysis to capture deep semantic meaning. Transformer architectures [12] are widely used in NLP tasks, outperforming other neural models (e.g., Recurrent Neural Networks and Convolutional Neural Networks) in terms of natural language generation or understanding. Model pretraining on generic large corpora is an important advantage of transformer-based models, which leads to increased efficiency in downstream tasks such as machine translation, summarization, language understanding and classification [13]. Bidirectional Encoder Representations from Transformers (BERT) [14], which follows a pretraining-based deep learning approach, has demonstrated an impressive capability in text detection, mining, processing, and analysis tasks, outperforming conventional methods in diverse scenarios [15].
Despite the growing body of research on corruption detection, existing approaches predominantly rely on structured data analysis or rule-based textual methods, with limited attention given to the systematic use of advanced transformer-based architectures for analyzing unstructured procurement documentation. Particularly, the comparative effectiveness of recent transformer-based models, as well as the role of explainability in supporting transparent and interpretable corruption risk detection, remains underexplored. The current study is structured to bridge the research gap with regard to the automated semantic analysis of unstructured procurement documentation, where misleading language is used to hide fraudulent behavior. Specifically, this study investigates and evaluates the effectiveness of transformer-based architectures in detecting corruption risk indicators within complex technical specifications. It further examines how successive generations of models, specifically BERT, RoBERTa and DeBERTa-v3, compare across dimensions of predictive accuracy and operational efficiency. Additionally, this research investigates the extent to which explainability mechanisms, such as Integrated Gradients, provide the transparency and traceability necessary for human-in-the-loop oversight. By leveraging an outcome-driven labeling strategy grounded in Open Contracting Data Standard (OCDS) metrics, this research moves beyond subjective annotation to establish a reproducible, evidence-based detection framework that meets the practical requirements of large-scale public transparency initiatives and real-world operational scenarios.
The remainder of the paper is organized as follows: Section 2 presents related works, focusing on the domain of corruption detection and risk assessment using state-of-the-art technologies and Artificial Intelligence. Section 3 describes in detail the proposed methodology, whilst Section 4 elaborates on the produced results. Finally, Section 5 concludes the paper.
2. Related Works
Modern corruption is so complex and dynamic that it requires the use of cutting-edge scientific approaches, as well as state-of-the-art computational tools, to model and detect activities that indicate corruption [16]. BERT models are increasingly used in contemporary research studies to tackle corruption cases. Damiano et al. [17] analyzed the annual reports of banks utilizing textual analysis methods for extracting indicators of potential corruption cases. More specifically, the authors combined sentiment analysis following a dictionary approach and a BERT model called FinBERT for the categorization of Environmental, Social, and Governance (ESG) sentences. Subsequently, they utilized Random Forest (RF), Support Vector Machines (SVMs), Naïve Bayes and Gradient Boosting algorithms for the classification of potential corruption events. The authors concluded that specific textual measures could make an important contribution to the detection of corruption events.
Algorithmic Trading Systems allow trade execution to be implemented automatically, rapidly and efficiently. However, their complex structure often renders them susceptible to being utilized in financial corruption and fraud cases. Mohamed et al. [18] proposed a framework which combined different BERT variants for the semantic interpretation of financial logs with transformer models for modeling market behavior. The so-called TADST framework offered real-time detection of potentially fraudulent activities. Experimental testing of the framework proved its effectiveness, as it helped in improving existing benchmarks (98.7% improvement in efficiency and 97.4% in accuracy). In another research work for tackling financial fraud, Ergun and Sefer [19] proposed the so-called DeepFraud framework, which combined different Large Language Model embeddings (e.g., FinBERT, FinGPT, and FinLlama) and Long Short-Term Memory (LSTM) for detecting corruption incidents related to financial fraud. Experimental testing of the framework on financial records covering a 30-year period (1995 to 2024) indicated a precision score of 86% and an F1-score of 84%, outperforming, in many scenarios, other contemporary models (e.g., SVM, XGBoost, Logistic Regression, and Autoformer).
A methodology combining BERT and NLP techniques was proposed by Lima et al. [20] for the detection of corruption indicators in public procurement texts describing the rules for hiring. More specifically, the methodology extracts red flags denoting potential fraud cases. Experimental testing of the proposed methodology indicated an 88.8% recall rate, which outperformed other contemporary models (i.e., Bottleneck and BiLSTM). Torres-Berru et al. [21] presented an NLP-based approach for the detection of gender bias and favoritism in public procurements. More specifically, the authors made use of a Word2Vec model, as well as a sentiment analysis algorithm, for analyzing the questions and answers registry platform for public procurement processes in Ecuador. Experimental testing of the methodology on a corpus of 303,076 procurement processes indicated high accuracy rates, i.e., 88% for favoritism detection and 90% for gender bias detection.
Combating corruption in public procurement from several different aspects is the focus of many scientific papers. Salazar et al. [22] developed a tool for detecting public procurement corruption cases and for prioritizing resources. Their tool took into consideration both deliberate corrupt actions taken by decision-makers and inefficiencies which may support corrupt cases. It also detected red flags that were highly likely to be connected with corrupt deeds. For the classification tasks, Logistic Regression and RF methods were used. Experimental testing indicated improvements in corruption detection as compared to other contemporary methods, achieving accuracy rates of up to 88.29%. On the other hand, Munoz-Cancino and Rios [23] presented a methodology for detecting corruption in government tenders based on Social Network Analysis and an Isolation Forest algorithm. The authors stressed the importance of specific network structural settings for the early detection of corruption. During the experimental testing of the methodology, supplier centrality, density and the number of connections of related entities, as well as supplier financial characteristics, were found to be of key importance in the detection of previously unknown anomaly patterns. Other complex characteristics were also highlighted by Pernica et al. [24] as important in the detection of corruption in military equipment procurement. More specifically, variables related to national culture and a government’s ability to combat corruption were indicated as important in detecting suspicious cases. The authors conducted a comparative case study spanning 16 years (from 2008 to 2023) across four countries (i.e., Norway, Lithuania, Slovakia, and Czechia) related to mass-produced military equipment procurement.
The importance of indicators related to the relationships between buyers and suppliers in identifying corruption in procurement contracts was stressed by Aldana et al. [25]. The authors concluded that such indicators were more important than those related to the characteristics of individual contracts. An ensemble model of RF classifiers was also proposed, which achieved an accuracy rate of up to 92% during its experimental testing. Ayobami et al. [26] utilized several parameters (e.g., contract values, timelines, and bidder characteristics) for detecting corruption indicators, bid-rigging cases, and conflicts of interest. The proposed framework was experimentally tested, yielding an accuracy rate of over 87% in the detection of suspicious transactions.
The data included in audit reports and governmental budgets can be used for the detection of corruption incidents. Based on NLP methods, Beltran [27] proposed a pipeline for detecting indicators of potential corruption cases in audit reports related to governmental budgets. The author utilized publicly available data from Supreme Audit Institutions (SAIs) for this pipeline. First, a classification algorithm was used for determining which parts of the input texts were relevant. Subsequently, a Named Entity Recognition (NER) model was developed for extracting monetary values of budget discrepancies. The author also highlighted that although a discrepancy in itself did not necessarily denote a corruption incident, the proposed model could be a useful tool for fighting corruption and forming anti-corruption policies. In another research work focusing on audit and budget data, Ash et al. [28] proposed a Gradient Boosting model for detecting corruption cases. This tree-based model calculated a measure indicating the possibility of corruption issues, which could be used for empirical analysis and for supporting anti-corruption policy-making. Experimental testing of the proposed model indicated that it could be helpful in conducting targeted rather than random audits, yielding 83.6% more corruption cases as compared to random audits.
Social media posts are used in many research works related to tackling corruption. Xiao [29] proposed a deep learning methodology for detecting corruption incidents in texts from social media. The methodology encompassed preprocessing, feature extraction and selection, and corruption detection based on a Convolutional Neural Network (CNN). In experimental testing using a dataset of 19,560 tweets, the model yielded an accuracy of about 90% in the detection of corruption incidents of different kinds (e.g., money laundering, bribery, and nepotism). On the other hand, Indriyanti et al. [30] proposed a methodology for the analysis of the public perception of corruption incidents based on posts from the X social media platform. In this context, BERT-based sentiment analysis was used together with Latent Dirichlet Allocation (LDA) for the identification of corruption-related dominant topics in the East Java province of Indonesia. The authors also highlighted the important role such a methodology could play in forming concrete policy recommendations based on social media data, as well as in strengthening accountability. Experimental testing of the methodology also indicated high accuracy in the sentiment categorization of posts, which reached 98.51%.
Transformer-based architectures and, more specifically, LLMs offer enhanced capabilities in the ever-growing digital landscape, which abounds in digital forensic evidence. In this light, LLMs provide a very strong analysis tool for unstructured data with superior semantic analysis capabilities. The authors of [31] concentrated on the promising potential of incorporating LLMs into digital forensics in order to improve the effectiveness of investigations and deal with the massive amounts of data that are encountered in modern cybercrimes, including corruption. Their study indicated how LLMs could greatly speed up conventional forensic processes by automating the extraction, classification, and summarization of vast quantities of unstructured digital evidence, including emails, chat logs, and text files. More specifically, the experimental findings showed that LLMs could perform promisingly on age and gender prediction tasks while retaining computational efficiency, especially the Polyglot model with LoRA and QLoRA fine-tuning, achieving accuracy and F1-scores above 70% for both categories. Li et al. [32] aimed at presenting how sophisticated semantic parsing could successfully reveal hidden behavioral patterns and unusual trajectories by using LLM-based architectures to analyze intricate geospatial movements and unstructured spatial narratives. Owing to its reference architecture, the approach could be directly used to map and identify illicit trade networks that are relevant to smuggling operations and cross-border corruption.
5. Conclusions
This study presented a comparative evaluation of three transformer-based NLP architectures (i.e., BERT, RoBERTa, and DeBERTa-v3) for the detection of corruption risk indicators in procurement texts from heterogeneous sources. By combining textual analysis with structured outcome-based risk indicators, the current study demonstrated that contextual language models can effectively identify linguistic patterns associated with marked corruption risk. Moreover, the comparative evaluation analysis of BERT, RoBERTa and DeBERTa-v3 confirmed a consistent performance progression across the different transformer generations, with DeBERTa-v3 achieving the strongest overall results, particularly in terms of precision, recall and F1-score, validating its superior predictive performance and contextual understanding in detecting corruption indicators. These findings highlight the importance of advanced contextual modeling when analyzing complex procurement language that is often found in restrictive technical specifications.
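For readers less familiar with the metrics named above, the following minimal sketch shows how precision, recall and F1-score are derived from confusion-matrix counts. The function name `precision_recall_f1` and the counts used in the example are hypothetical and do not correspond to the results reported in this study.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall and F1 from true/false positives and false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for a binary "risk" vs. "no risk" classifier.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=10)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

In a risk-screening setting, recall reflects how many truly risky documents are caught, while precision reflects how much manual review effort is spent on false alarms; F1 is their harmonic mean.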
Beyond predictive accuracy, this study emphasized explainability. Specifically, attribution-based explainability using Integrated Gradients allowed the identification of influential textual features contributing to risk predictions, supporting transparency, traceability and manual validation. The analysis results demonstrated that more advanced transformer architectures produced more coherent attribution patterns, reinforcing their suitability for operational environments, where interpretability and accountability are essential. From an applied perspective, the integration of automated risk scoring with attribution-driven explanations provides users with actionable insights that can support and enhance decision-making, while it can also reduce manual analysis effort.
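The idea behind Integrated Gradients can be sketched on a toy model. The example below is an assumption-laden illustration, not the pipeline used in this study: it replaces the transformer with a hypothetical three-feature logistic model (weights `w`, bias `b`), uses an all-zero baseline, and approximates the path integral with a midpoint Riemann sum.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def model(x, w, b):
    """Toy logistic 'risk' model standing in for a fine-tuned classifier."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def gradient(x, w, b):
    """Analytic gradient of the logistic output with respect to the input."""
    p = model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]


def integrated_gradients(x, baseline, w, b, steps=200):
    """Midpoint Riemann approximation of Integrated Gradients along a straight path."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        total = [t + g for t, g in zip(total, gradient(point, w, b))]
    return [(xi - bi) * t / steps for xi, bi, t in zip(x, baseline, total)]


# Hypothetical weights and input; the third feature equals its baseline value.
w, b = [2.0, -1.0, 0.5], -0.5
x, baseline = [1.0, 1.0, 0.0], [0.0, 0.0, 0.0]

attributions = integrated_gradients(x, baseline, w, b)
print(attributions)
# Completeness check: attributions should sum to model(x) - model(baseline).
print(sum(attributions), model(x, w, b) - model(baseline, w, b))
```

The final line checks the completeness property, whereby the attributions sum (approximately) to the difference between the prediction for the input and the prediction for the baseline. For text models, the same computation is typically carried out over token embeddings rather than raw features.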
The findings of this study provide substantial theoretical insights and practical policy recommendations for the modernization of public procurement oversight. Because of complicated processes, vast amounts of financial transactions, and sometimes subjective decision-making processes, public procurement is a breeding ground for corruption. As authorities move to digital e-procurement, they have to deal with an overwhelming amount of data that cannot be examined manually. This study shows that using AI and machine learning tools in e-procurement can make oversight controls much more efficient, transparent, and effective. By automating the analysis of vast datasets, transformer-based NLP models help auditors and law enforcement agencies identify corruption risks and uncover illicit practices.
From a policy point of view, it is crucial that the use of AI-assisted automated systems for decision support follows strict ethical rules and frameworks, like the EU AI Act, to retain accountability for any decision. Adding explainability features to AI models directly addresses this policy requirement by making it clear which parts of the text triggered risk signals. This ensures that algorithmic decisions remain lawful, clear, and open to human review. Ultimately, standardizing these AI-powered monitoring systems can reduce discretionary decision-making, make institutions more transparent, and encourage collaboration between institutions to find corruption patterns more quickly across different public bodies.
Although the proposed method demonstrates strong performance and practical applicability, several directions for future research remain open. Expanding the availability and diversity of procurement documentation across jurisdictions represents a key step toward improving model robustness and generalization. Future work may also explore the integration of additional contextual features, including supplier networks, financial patterns or even graph-based relationships, to enrich corruption risk assessment beyond textual analysis. Moreover, systematic evaluation of explainability consistency and the incorporation of human-in-the-loop feedback mechanisms could enhance both model reliability and operational trust in the extracted results.
In addition, recent advancements in LLMs open promising opportunities for extending the proposed framework. While this study focused on transformer-based classification architectures to ensure stability and reproducibility, future work may explore comparisons with zero-shot or few-shot LLMs (e.g., GPT-based approaches) for tasks such as advanced contextual reasoning, automated explanation generation, semantic summarization of procurement documents or interactive investigator assistance. Such comparisons would also help to further contextualize the performance of fine-tuned transformer models. Furthermore, hybrid investigative approaches could combine robust transformer-based risk prediction with LLM-driven analytical support, which could be a powerful next step toward intelligent and collaborative anti-corruption monitoring systems.
Author Contributions
Conceptualization, N.P., T.A., E.D. and E.A.; methodology, N.P., T.A., E.D. and E.A.; software, N.P.; validation, N.P., E.A. and E.D.; formal analysis, N.P., T.A., E.D. and E.A.; investigation, N.P., T.A. and E.D.; resources, N.P., E.A., E.D. and T.A.; data curation, N.P. and T.A.; writing – original draft preparation, N.P., T.A., E.D. and E.A.; writing – review and editing, E.A., E.D. and T.A.; visualization, N.P. and T.A.; supervision, E.A.; project administration, E.A. All authors have read and agreed to the published version of the manuscript.
Funding
Co-funded by the European Union within the Horizon Europe Program, under grant agreement No. 101121281 (Project FALCON). Views and opinions expressed are, however, those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Executive Agency. Neither the European Union nor the granting authority can be held responsible for them.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The data presented in this study are available on request from the corresponding author.
Conflicts of Interest
The authors declare no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| AI | Artificial Intelligence |
| APIs | Application Programming Interfaces |
| BERT | Bidirectional Encoder Representations from Transformers |
| BPE | Byte-Pair Encoding |
| CNN | Convolutional Neural Network |
| ESG | Environmental, Social, and Governance |
| GPU | Graphics Processing Unit |
| LDA | Latent Dirichlet Allocation |
| LLMs | Large Language Models |
| LSTM | Long Short-Term Memory |
| NER | Named Entity Recognition |
| NLP | Natural Language Processing |
| OCDS | Open Contracting Data Standard |
| PR | Precision–Recall |
| RF | Random Forest |
| SAIs | Supreme Audit Institutions |
| SHAP | SHapley Additive exPlanations |
| SVMs | Support Vector Machines |
| XAI | eXplainable AI |
References
1. Eyre, C. Patronage, Power, and Corruption in Pharaonic Egypt. Int. J. Public Adm. 2011, 34, 701–711.
2. Lintott, A. Electoral Bribery in the Roman Republic. J. Rom. Stud. 1990, 80, 1–16.
3. Taylor, C. Corruption and Anticorruption in Democratic Athens. In Anti-Corruption in History: From Antiquity to the Modern Era; Oxford University Press: Oxford, UK, 2017; ISBN 978-0-19-880997-5.
4. Transparency International. What Is Corruption? Available online: https://www.transparency.org/en/what-is-corruption (accessed on 13 February 2026).
5. Petheram, A.; Pasquarelli, W.; Stirling, R. The Next Generation of Anti-Corruption Tools: Big Data, Open Data & Artificial Intelligence. 2019. Available online: https://ec.europa.eu/futurium/en/system/files/ged/researchreport2019_thenextgenerationofanti-corruptiontools_bigdataopendataartificialintelligence.pdf (accessed on 28 February 2024).
6. Parvanova, I. The Use of Big Data by Anticorruption Authorities; CHR, U4 Anti-Corruption Resource Centre, Michelsen Institute: Bergen, Norway, 2025.
7. Mironov, M.; Zhuravskaya, E. Corruption in Procurement and the Political Cycle in Tunneling: Evidence from Financial Transactions Data. Am. Econ. J. Econ. Policy 2016, 8, 287–321.
8. Fazekas, M.; Kocsis, G. Uncovering High-Level Corruption: Cross-National Objective Corruption Risk Indicators Using Public Procurement Data. Br. J. Political Sci. 2017, 50, 155–164.
9. Bauer, M.; Zirker, A. Strategies of Ambiguity; Routledge: Abingdon, UK, 2024.
10. OECD. Governing with Artificial Intelligence: The State of Play and Way Forward in Core Government Functions; OECD Publishing: Paris, France, 2025.
11. Hajek, P.; Henriques, R. Mining Corporate Annual Reports for Intelligent Detection of Financial Statement Fraud – A Comparative Study of Machine Learning Methods. Knowl.-Based Syst. 2017, 128, 139–152.
12. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30.
13. Wolf, T.; Debut, L.; Sanh, V.; Chaumond, J.; Delangue, C.; Moi, A.; Cistac, P.; Rault, T.; Louf, R.; Funtowicz, M.; et al. Transformers: State-of-the-Art Natural Language Processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Online, 16–20 November 2020; Liu, Q., Schlangen, D., Eds.; Association for Computational Linguistics: Cedarville, OH, USA, 2020; pp. 38–45.
14. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. Available online: https://arxiv.org/abs/1810.04805 (accessed on 26 February 2026).
15. Aftan, S.; Shah, H. A Survey on BERT and Its Applications. In Proceedings of the 2023 20th Learning and Technology Conference (L&T), Jeddah, Saudi Arabia, 26 January 2023; pp. 161–166.
16. Joly, M. Corruption: The Shortcut to Disaster. Sustain. Prod. Consum. 2017, 10, 133–156.
17. Damiano, R.; Polizzi, S.; Scannella, E.; Valenza, G. Corruption Detection Through Textual Analysis: Evidence from Eurozone Banks. Bus. Ethics Environ. Responsib. 2026, 35, 1017–1037.
18. Mohamed, A.N.; Manaa, M.E.; Soni, S.; Kizi, S.S.K.; Doss, D. Financial Fraud Detection in Algorithmic Trading Systems Using BERT Variants and Time-Series Embedding. In Proceedings of the 2025 3rd International Conference on Cyber Resilience (ICCR), Dubai, United Arab Emirates, 3–4 July 2025; pp. 1–6.
19. Erva Ergun, Z.; Sefer, E. Financial Statement Fraud Detection via Large Language Models. Intell. Syst. Account. Financ. Manag. 2025, 32, e70021.
20. Lima, W.; Lira, R.; Paiva, A.; Silva, J.; Silva, V. Methodology for Automatic Extraction of Red Flags in Public Procurement. In Proceedings of the 2023 International Joint Conference on Neural Networks (IJCNN), Gold Coast, Australia, 18–23 June 2023; pp. 1–7.
21. Torres Berrú, Y.; Batista, V.; Conde, L. A Data Mining Approach to Detecting Bias and Favoritism in Public Procurement. Intell. Autom. Soft Comput. 2023, 36, 3501–3516.
22. Salazar, A.; Pérez, J.F.; Gallego, J. VigIA: Prioritizing Public Procurement Oversight with Machine Learning Models and Risk Indices. Data Policy 2024, 6, e75.
23. Muñoz-Cancino, R.; Ríos, S.A. Data-Driven Transparency: Machine Learning and Social Network Analysis for Corruption Detection in Public Procurement. Procedia Comput. Sci. 2025, 270, 1788–1795.
24. Pernica, B.; Palavenis, D.; Dvorak, J. Small Arms Procurement and Corruption in Small NATO Countries. J. Public Procure. 2024, 24, 348–370.
25. Aldana, A.; Falcón-Cortés, A.; Larralde, H. A Machine Learning Model to Identify Corruption in México’s Public Procurement Contracts. arXiv 2022.
26. Ayobami, A.T.; Mike-Olisa, U.; Chidera Ogeawuchi, J.; Abayomi, A.A.; Agboola, O.A. Algorithmic Integrity: A Predictive Framework for Combating Corruption in Public Procurement through AI and Data Analytics. J. Front. Multidiscip. Res. 2023, 4, 130–141.
27. Beltran, A. Fiscal Data in Text: Information Extraction from Audit Reports Using Natural Language Processing. Data Policy 2023, 5, e7.
28. Ash, E.; Galletta, S.; Giommoni, T. A Machine Learning Approach to Analyze and Support Anti-Corruption Policy. SSRN J. 2021, 17, 162–193.
29. Xiao, Q. Automated Detection of Corruption Reports in Text via Deep Reinforcement Learning. Sci. Rep. 2025, 15, 36674.
30. Indriyanti, A.D.; Gernowo, R.; Sediyono, E. Machine Learning Approach for Sentiment and Topic Analysis on Social Media X: Case Study of Corruption Handling by the East Java Government. In Proceedings of the 2025 Eighth International Conference on Vocational Education and Electrical Engineering (ICVEE), Surabaya, Indonesia, 24–25 September 2025; pp. 239–245.
31. Cho, S.-H.; Kim, D.; Kwon, H.-C.; Kim, M. Exploring the Potential of Large Language Models for Author Profiling Tasks in Digital Text Forensics. Forensic Sci. Int. Digit. Investig. 2024, 50, 301814.
32. Li, M.; Zhang, Y.; Zou, W.; Chen, H.; Yang, X.; Chen, T. Geographical Network Analysis of Drug Trafficking in China (2012–2024): A Method Based on Large Language Models. J. Saf. Sci. Resil. 2025, 100273.
33. Anwar, M. mBERT: Multilingual BERT. Available online: https://anwarvic.github.io/cross-lingual-lm/mBERT (accessed on 18 March 2026).
34. Conneau, A.; Khandelwal, K.; Goyal, N.; Chaudhary, V.; Wenzek, G.; Guzmán, F.; Grave, E.; Ott, M.; Zettlemoyer, L.; Stoyanov, V. Unsupervised Cross-Lingual Representation Learning at Scale. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; Jurafsky, D., Chai, J., Schluter, N., Tetreault, J., Eds.; Association for Computational Linguistics: Red Hook, NY, USA, 2020; pp. 8440–8451.
35. Kenny, C.; Musatova, M. ‘Red Flags of Corruption’ in World Bank Projects: An Analysis of Infrastructure Contracts. In International Handbook on the Economics of Corruption; Elgar Publishing: Camberley, UK, 2010; Volume Two, p. 499.
36. Beltagy, I.; Peters, M.E.; Cohan, A. Longformer: The Long-Document Transformer. arXiv 2020, arXiv:2004.05150.
37. Smith, M.; Ruxton, G. Effective Use of the McNemar Test. Behav. Ecol. Sociobiol. 2020, 74, 133.
38. Seabold, S.; Perktold, J. Statsmodels: Econometric and Statistical Modeling with Python. In Proceedings of the 9th Python in Science Conference, Austin, TX, USA, 28 June–3 July 2010.