AI-Assisted OSINT/SOCMINT for Safeguarding Borders: A Systematic Review
Abstract
1. Introduction
2. Methodology
2.1. Defining the Research Questions (RQs)
- RQ1 (Effectiveness and Application of AI): How are specific AI technologies (e.g., natural language processing, computer vision, machine learning, generative AI, agentic AI) being applied to leverage open-source and social media data toward enhancing the effectiveness and efficiency of border protection, and what tangible improvements and operational gains have been observed or are theoretically possible compared to traditional methods and other areas of the wider security domain?
- RQ2 (Limitations): What are the principal technical, operational, and data quality limitations (including issues of misinformation, data veracity, bias, and scalability) encountered in deploying AI with open-source and social media data for border protection, and how do these limitations impact the reliability and practical utility of AI-driven insights?
- RQ3 (Ethical, Legal, and Societal Implications): What are the critical ethical, privacy, and legal implications (e.g., surveillance overreach, human rights infringements, data sovereignty, lack of transparency, and potential for discrimination) associated with the use of AI in analyzing open-source and social media for border protection, and how do different stakeholders perceive and address these concerns?
2.2. Developing the Search Strategy
- Search academic databases: Selected keywords were used to search academic databases (IEEE Xplore, Scopus, ACM Digital Library, SpringerLink, MDPI) and an academic search engine (Google Scholar). Search strings (Boolean queries) were adapted for each database:
  1. IEEE Xplore/SpringerLink/MDPI: (“artificial intelligence” OR “machine learning” OR “deep learning” OR “generative AI” OR “large language model” OR “agentic system”) AND (“open-source intelligence” OR OSINT OR SOCMINT OR “social media intelligence” OR “social media” OR “dark web”) AND (“border protection” OR “border security” OR “irregular migration” OR “smuggling” OR “trafficking” OR “terrorism” OR “cybercrime” OR “customs” OR “illegal migration”)
  2. Scopus: TITLE-ABS-KEY ((“artificial intelligence” OR “machine learning” OR “deep learning” OR “generative AI” OR “large language model” OR “agentic AI”)) AND TITLE-ABS-KEY ((“open-source intelligence” OR OSINT OR SOCMINT OR “social media intelligence” OR “social media” OR “dark web”)) AND TITLE-ABS-KEY ((“border protection” OR “border security” OR “irregular migration” OR “smuggling” OR “trafficking” OR “terrorism” OR “cybercrime” OR “customs” OR “illegal migration”)) AND PUBYEAR > 2019 AND PUBYEAR < 2026
  3. ACM Digital Library: [[All: “artificial intelligence”] OR [All: “machine learning”] OR [All: “deep learning”] OR [All: “generative AI”] OR [All: “large language model”] OR [All: “agentic system”]] AND [[All: “open-source intelligence” OR OSINT OR SOCMINT OR “social media intelligence” OR “social media” OR “dark web”]] AND [[All: “border protection” OR “border security” OR “irregular migration” OR “smuggling” OR “trafficking” OR “terrorism” OR “cybercrime” OR “customs” OR “illegal migration”]] AND (publication date: [2019 TO 2025])
  4. Google Scholar (through the “Publish or Perish” automated extraction software): (“Artificial Intelligence” OR AI OR “Machine Learning” OR “Deep Learning” OR “Natural Language Processing” OR “Computer Vision” OR “Generative AI” OR “Large Language Models” OR LLM OR “Agentic AI”) AND (“Open Source Intelligence” OR OSINT OR “Publicly available information” OR “internet” OR “web search” OR “Social Media Intelligence” OR “Social Media”) AND (“Border management” OR “Border Protection” OR “Border Security”).
- Conventional search engines: Selected keywords were used to query conventional search engines such as Google, Bing, and Yandex to identify relevant material from governmental or intergovernmental organizations and agencies involved in border security, such as the European Border and Coast Guard Agency (Frontex) and the US CBP/ICE.
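The adaptation of one keyword set to several database syntaxes can be illustrated with a short Python sketch. It is not the tooling used in this review: the helper names (`or_group`, `build_query`) and the shortened term lists are illustrative assumptions, and the quoting rule (quote any term containing a space or hyphen) is a simplification of the strings listed above.

```python
# Illustrative sketch only: assembles Boolean search strings from concept groups.
# Term lists are shortened subsets of the review's keywords.

AI_TERMS = ["artificial intelligence", "machine learning", "large language model"]
SOURCE_TERMS = ["open-source intelligence", "OSINT", "SOCMINT", "social media"]
DOMAIN_TERMS = ["border protection", "border security", "irregular migration"]


def or_group(terms):
    """Join terms with OR inside parentheses, quoting multi-word or hyphenated terms."""
    quoted = [f'"{t}"' if (" " in t or "-" in t) else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(concept_groups, wrapper=None, suffix=""):
    """AND together one OR-group per concept.

    wrapper: optional format string applied to each group, e.g. a field code.
    suffix:  optional trailing filter, e.g. a publication-year restriction.
    """
    groups = [or_group(g) for g in concept_groups]
    if wrapper is not None:
        groups = [wrapper.format(g) for g in groups]
    return " AND ".join(groups) + suffix


concepts = [AI_TERMS, SOURCE_TERMS, DOMAIN_TERMS]

generic_query = build_query(concepts)
scopus_query = build_query(
    concepts,
    wrapper="TITLE-ABS-KEY ({})",
    suffix=" AND PUBYEAR > 2019 AND PUBYEAR < 2026",
)

print(generic_query)
print(scopus_query)
```

Keeping the concept groups in one place and varying only the wrapper and suffix makes it easier to check that every database received the same underlying terms.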
2.3. Identifying and Screening Papers
- Initial Evaluation: Screen the title, keywords, and abstract of identified articles to exclude non-relevant studies.
- Apply eligibility criteria:
  1. Inclusion Criteria:
     - Study designs: Empirical studies, case studies, technical reports, government documents, and white papers.
     - Time period: Emphasis on the 2019–2025 period, with the exception of landmark studies from authoritative sources outside this period, which may serve as comparison points for understanding advances in the field.
     - Geographic scope: Global, with emphasis on high-implementation regions (e.g., EU external borders, US borders, Southeast Asia).
     - Language: English and other major languages with translation.
     - AI applications: AI technologies and AI-based methodologies that process internet-found, publicly available, and social media data in contexts relevant to border protection, where outputs can be operationalized by border agencies as actionable OSINT/SOCMINT. Include studies even if they do not use the terms “OSINT” or “SOCMINT” explicitly, provided they describe open-source or social media data collection, analysis, fusion, or decision-support techniques applicable to border security tasks (e.g., detection, monitoring, forecasting, attribution, or resource allocation). Include AI-driven methodologies that (a) augment cyber threat intelligence, crucial for the digital infrastructure security of border enforcement agencies [17,18], and (b) support broader security and intelligence tasks, as the concept of “intelligence-led” or “intelligence-informed” border protection is closely interwoven with the organizational fabric of relevant agencies [17]. Emphasize methods with clear pipelines and potential for operational deployment, evaluation metrics, or integration with agency workflows.
  2. Exclusion Criteria:
     - Opinion pieces without supporting data.
     - Studies focusing solely on technical specifications without implementation data.
     - Studies outside the focus period (2019–2025) that involve no AI-assisted automation and do not bear the characteristics of a landmark study.
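The exclusion criteria above can be expressed as a small screening function. The following Python sketch is illustrative rather than the procedure actually applied: the `Candidate` fields and the `screen` function are hypothetical names, and the third criterion is encoded literally as a conjunction (outside the period, no AI, and not a landmark study), which is one possible reading of the text.

```python
from dataclasses import dataclass

FOCUS_YEARS = range(2019, 2026)  # 2019 to 2025 inclusive


@dataclass
class Candidate:
    """Hypothetical screening record; the field names are assumptions."""
    title: str
    year: int
    uses_ai: bool
    is_opinion: bool = False
    has_supporting_data: bool = True
    has_implementation_data: bool = True
    is_landmark: bool = False


def screen(c: Candidate) -> tuple[bool, str]:
    """Return (included, reason) after applying the three exclusion criteria in order."""
    if c.is_opinion and not c.has_supporting_data:
        return False, "opinion piece without supporting data"
    if not c.has_implementation_data:
        return False, "technical specifications only, no implementation data"
    # Literal reading: all three conditions must hold for exclusion.
    if c.year not in FOCUS_YEARS and not c.uses_ai and not c.is_landmark:
        return False, "outside focus period, no AI-assisted automation, not a landmark study"
    return True, "eligible"


if __name__ == "__main__":
    examples = [
        Candidate("Hypothetical 2023 empirical study", 2023, uses_ai=True),
        Candidate("Hypothetical 2012 manual-analysis study", 2012, uses_ai=False),
        Candidate("Hypothetical 2016 landmark strategy", 2016, uses_ai=False, is_landmark=True),
    ]
    for ex in examples:
        print(ex.title, "->", screen(ex))
```

Returning the reason alongside the decision supports the PRISMA practice of reporting why records were excluded.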
2.4. Selection Process
2.5. Data Collection Process
- Bibliographic details (Key [Zotero ID], authors, publication year, title, publication title [journal/venue], DOI/URL [95%+ coverage], abstract note, pages, volume, issue, ISSN/ISBN, language [primarily English], item type [e.g., journal article, conference paper]).
- Study design (empirical, case study, experimental, systematic review, white paper, policy report; additional provenance like access date, date added/modified, library catalog).
- AI technique applied (e.g., NLP, ML, CV, LLMs, generative AI, agentic AI; tagged via Manual Tags and Automatic Tags for categorization).
- Application context (border protection, trafficking, migration prediction, cybercrime; including Extra field notes on geographic/operational scope, e.g., Eastern Europe borders).
- Validation status (real-world-tested, simulation, theoretical; with transferability assessment [whether findings can be generalized to border contexts]).
- Metrics reported (precision, recall, accuracy, F1 score, etc.; extracted quantitatively where available, e.g., +86% improvement over baselines).
- Narrative summaries in Notes field, mapping RQs coverage.
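The extraction fields listed above map naturally onto a typed record. The Python sketch below shows one possible schema; it is an assumption for illustration, not the Zotero export format, and the class name, field names, and example values are all hypothetical.

```python
from dataclasses import dataclass, field, asdict

# Allowed values taken from the validation-status item above.
VALIDATION_STATUSES = {"real-world-tested", "simulation", "theoretical"}


@dataclass
class ExtractionRecord:
    """Hypothetical per-study extraction record mirroring the listed fields."""
    zotero_key: str
    authors: list
    year: int
    title: str
    study_design: str
    ai_techniques: list
    application_context: str
    validation_status: str
    transferable_to_border_context: bool = False
    metrics: dict = field(default_factory=dict)      # e.g. {"f1": 0.0}
    rqs_covered: list = field(default_factory=list)  # e.g. ["RQ1", "RQ2"]
    notes: str = ""

    def __post_init__(self):
        # Reject statuses outside the controlled vocabulary.
        if self.validation_status not in VALIDATION_STATUSES:
            raise ValueError(f"unknown validation status: {self.validation_status!r}")


if __name__ == "__main__":
    record = ExtractionRecord(
        zotero_key="EXAMPLE01",  # placeholder, not a real library key
        authors=["Placeholder, A."],
        year=2024,
        title="Hypothetical study used only to illustrate the schema",
        study_design="empirical",
        ai_techniques=["NLP", "ML"],
        application_context="trafficking",
        validation_status="simulation",
        rqs_covered=["RQ1"],
    )
    print(asdict(record))
```

A controlled vocabulary for fields such as validation status keeps later tabulation consistent across reviewers.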
2.6. Risk of Bias Assessment
2.7. Data Synthesis
- Effect Measures: Due to the heterogeneity of study designs and metrics, quantitative meta-analysis was deemed infeasible. Instead, qualitative synthesis and evidence tabulation were employed.
- Synthesis Methods: Evidence matrices were developed for each RQ.
- Sensitivity Analyses: Not applicable due to design heterogeneity.
- Reporting Bias Assessment: Formal statistical assessment was not possible due to the inclusion of grey literature; this is noted as a limitation.
- Certainty of Evidence: As evidence spans academic, corporate, and policy sources, certainty assessments were cautious and are elaborated in the discussion.
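An evidence matrix of the kind mentioned above can be built by grouping study identifiers by research question and by one descriptive attribute. The Python sketch below is a minimal illustration under assumed record keys (`key`, `rqs_covered`, `ai_techniques`, `validation_status`); the records are hypothetical placeholders, not studies from this review.

```python
from collections import defaultdict


def evidence_matrix(records, column_field):
    """Group study keys by RQ (rows) and by the values of column_field (columns).

    Returns a nested dict: {rq: {column_value: [study keys]}}.
    """
    matrix = defaultdict(lambda: defaultdict(list))
    for rec in records:
        values = rec[column_field]
        if isinstance(values, str):  # allow single-valued fields
            values = [values]
        for rq in rec["rqs_covered"]:
            for value in values:
                matrix[rq][value].append(rec["key"])
    return {rq: dict(cols) for rq, cols in matrix.items()}


# Hypothetical placeholder records.
records = [
    {"key": "S1", "rqs_covered": ["RQ1", "RQ2"], "ai_techniques": ["NLP"], "validation_status": "simulation"},
    {"key": "S2", "rqs_covered": ["RQ1"], "ai_techniques": ["NLP", "CV"], "validation_status": "real-world-tested"},
    {"key": "S3", "rqs_covered": ["RQ3"], "ai_techniques": ["LLM"], "validation_status": "theoretical"},
]

by_technique = evidence_matrix(records, "ai_techniques")
by_validation = evidence_matrix(records, "validation_status")

for rq in sorted(by_technique):
    print(rq, by_technique[rq])
```

Tabulating study keys rather than counts keeps each cell traceable to its sources, which suits a qualitative synthesis.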
3. Results
3.1. RQ1—Effectiveness and Application of AI in OSINT/SOCMINT for Border Protection and Adjacent Functions
3.2. RQ2—Limitations
3.3. RQ3—Ethical, Legal, Societal Issues
4. Discussion
4.1. Cross-Thematic Discussion on RQ1 to RQ3
- Data quality and veracity: Misinformation, data inadequacy, and adversarial manipulation (e.g., smuggling networks adapting their digital signals) were consistent concerns [57].
- Operational scalability: Deploying AI at border scale requires infrastructure and real-time processing capacities that are not always available.
- Integration challenges: Interfacing AI with existing border management systems and legal protocols still poses an impediment.
4.2. Gaps in the Current Literature
4.2.1. Inadequate Empirical Data on Long-Term Effectiveness and Societal Impact
4.2.2. Lack of Comprehensive Legal and Ethical Frameworks for Rapidly Evolving Technologies
4.2.3. Challenges in Data Integration and Interoperability Across Diverse Systems
4.2.4. Under-Explored Applications and Methodologies
4.3. Limitations of This Study
4.4. Suggestions for Future Research
4.4.1. Development of Robust, Explainable, and Ethical AI Models
4.4.2. Longitudinal Studies on the Socio-Legal Impacts of AI-Assisted Border Management
4.4.3. Fostering International Collaboration and Standardized Data Sharing Protocols
4.4.4. Exploring Novel Data Sources and Advanced Analytical Techniques
- Multimodal Intelligence Fusion: Investigating how LMMs can effectively fuse and reason across diverse modalities (e.g., combining satellite imagery with social media text, or audio intercepts with dark web forum discussions) to create a more complete and robust intelligence picture for border security [104].
- Proactive Threat Anticipation: Developing agentic systems that can autonomously monitor emerging online trends, identify nascent criminal typologies, and even simulate potential adversarial actions based on OSINT/SOCMINT data, providing early warnings to border agencies.
- Automated Vetting and Insider Threat Detection: Researching the application of LLMs to analyze vast amounts of publicly available information and internal communications for vetting purposes, identifying subtle indicators of insider threat risk, while rigorously addressing privacy and bias concerns.
- Adaptive Language Intelligence: Advancing LLMs to better understand and adapt to evolving slang, code-switching, and encrypted communications used by transnational criminal organizations, providing real-time translation and contextual analysis for border personnel.
- Human–Agent Teaming Paradigms: Designing optimal human–agent teaming models in which LLM-powered agents act as intelligent assistants, offloading repetitive tasks and providing rapid insights, while human analysts retain ultimate decision-making authority and provide critical ethical oversight. Integrating RAG via LangChain into human–agent teaming would allow verification agents to dynamically retrieve evidence from external sources and reduce hallucinations in intelligence triage. Pilots could test this in border simulations, addressing the reliability limitations raised under RQ2 through modular, adaptive workflows [105].
- Ethical Red Teaming for LLMs: Conducting rigorous red teaming exercises specifically for LLM and agent-based systems in border security to identify and mitigate potential biases, vulnerabilities, and unintended consequences before deployment.
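The retrieval-grounded verification step described under human–agent teaming can be sketched without committing to any particular framework. The Python example below deliberately avoids LangChain and uses only the standard library: a bag-of-words cosine similarity stands in for a vector store, and the function names, threshold, and sample passages are all hypothetical. It shows only the control flow, in which a claim is routed to the analyst either with retrieved evidence or flagged as unsupported.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(claim, corpus, k=2):
    """Return the k highest-scoring (score, doc_id) pairs for a claim."""
    query = Counter(tokenize(claim))
    scored = [(cosine(query, Counter(tokenize(text))), doc_id) for doc_id, text in corpus.items()]
    scored.sort(reverse=True)
    return scored[:k]


def triage(claim, corpus, threshold=0.3):
    """Attach evidence to a claim; the human analyst decides in both branches."""
    hits = retrieve(claim, corpus)
    supported = bool(hits) and hits[0][0] >= threshold
    route = "analyst_review_with_evidence" if supported else "analyst_review_unsupported"
    return {"claim": claim, "evidence": hits, "route": route}


# Hypothetical placeholder passages.
corpus = {
    "doc1": "messaging channel advertises boat departures this week",
    "doc2": "weather report for coastal region",
}

print(triage("boat departures advertised on messaging channel", corpus))
print(triage("passport forgery workshop", corpus))
```

In a real pilot the similarity function would be replaced by an embedding-based retriever, but the routing logic, where nothing is acted on without analyst review, would stay the same.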
4.4.5. Creative Synthesis: Envisaging Prospective Human-Agent Teaming
- Source Monitoring Agent: Continuously ingesting and processing data from social media platforms (Twitter/X, Facebook, Telegram, etc.), online forums, news feeds, and media to identify migration-relevant signals in real time across multiple languages and dialects. Research demonstrates that multimodal agents capable of handling text, images, and video can significantly enhance entity recognition and event interpretation for intelligence operations. Future multi-agent architectures could employ LangChain’s retriever chains with RAG to distribute OSINT queries across specialized agents, grounding outputs in federated vector stores with the aim of improving threat detection accuracy.
- Pattern Analysis Agent: Employing advanced natural language processing and machine learning to detect subtle linguistic patterns, code-switching, and emergent terminology used by smuggling networks, enabling the identification of facilitation operations that evade traditional keyword-based detection [106]. Studies show that AI-powered narrative analysis can assist in mitigating the weaponization of social media through early detection of coordinated disinformation campaigns [107].
- Network Mapping Agent: Constructing dynamic knowledge graphs that link entities—individuals, organizations, routes, transit hubs, and communication channels—to reveal the structure and evolution of irregular migration facilitation networks operating across borders. Graph foundation models have demonstrated efficacy in uncovering online information operations across multiple countries, suggesting applicability to transnational criminal network detection.
- Strategic Foresight Agent: Synthesizing macro-indicators (fluctuating demographics, economic data, climate events, political instability, conflict zones, etc.) [14] with micro-level social media signals to generate early warning assessments of potential migration flow shifts [108], enabling proactive resource allocation and humanitarian response planning. This addresses the need for crowdsourced intelligence frameworks that can scale investigations while maintaining ethical oversight.
- Human–AI Interface Agent: Providing operational analysts with natural language query capabilities to interrogate the integrated intelligence picture, asking complex questions such as ‘Correlate social media activity patterns in Region X with historical migration route data and identify anomalous movements in the past 72 h’ [44]. Recent advances in generative AI for critical infrastructure protection demonstrate that agentic AI systems can support proactive defense mechanisms while maintaining human oversight. LangChain’s LangGraph could orchestrate multi-agent RAG systems in which exploration agents fetch SOCMINT signals and analysis agents fuse them with official databases, with the aim of improving entity recognition performance. This hybrid approach would connect the RQ1 applications with real-time evidence grounding [105].
- GELSI Compliance and Oversight Agent: Addressing the critical governance gap identified in the literature, a dedicated “Ethical Oversight Agent” could be deployed. Its sole function would be the real-time auditing of all other operational AI systems rather than intelligence gathering. Programmed with the specific constraints of data protection laws, human rights principles, and agency policies, this agent would act as an automated internal overseer. It would monitor data access logs to prevent privacy violations, analyze the outputs of risk assessment models for statistical evidence of bias against protected groups, and create an immutable record of all AI-supported actions, to be reported to human overseers.
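Two of the oversight functions described for the compliance agent, keeping a record of AI-supported actions and checking model outputs for group-level disparities, can be sketched in a few lines of Python. This is an illustrative assumption rather than a proposed design: the class and function names are hypothetical, the hash chain makes the log tamper-evident rather than strictly immutable, and the disparity measure (difference between the highest and lowest flag rates) is only one of many possible fairness indicators.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash preceding the first entry


def _digest(body):
    """SHA-256 over a canonical JSON encoding of an entry body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only log in which each entry commits to the previous entry's hash."""

    def __init__(self):
        self.entries = []

    def append(self, agent, action, details):
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"agent": agent, "action": action, "details": details, "prev_hash": prev_hash}
        self.entries.append({**body, "hash": _digest(body)})

    def verify(self):
        """Recompute every hash; any edit to an earlier entry breaks the chain."""
        prev = GENESIS
        for entry in self.entries:
            body = {k: entry[k] for k in ("agent", "action", "details", "prev_hash")}
            if entry["prev_hash"] != prev or entry["hash"] != _digest(body):
                return False
            prev = entry["hash"]
        return True


def flag_rate_disparity(flags_by_group):
    """flags_by_group maps group -> (flagged, total); returns max minus min flag rate."""
    rates = [flagged / total for flagged, total in flags_by_group.values() if total > 0]
    return max(rates) - min(rates) if rates else 0.0


if __name__ == "__main__":
    log = AuditLog()
    log.append("source_monitoring", "data_access", {"dataset": "public_posts"})
    log.append("pattern_analysis", "risk_score", {"case": "placeholder"})
    print("chain intact:", log.verify())
    print("disparity:", flag_rate_disparity({"group_a": (10, 100), "group_b": (30, 100)}))
```

A disparity above an agreed threshold would not prove discrimination by itself; in this sketch it would only trigger a report to the human overseers mentioned above.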
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| AI | Artificial Intelligence |
| OSINT | Open-Source Intelligence |
| SOCMINT | Social Media Intelligence |
| SLR | Systematic Literature Review |
| PRISMA | Preferred Reporting Items for Systematic Reviews and Meta-Analyses |
| OSF | Open Science Framework |
| ML | Machine Learning |
| DL | Deep Learning |
| NLP | Natural Language Processing |
| CV | Computer Vision |
| LLM | Large Language Model(s) |
| LMM | Large Multimodal Model(s) |
| RQ | Research Question(s) |
| GELSI | Governance, Ethical, Legal, and Societal Implications |
| EU | European Union |
| GDPR | General Data Protection Regulation |
| EUAA | European Union Agency for Asylum |
| EDPS | European Data Protection Supervisor |
| CBP | Customs and Border Protection |
| DHS | Department of Homeland Security |
| TCO | Transnational Criminal Organization(s) |
| API | Application Programming Interface |
| ARIMA | Autoregressive Integrated Moving Average |
| SARIMA | Seasonal Autoregressive Integrated Moving Average |
| RMSE | Root Mean Square Error |
| MAE | Mean Absolute Error |
| MAPE | Mean Absolute Percentage Error |
| UAV | Unmanned Aerial Vehicle |
| LEA | Law Enforcement Agency |
| NER | Named Entity Recognition |
| KNN | K-Nearest Neighbor |
| RF | Random Forest |
| GBM | Gradient Boosting Machine |
| BART | Bayesian Additive Regression Trees |
| SVM | Support Vector Machine |
| CNN | Convolutional Neural Network |
| ANN | Artificial Neural Networks |
| XGBoost/GBDT | Gradient Boosted Decision Trees |
| AP | Average Precision |
| ACLED | Armed Conflict Location and Event Data |
| SIS | Schengen Information System |
| VIS | Visa Information System |
| Eurodac | European Dactyloscopy |
| EES | EU Entry/Exit System |
| ETIAS | European Travel Information and Authorisation System |
| ACS | American Community Survey |
| UNHCR | United Nations High Commissioner for Refugees |
| YOLO | You Only Look Once (object detection algorithm) |
| GNSS | Global Navigation Satellite System |
| ATT&CK | Adversarial Tactics, Techniques & Common Knowledge (MITRE framework) |
| CoT | Chain of Thought |
| RAG | Retrieval-Augmented Generation |
| DPIA | Data Protection Impact Assessment |
| ILP | Integer Linear Programming |
| NGO | Non-Governmental Organization |
References
- Stryker, C.; Kavlakoglu, E. What Is Artificial Intelligence (AI)? IBM. Available online: https://www.ibm.com/think/topics/artificial-intelligence (accessed on 9 August 2024).
- Wikipedia. Artificial Intelligence. Wikipedia; Wikimedia Foundation. Available online: https://en.wikipedia.org/wiki/Artificial_intelligence (accessed on 18 February 2019).
- Muduli, D.; Toppo, A.U.; Singh, V.; Singh, M.; Tiwari, D.P. YOLOv8-AF: A Novel Customized YOLOv8-Accurate Fast Deep Learning Model for Enhancing Border Security in Various Global Regions. In Proceedings of the 2024 15th International Conference on Computing Communication and Networking Technologies (ICCCNT), Kamand, India, 24–28 June 2024; pp. 1–6.
- Khera, V.; Prasad, A.R.; Kwanoran, S. Open-Source Intelligence (OSINT): A Practical Introduction: A Field Manual; River Publishers: New York, NY, USA, 2024.
- Frontex. Artificial Intelligence-Based Capabilities for the European Border and Coast Guard: Final Report. 2020. Available online: https://www.frontex.europa.eu/assets/Publications/Research/Frontex_AI_Research_Study_2020_final_report.pdf (accessed on 12 August 2025).
- Unchained Project. Caravan of Light Report. 2023. Available online: https://unchainedproject.eu/wp-content/uploads/2023/02/CARAVAN-OF-LIGHT-REPORT-FINAL-14-FEBRUARY.pdf (accessed on 15 July 2025).
- Ozer, M.; Kucukkaya, G.; Kose, Y.; Mukasheva, A.; Ciris, K.; Penumatcha, B.V. Mapping Trafficking Networks: A Data-Driven Approach to Disrupt Human Trafficking Post Russia-Ukraine Conflict. arXiv 2025, arXiv:2504.17050.
- World Customs Organization. Available online: https://www.wcoomd.org/en/media/newsroom/2024/august/unlocking-the-value-of-osint-in-customs-enforcement.aspx (accessed on 13 August 2025).
- Janjeva, A.; Harris, A.; Byrne, J. The Future of Open-Source Intelligence for UK National Security. RUSI Occasional Papers. Available online: https://cetas.turing.ac.uk/publications/future-open-source-intelligence-uk-national-security (accessed on 19 August 2025).
- Department of Homeland Security. Privacy Impact Assessment for the Use of Social Media by U.S. Customs and Border Protection (CBP). 2019. Available online: https://www.dhs.gov/sites/default/files/publications/privacy-pia-cbp58-socialmedia-march2019.pdf (accessed on 11 June 2025).
- Kodandaram, S.R.; Sunkara, M.; Ferdous, J.; Poursardar, F.; Ashok, V. Unveiling Coyote Ads: Detecting Human Smuggling Advertisements on Social Media. In Proceedings of the 35th ACM Conference on Hypertext and Social Media, Poznan, Poland, 10–13 September 2024; pp. 259–272.
- Dumbrava, C. Artificial Intelligence at EU Borders: Overview of Applications and Key Issues. European Parliamentary Research Service. 2021. Available online: https://www.europarl.europa.eu/RegData/etudes/IDAN/2021/690706/EPRS_IDA(2021)690706_EN.pdf (accessed on 19 August 2025).
- Zakaria, S.; Howard, I.; Coringrato, E.; Todsen, A.L.; Wade, I.; Ross, A.; Pisani, K.; Politi, C.; Szomszor, M.; Gunashekar, S. State-of-Play and Future Trends on the Development of Governance Frameworks for Emerging Technologies [Research Report]. RAND Corporation. 2024. Available online: https://www.rand.org/randeurope/research/projects/2024/governance-frameworks.html (accessed on 19 August 2025).
- Frontex. Strategic Risk Analysis 2024: Report. Available online: https://prd.frontex.europa.eu/wp-content/uploads/strategic-risk-analysis-2024-report.pdf (accessed on 19 August 2025).
- Gerdes, L.; Nemeth, G. A Comprehensive Approach to the Management of Migration Towards Europe. Z. Für Außen- Und Sicherheitspolitik 2025, 18, 85–114.
- Frontex. Reference Architecture for European Border Surveillance (Version 3, No. 25.0178). Available online: https://www.frontex.europa.eu/assets/25.0178_European_Border_Surveillance_v3.pdf (accessed on 19 August 2025).
- European Border and Coast Guard Agency. Reference Architecture for European Border Surveillance; Publications Office: Luxembourg, 2025. Available online: https://data.europa.eu/doi/10.2819/8183238 (accessed on 27 May 2025).
- U.S. Customs and Border Protection. U.S. Customs and Border Protection Cybersecurity Strategy. Available online: https://www.cbp.gov/sites/default/files/assets/documents/2016-Jul/cbp-cyberstrategy-20160720.pdf (accessed on 20 July 2016).
- Invisible Gatekeepers: DHS’ Growing Use of AI in Immigration Decisions. American Immigration Council. 2025. Available online: https://www.americanimmigrationcouncil.org/blog/invisible-gatekeepers-dhs-growing-use-of-ai-in-immigration-decisions/ (accessed on 13 July 2025).
- AI Use Case Inventory Library|Homeland Security. 2025. Available online: https://www.dhs.gov/publication/ai-use-case-inventory-library (accessed on 8 August 2025).
- NATO Parliamentary Assembly. NATO and Artificial Intelligence: Report of the Science and Technology Committee (STC 058 E rev.2). 2024. Available online: https://www.nato-pa.int/document/2024-nato-and-ai-report-clement-058-stc (accessed on 17 August 2025).
- Recorded Future. Artificial Eyes: Generative AI in China’s Military Intelligence (Insikt Group). Available online: https://assets.recordedfuture.com/insikt-report-pdfs/2025/ta-cn-2025-0617.pdf (accessed on 17 June 2025).
- Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, 71.
- Aziz, R.A.; Ahmed, T.; Zhuang, J. A machine learning–based generalized approach for predicting unauthorized immigration flow considering dynamic border security nexus. Risk Anal. 2024, 44, 1460–1481.
- Wycoff, N.; Arab, A.; Donato, K.; Singh, L.; Kawintiranon, K.; Liu, Y.; Jacobs, E. Forecasting Ukrainian Refugee Flows with Organic Data Sources. Int. Migr. Rev. 2025, 59, 37–60.
- Yildiz, D.; Wiśniowski, A.; Abel, G.J.; Weber, I.; Zagheni, E.; Gendronneau, C.; Hoorens, S. Integrating Traditional and Social Media Data to Predict Bilateral Migrant Stocks in the European Union. Int. Migr. Rev. 2025, 59, 90–118.
- Bosco, C.; Minora, U.; Rosińska, A.; Teobaldelli, M.; Belmonte, M. A Machine Learning architecture to forecast Irregular Border Crossings and Asylum requests for policy support in Europe: A case study. Data Policy 2024, 6, e81.
- El Rahwan, A. Artificial Intelligence and Interoperability for Solving Challenges of OSINT and Cross-Border Investigations. CEPOL Research and Science Conference 2022. Preparing Law Enforcement for the Digital Age. 2022. Available online: https://www.researchgate.net/publication/365710562_ARTIFICIAL_INTELLIGENCE_AND_INTEROPERABILITY_FOR_SOLVING_CHALLENGES_OF_OSINT_AND_CROSS-BORDER_INVESTIGATIONS (accessed on 11 June 2025).
- Carammia, M.; Iacus, S.M.; Wilkin, T. Forecasting asylum-related migration flows with machine learning and data at scale. Sci. Rep. 2022, 12, 1457.
- Farion, O.; Balendr, A.; Androshchuk, O.; Mostovyi, A.; Grinchenko, V. Methods of Extraction and Analysis of Intelligence to Combat Threats of Organized Crime at the Border. J. Hum. Earth Future 2022, 3, 345–360.
- Havas, C.; Wendlinger, L.; Stier, J.; Julka, S.; Krieger, V.; Ferner, C.; Petutschnig, A.; Granitzer, M.; Wegenkittl, S.; Resch, B. Spatio-Temporal Machine Learning Analysis of Social Media Data and Refugee Movement Statistics. ISPRS Int. J. Geo-Inf. 2021, 10, 498.
- Hsiao, Y.; Fiorio, L.; Wakefield, J.; Zagheni, E. Modeling the Bias of Digital Data: An Approach to Combining Digital with Official Statistics to Estimate and Predict Migration Trends. Sociol. Methods Res. 2023, 53, 1905–1943.
- Islam Md, S.; Rahim, M.A.; Podder, N.K.; Hossain Md, N.; Hossain, M.I. Prediction of Irregular Bangladesh-EU Migration Trends Using Machine Learning Techniques. In Proceedings of the 2024 3rd International Conference on Advancement in Electrical and Electronic Engineering (ICAEEE), Gazipur, Bangladesh, 25–27 April 2024; pp. 1–6.
- Nair, R.; Madsen, B.S.; Lassen, H.; Baduk, S.; Nagarajan, S.; Mogensen, L.H.; Novack, R.; Curzon, R.; Paraszczak, J.; Urbak, S. A machine learning approach to scenario analysis and forecasting of mixed migration. IBM J. Res. Dev. 2020, 64, 7:1–7:7.
- Hartawan, D.A.; Santoso, B.J.; Pratomo, B. Comparative Study of Machine Learning Algorithm on Linguistic Distinctions over Text Related to Human Trafficking and Sexual Exploitation. In Proceedings of the 2023 International Conference on Advanced Mechatronics, Intelligent Manufacture and Industrial Automation (ICAMIMIA), Surabaya, Indonesia, 14–15 November 2023; pp. 442–447.
- Upadhayay, B.; Lodhia, Z.A.M.; Behzadan, V. Combating Human Trafficking via Automatic OSINT Collection, Validation and Fusion. In Proceedings of the ICWSM, Virtual, 7 June 2021.
- Abate, D.; Paolanti, M.; Pierdicca, R.; Lampropoulos, A.; Toumbas, K.; Agapiou, A.; Vergis, S.; Malinverni, E.; Petrides, K.; Felicetti, A.; et al. SIGNIFICANCE. Stop illicit heritage trafficking with Artificial Intelligence. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2022, 43, 729–736.
- Pinto Hidalgo, J.J.; Silva Centeno, J.A. Geospatial Intelligence and Artificial Intelligence for Detecting Potential Coca Paste Production Infrastructure in the Border Region of Venezuela and Colombia. J. Appl. Secur. Res. 2023, 18, 1000–1050.
- Madikeri, S.; Motlicek, P.; Tkaczuk, J.; Rangappa, P.; Sanchez Lara, A.; Rohdin, J.; Zhu, D.; Krishnan, A.; Klakow, D.; Ahmadi, Z.; et al. Autocrime—Open Multimodal Platform for Combating Organized Crime. Forensic Sci. Int. Digit. Investig. 2025, 54, 301937.
- Obioha-Val, O.A.; Lawal, T.I.; Olaniyi, O.O.; Gbadebo, M.O.; Olisa, A.O. Investigating the Feasibility and Risks of Leveraging Artificial Intelligence and Open Source Intelligence to Manage Predictive Cyber Threat Models. J. Eng. Res. Rep. 2025, 27, 10–28.
- Shafee, S.; Bessani, A.; Ferreira, P.M. Evaluation of LLM-based chatbots for OSINT-based Cyber Threat Awareness. Expert Syst. Appl. 2025, 261, 125509.
- Yuan, X.; Wang, J.; Zhao, H.; Yan, T.; Qi, F. Empowering LLMs with Toolkits: An Open-Source Intelligence Acquisition Method. Future Internet 2024, 16, 461.
- Freeman, N.K.; Nguyen, T.; Bott, G.; Parton, J.M.; Francel, C. Language Models for Adult Service Website Text Analysis. arXiv 2025, arXiv:2507.10743.
- Karabatis, S.N.; Janeja, V.P. Creating Geospatial Trajectories from Human Trafficking Text Corpora. arXiv 2024, arXiv:2405.06130.
- Biagio, M.S.; Simoncini, S.; La Mattina, E.; Morreale, V. MARPLE: A Framework for Social Media Threat Intelligence. In Proceedings of the 2024 International Conference on Artificial Intelligence, Computer, Data Sciences and Applications (ACDSA), Victoria, Seychelles, 1–2 February 2024; pp. 1–6.
- Al-Shawakfa, E.M.; Alsobeh, A.M.R.; Omari, S.; Shatnawi, A. RADAR#: An Ensemble Approach for Radicalization Detection in Arabic Social Media Using Hybrid Deep Learning and Transformer Models. Information 2025, 16, 522.
- Ayub, M.; Irum, S.; Jalil, Z. Enhanced Audio-Based Open-Source Intelligence Insights using Machine Learning. Int. J. Sci. Res. Comput. Sci. Eng. Inf. Technol. 2024, 10, 141–149.
- Ndlovu, L.; Mkuzangwe, N.; De Kock, A.; Thwala, N.; Mokoena, J.; Matimatjatji, R. A Situational Awareness Tool using Open-Source Intelligence (OSINT) and Artificial Intelligence (AI). In Proceedings of the 2023 IEEE International Conference on Advances in Data-Driven Analytics and Intelligent Systems (ADACIS), Marrakesh, Morocco, 23–25 November 2023; pp. 1–6.
- Backfried, G.; Thomas-Aniola, D.; Pilutti, D.; Boyer, M.; Hein, R.; Tatabaei, A.; Zinkanell, M.; Suker, M.; Agathonos, P. PINPOINT—A multidisciplinary framework for semi-automatic risk assessment in military operations and civilian missions. In Proceedings of the Conference on Cognitive and Computational Aspects of Situation Management 2023, Philadelphia, PA, USA, 16–20 October 2023; pp. 172–177.
- Harvey, A.; LeBrun, E. Computer Vision Detection of Explosive Ordnance: A High-Performance 9N235/9N210 Cluster Submunition Detector. J. Conv. Weapons Destr. 2023, 27, 9.
- Hassan, H.; Elayidom, S.; Irshad, M.R.; Chesneau, C. Design and implementation of EventsKG for situational monitoring and security intelligence in India: An open-source intelligence gathering approach. Intell. Syst. Appl. 2024, 24, 200458.
- Li, W.; Wang, C.; Cui, X.; Liu, Z.; Guo, W.; Cui, L. COSINT-Agent: A Knowledge-Driven Multimodal Agent for Chinese Open Source Intelligence. arXiv 2025, arXiv:2503.03215.
- Nguyen, T.H.; Rudra, K. Human vs ChatGPT: Effect of Data Annotation in Interpretable Crisis-Related Microblog Classification. In Proceedings of the ACM Web Conference 2024, Singapore, 13–17 May 2024; pp. 4534–4543.
- Pellet, H.; Shiaeles, S.; Stavrou, S. Localising social network users and profiling their movement. Comput. Secur. 2019, 81, 49–57.
- Mansoor, N.; Schwarz, K.; Creutzburg, R. Importance of OSINT/SOCMINT for modern disaster management evaluation—Australia, Haiti, Japan. Electron. Imaging 2023, 35, 354-1–354-14.
- Yang, Y.; Wang, S.; Li, D.; Sun, S.; Wu, Q. GeoLocator: A location-integrated large multimodal model (LMM) for inferring geo-privacy. Appl. Sci. 2024, 14, 7091.
- Browne, T.O.; Abedin, M.; Chowdhury, M.J.M. A systematic review on research utilising artificial intelligence for open source intelligence (OSINT) applications. Int. J. Inf. Secur. 2024, 23, 2911–2938.
- Bamigbade, O.; Sheppard, J.; Scanlon, M. Computer Vision for Multimedia Geolocation in Human Trafficking Investigation: A Systematic Literature Review. arXiv 2024, arXiv:2402.15448.
- Evangelista, J.R.G.; Sassi, R.J.; Romero, M.; Napolitano, D. Systematic Literature Review to Investigate the Application of Open Source Intelligence (OSINT) with Artificial Intelligence. J. Appl. Secur. Res. 2021, 16, 345–369.
- Ijiga, A.C.; Olola, T.M.; Enyejo, L.A.; Akpa, F.A.; Olatunde, T.I.; Olajide, F.I. Advanced surveillance and detection systems using deep learning to combat human trafficking. Magna Sci. Adv. Res. Rev. 2024, 11, 267–286.
- Ghioni, R.; Taddeo, M.; Floridi, L. Open source intelligence and AI: A systematic review of the GELSI literature. AI Soc. 2024, 39, 1827–1842.
- Syllaidopoulos, I.; Ntalianis, K.S.; Salmon, I. A Comprehensive Survey on AI in Counter-Terrorism and Cybersecurity: Challenges and Ethical Dimensions. IEEE Access 2025, 13, 91740–91764.
- Walsh, J.P. Social media and border security: Twitter use by migration policing agencies. Polic. Soc. 2020, 30, 1138–1156.
- Milivojevic, S. Artificial intelligence, illegalised mobility and lucrative alchemy of border utopia. Criminol. Crim. Justice 2025, 25, 630–648.
- Milaj, J.; Bonnici, J.P.M. Transparency as the defining feature for developing risk assessment AI technology for border control. Int. Rev. Law Comput. Technol. 2025, 39, 140–151.
- Nalbandian, L. An eye for an ‘I:’ a critical assessment of artificial intelligence tools in migration and asylum management. Comp. Migr. Stud. 2022, 10, 32.
- Chelioudakis, E. Unpacking AI-enabled border management technologies in Greece: To what extent their development and deployment are transparent and respect data protection rules? Comput. Law Secur. Rev. 2024, 53, 105967. [Google Scholar] [CrossRef]
- Vavoula, N. Tr-AI-nsforming Migration, Asylum and Border Management in the EU: The Roles of the AI Act, Interoperable Large-Scale IT Systems and EU Migration Agencies. SSRN 2024.
- Buolamwini, J.; Gebru, T. Gender shades: Intersectional accuracy disparities in commercial gender classification. In Proceedings of the Conference on Fairness, Accountability and Transparency, PMLR, New York, NY, USA, 23–24 February 2018; pp. 77–91.
- Gutierrez, M. Algorithmic Gender Bias and Audiovisual Data: A Research Agenda. Int. J. Commun. 2021, 15, 439–461.
- Keyes, O. The Misgendering Machines: Trans/HCI Implications of Automatic Gender Recognition. Proc. ACM Hum.-Comput. Interact. 2018, 2, 1–22.
- Weydner-Volkmann, S. Using Open, Public Data for Security Provision: Ethical Perspectives on Risk-Based Border Checks in the EU. Eur. J. Secur. Res. 2023, 8, 25–42.
- Shaikh, R.; Joshi, G.; Himabindu, K. AI-Powered Monitoring System for Detecting Drug Trafficking on Social Media. Int. J. Innov. Sci. Res. Technol. 2025, 1062–1068.
- Weissmann, M. Future threat landscapes: The impact on intelligence and security services. Security and Defence Quarterly 2025, 49, 40–57.
- Yang, Y.; Zuiderveen Borgesius, F.; Beckers, P.; Brouwer, E. Automated decision-making and artificial intelligence at European borders and their risks for human rights. SSRN Electron. J. 2024.
- Palotti, J.; Adler, N.; Morales-Guzman, A.; Villaveces, J.; Sekara, V.; Garcia Herranz, M.; Al-Asad, M.; Weber, I. Monitoring of the Venezuelan exodus through Facebook’s advertising platform. PLoS ONE 2020, 15, e0229175.
- Brundage, M.; Avin, S.; Clark, J.; Toner, H.; Eckersley, P.; Garfinkel, B.; Dafoe, A.; Scharre, P.; Zeitzoff, T.; Filar, B.; et al. The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation. arXiv 2024, arXiv:1802.07228.
- Meier, R. Threats and Opportunities in AI-generated Images for Armed Forces. arXiv 2025, arXiv:2503.24095.
- Verma, S. Synthetic Identities and Deepfake Attacks: The Next Frontier in Enterprise AI Security. Eur. Mod. Stud. J. 2025, 9, 176–189.
- Zhou, X.; Kim, H.; Brahman, F.; Jiang, L.; Zhu, H.; Lu, X.; Xu, F.; Lin, B.Y.; Choi, Y.; Mireshghallah, N.; et al. HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions. arXiv 2025, arXiv:2409.16427.
- Babcock, J.; Kramar, J.; Yampolskiy, R.V. Guidelines for Artificial Intelligence Containment. arXiv 2017, arXiv:1707.08476.
- Zhong, B.; Lavaei, A.; Cao, H.; Zamani, M.; Caccamo, M. Safe-visor Architecture for Sandboxing (AI-based) Unverified Controllers in Stochastic Cyber-Physical Systems. Nonlinear Anal. Hybrid Syst. 2021, 43, 101110.
- Staves, A.; Gouglidis, A.; Hutchison, D. An Analysis of Adversary-Centric Security Testing within Information and Operational Technology Environments. Digit. Threat. Res. Pract. 2023, 4, 1–29.
- Hossain, E.; Al Mahmud Ashik, A.; Rahman, M.M.; Khan, S.I.; Rahman, M.S.; Islam, S. Big Data and Migration Forecasting: Predictive Insights into Displacement Patterns Triggered by Climate Change and Armed Conflict. J. Comput. Sci. Technol. Stud. 2023, 5, 265–274.
- Suciu, G.; Sachian, M.-A.; Bratulescu, R.; Koci, K.; Parangoni, G. Entity Recognition on Border Security. In Proceedings of the 19th International Conference on Availability, Reliability and Security, Vienna, Austria, 30 July–2 August 2024; pp. 1–6.
- Expanded ASEAN Guide on AI Governance and Ethics—Generative AI. ASEAN Main Portal. 2024. Available online: https://asean.org/book/expanded-asean-guide-on-ai-governance-and-ethics-generative-ai/ (accessed on 13 September 2025).
- European Union Agency for Law Enforcement Cooperation. AI and Policing: The Benefits and Challenges of Artificial Intelligence for Law Enforcement. Publications Office. 2024. Available online: https://data.europa.eu/doi/10.2813/0321023 (accessed on 27 May 2025).
- U.S. Executive Office of the President. Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence (Federal Register, 88 (212), 66073-66084). Available online: https://www.federalregister.gov/documents/2023/11/01/2023-24283/safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence (accessed on 1 November 2023).
- Georgakopoulou, A.; Kokkinis, G.; Spathi, T. An Overview of Systems Related to Border Management and Migration into the EU: A Concept of Prevention and Detection of Illegal Activities. In Information and Communications Technology in Support of Migration; Akhgar, B., Hough, K.L., Samad, Y.A., Bayerl, P.S., Karakostas, A., Eds.; Springer International Publishing: Berlin/Heidelberg, Germany, 2022.
- Vavoula, N. Artificial Intelligence (AI) at Schengen Borders: Automated Processing, Algorithmic Profiling and Facial Recognition in the Era of Techno-Solutionism. Eur. J. Migr. Law 2021, 23, 457–484.
- Schwarz, K.; Bollens, K.; Arias Aranda, D.; Hartmann, M. AI-Enhanced Disaster Management: A Modular OSINT System for Rapid Automated Reporting. Appl. Sci. 2024, 14, 11165.
- Ajana, B. Augmented borders: Big Data and the ethics of immigration control. J. Inf. Commun. Ethics Soc. 2015, 13, 58–78.
- C, A.; Carter, R. Large Language Models and Intelligence Analysis. Available online: https://cetas.turing.ac.uk/publications/large-language-models-and-intelligence-analysis (accessed on 12 July 2025).
- Sullivan, G.; Van Den Meerssche, D. An Infrastructural Brussels Effect: The translation of EU Law into the UK’s digital borders. Comput. Law Secur. Rev. 2024, 55, 106057.
- OSCE Office of the Special Representative and Co-ordinator for Combating Trafficking in Human Beings. New Frontiers: The Use of Generative Artificial Intelligence to Facilitate Trafficking in Persons [Policy Brief]. Organization for Security and Co-operation in Europe (OSCE). 2024. Available online: https://www.osce.org/files/f/documents/7/d/579715.pdf (accessed on 27 May 2025).
- George Mason University. Using AI to Uncover Human Smuggling Networks. Available online: https://cec.gmu.edu/news/2025-01/using-ai-uncover-human-smuggling-networks (accessed on 12 March 2025).
- Ranade, P.; Joshi, A. FABULA: Intelligence Report Generation Using Retrieval-Augmented Narrative Construction. In Proceedings of the International Conference on Advances in Social Networks Analysis and Mining, Kusadasi, Türkiye, 6–9 November 2023; pp. 603–610.
- Gandhi, S.T. RAG-Driven Cybersecurity Intelligence: Leveraging Semantic Search for Improved Threat Detection. Int. J. Res. Appl. Innov. 2023, 6, 8889–8897.
- LangChain. Build a RAG agent with LangChain. LangChain Docs. Available online: https://docs.langchain.com/oss/python/langchain/rag (accessed on 1 November 2025).
- Suresh, H.; Guttag, J. A Framework for Understanding Sources of Harm throughout the Machine Learning Life Cycle. In Proceedings of the 1st ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization, Virtually, 5–9 October 2021; pp. 1–9.
- Veale, M.; Binns, R.; Edwards, L. Algorithms that remember: Model inversion attacks and data protection law. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 2018, 376, 20180083.
- Trattner, C.; Oberhauser, R.; Marty, P. OSINT research studios: A flexible crowdsourcing framework to scale up open source intelligence investigations. Proc. ACM Hum.-Comput. Interact. 2024, 8, 1–38.
- Zhao, D. FRAG: Toward Federated Vector Database Management for Collaborative and Secure Retrieval-Augmented Generation (Version 1). arXiv 2024.
- North Carolina State University Laboratory for Analytic Sciences. Large Language Models for Intelligence Analysis. Available online: https://ncsu-las.org/2024/11/large-language-models-for-intelligence-analysis/ (accessed on 3 August 2025).
- Yang, T.-L.; Liu, J.-S.; Tseng, Y.-H.; Jang, J.-S.R. Knowledge retrieval based on generative AI. arXiv 2025.
- Zhou, S.; Peng, J.; Ferrara, E. Tracing the Unseen: Uncovering Human Trafficking Patterns in Job Listings. arXiv 2024, arXiv:2406.12469.
- Luceri, L.; Boniardi, E.; Ferrara, E. Leveraging Large Language Models to Detect Influence Campaigns on Social Media. In Proceedings of the Companion Proceedings of the ACM Web Conference 2024, Singapore, 13–17 May 2024.
- Goglia, D. Multi-Aspect Integrated Migration Indicators (MIMI) Dataset; (Version 2) [Dataset]; Zenodo. 2022. Available online: https://zenodo.org/records/6493325 (accessed on 27 May 2025).





| Category | Study | Task/AI Used | Data Source | Measurable Metrics and Outcomes | Implementation Context |
|---|---|---|---|---|---|
| Border Security and Migration Monitoring | [24] | Predicting unauthorized immigration flow/ML models (BART, RF, GBM, SVM, KNN). | Public macro-level datasets (CBP, World Bank, etc.). | BART model improved Test RMSE by 86.38% and MAE by 89.92% over SARIMA baseline. | U.S.–Mexico border. Macro-level forecasting using 2001–2018 data for border management analytics. |
| Border Security and Migration Monitoring | [25] | Migration monitoring, conflict-driven humanitarian displacement/Bayesian inference. | Ukrainian-language Twitter; conflict events and fatality data (web); UNHCR official border crossing counts. | Mean Absolute Percent Error (MAPE): 0.22 to 0.81. Very low prediction errors; high level of forecasting accuracy when compared to actual UNHCR daily border crossing statistics. | Ukraine (land border crossings to Poland, Romania, Slovakia, Moldova, and Hungary). |
| Border Security and Migration Monitoring | [26] | Predict intra-EU regular migration/Bayesian hierarchical modeling (statistical ML). | Facebook Marketing API; Eurostat online data. | 2011–2019 totals increasing from 12.5M to 15.7M; corridor MAPE ≈ 8%. | EU28 (including the UK). |
| Border Security and Migration Monitoring | [19,20] | U.S. border and immigration operations (CBP, ICE, USCIS): immigration and border management cases/GenAI; NLP; CV. | Twitter (X); Facebook; Instagram; news sources; dark web. | Qualitative; cites 105 active DHS AI use cases (140+ according to current study authors’ own analysis). No performance metrics provided. | U.S. border and immigration operations. |
| Border Security and Migration Monitoring | [27] | Forecasting Irregular Border Crossings (IBCs) and asylum requests/ML architecture (ANN, XGBoost/GBDT, Random Forests). | Publicly available official sources (Frontex monthly IBC detections by route; Eurostat first-time asylum). | Explained variance as high as 80% over a 6-month forecast. | Europe (Central Mediterranean/Italy). Designed to provide policy support. |
| Border Security and Migration Monitoring | [28] | Resolving multiple, fraudulent identities in cross-border investigations/NLP; NER. | Open sources/social media integrated with EU central systems (SIS, VIS, Eurodac, EES, ETIAS). | Conceptual; no quantitative benchmarks reported. | EU law enforcement and border contexts, focusing on interoperability. Conceptual framework, not a deployed system. |
| Border Security and Migration Monitoring | [29] | Forecast asylum applications/ML. | Open sources: Global Database of Events, Language, and Tone (GDELT); Google Trends. | Empirical, quantitative: the system’s four-week forecasts remain within ±2 standard-error bands of a Moving Average process. | EU |
| Border Security and Migration Monitoring | [30] | Extracting intelligence on cross-border crime from internet resources/Neural networks; decision trees; AI-powered web search. | Open web; deep/dark web; social media (broad reference, no enumeration). | 1.23× reduction in time for obtaining intelligence; up to 20% efficiency increase. | In support of State Border Guard Service of Ukraine. |
| Border Security and Migration Monitoring | [31] | Forecasting near-term refugee arrivals and extracting routes/ML; NLP; CN. | Geo-tweets, random Twitter stream, UNHCR refugee statistics. | CNN AP of 0.871 (outperforming SVM); ARIMA model had best forecast RMSE. | Balkan refugee route (2015–2016). Developed for public authorities and relief organizations. |
| Border Security and Migration Monitoring | [32] | Estimating and predicting state-level migration trends/Bayesian space–time models. | Geo-located Twitter data and American Community Survey (ACS) data. | Combined model produces more accurate predictions than using official ACS data alone. | U.S. (state to state); enables timely estimates of population movement through OSINT when combined with administrative data. |
| Border Security and Migration Monitoring | [33] | Forecasting irregular Bangladesh-EU migration trends/ML (ARIMA time-series modeling, XGBoost, Decision Tree Regressor, CatBoost Regressor, Feed Forward Neural Network). | Frontex data on irregular border detections and qualitative fieldwork. | ARIMA model performed best with the lowest Mean Squared Error. | Bangladesh to EU migration routes. Focuses on policy support based on historical data. |
| Border Security and Migration Monitoring | [34] | Annual forecasting of mixed-migration flows/ML. | Online institutional datasets (e.g., World Bank, UNHCR, UNDESA). | Annual forecasts with MAE ≈ 6000 for most destinations. | Ethiopia as origin with forecasts to Saudi Arabia, South Africa, Denmark, Great Britain, Italy, and Sweden. |
| Counter-Trafficking and Smuggling | [35] | Detect human trafficking and sexual exploitation (HTSE)/ML classifiers (Random forest; Naïve Bayes; Gradient Boosting; Decision trees). | Online text ads. | Random Forest: F1 ≤ 0.962; Naïve Bayes: F1 ≈ 0.961; Gradient Boosting: F1 ≈ 0.914–0.952; Decision Trees: F1 ≈ 0.927–0.930. | Not Specified; mixed online sources including Canadian ads. |
| Counter-Trafficking and Smuggling | [3] | Identifying ground pits from aerial drone imagery for border security/DL-CV (YOLO v8); CNN. | Public (online) Ground Pit Image Dataset (GPID). | Accuracy ≥ 95%; mAP = 0.895; processes 116.28 frames per second. | Global (authors mention potential applicability to the borders of India–Pakistan, Israel–Gaza, US–Mexico, Ukraine–EU). |
| Counter-Trafficking and Smuggling | [11] | Detecting human smuggling ads on social media/LLMs; NLP; CV. | Custom dataset of images and videos; collected from TikTok and Facebook. | F1: 0.92 for image ads and 0.86 for video ads. | U.S./Mexico border context. |
| Counter-Trafficking and Smuggling | [36] | Extract actionable intelligence on human trafficking/ML; NLP. | Online news (Google News RSS). | Relevance filter P ≈ 0.63. Victim name extraction Acc: 72.53%. Suspect name Acc: 58.69%. | Automated global OSINT pipeline developed with NGO Love Justice International. |
| Counter-Trafficking and Smuggling | [37] | Stop illicit, online heritage trafficking/CNNs; ontology-based DL; multimodal image–text AI. | eBay-online auction houses; Facebook; Catawiki; TOR forums. | Concrete architecture with targeted gains; no empirical metrics beyond the stated target (10–15% identification increase). | EU; Cyprus |
| Counter-Trafficking and Smuggling | [38] | Detect coca paste production infrastructures near border areas/CV; DL. | Online satellite imagery (PlanetScope); UNODC/SIMCI (public). | mAP 90.07% | Venezuela–Colombia border |
| Counter-Trafficking and Smuggling | [39] | Criminal network analytics for cross-border drug-trafficking/NLP; NER. | Lawful interceptions enriched with data from social media. | NER (human transcripts): F1 = 0.828 (English), 0.701 (German); NER (ASR transcripts): F1 = 0.397 (English), 0.279 (German); Social influence (speaker network): accuracy = 0.95; Social influence (telephone network): accuracy = 0.795; Link prediction: Top-5 accuracy = 0.588. | EU |
| Cybersecurity and Threat Intelligence | [40] | Predictive cyber threat modeling/ML (logistic regression). | Twitter Academic API, Common Crawl, MITRE ATT&CK. | Logistic regression: accuracy = 94.98%; precision = 88.69%. | General cybersecurity application. Not border-specific. Conceptually transferable to proactive detection of cyber threats affecting border agencies’ digital infrastructure and situational awareness platforms. |
| Cybersecurity and Threat Intelligence | [41] | OSINT-based CTI/LLMs. | Twitter/X data. | Classification F1 ≈ 0.94 (comparable to baseline); NER F1 ≈ 0.43 (inferior to baseline). | General OSINT for CTI; not border-specific. Conceptually transferable: LLM chatbots can triage OSINT cyber threats for border protection agencies. Weak NER necessitates specialized extraction models. |
| Cybersecurity and Threat Intelligence | [42] | OSINT-based CTI/LLMs (agents, CoT reasoning, RAG). | Online (websites offering open-source vulnerability information). | Up to 86% accuracy increase on the best dataset. | Work originating in China. General cybersecurity application. |
| Linking and Analyzing Trafficking Networks | [43] | Aiding counter-sex-trafficking through adult service website text analysis/NLP; DL (transformer encoder, BERT-based). | 240 M adult service website ads collected online, yielding 19,833,258 unique post texts for model pretraining. | Accuracy ≈ 0.967–0.968–0.9689; Recall ≈ 0.871–0.893–87.91%; F1 Score ≈ 91.3–91.9–91.87%; ROC AUC ≈ 93.1–94.0–93.52%. | U.S. |
| Linking and Analyzing Trafficking Networks | [44] | Extracting geospatial trafficking routes from text narratives/NLP. | Human trafficking text corpora from news and NGO websites. | Outperformed existing methods; +26.8% F1 improvement over an ILP geotagger. | General application. Motivating scenario is Eastern Mediterranean (Syria–Turkey–Greece). No operational deployment. |
| Linking and Analyzing Trafficking Networks | [7] | Mapping human trafficking networks and identifying key players and financial flows/ML; NLP. | OSINT, social networks, dark web, public blockchains (Bitcoin). | Prototype stage; no quantitative metrics reported. | Russia–Ukraine conflict (post-2022). Designed for international LEAs and NGOs. No field validation. |
| Terrorism, Extremism and Online Radicalization | [45] | Identifying and classifying terrorist threats on the open web/DL;ML;NLP. | Social media (X/Twitter) as the primary source; surface web and dark web monitoring | High accuracy in balanced experiments (SVM accuracy 0.951); operationally relevant performance in pilots (accuracy 0.95–0.99; F1 ≈ 0.90) | EU-funded project with LEA participation and feedback. |
| Terrorism, Extremism and Online Radicalization | [46] | Detection of radicalization in social media/DL; Ensemble ML. | Dataset of 89,816 Arabic-language tweets. | Accuracy = 0.98; F1 = 0.97 | Varied geographic distribution of Arabic-language content (Middle East: 68%, North Africa: 17%, Europe: 8%, North America: 4%, Other/Unknown: 3%). |
| Terrorism, Extremism and Online Radicalization | [47] | Enhanced audio-based OSINT insights for NLP/NER; ML. | Audio inputs drawn from openly available online platforms, radio, and podcasts. | DistilBERT achieved top scores (e.g., LOCATION F1 = 0.95). | Pakistan’s defense context. Not border-security specific. Adaptable to border-security OSINT where voice intercepts, public radio, or posted audio require fast entity extraction for situational awareness. |
| Terrorism, Extremism and Online Radicalization | [48] | Forecast civil unrest (e.g., strikes, protests) before events occur (Situational Awareness)/ML; NLP. | Twitter (scraped via Twint); South African online news outlets; independent conflict monitoring website (ACLED). | Logistic regression with TF-IDF achieved 99% accuracy with <1% FP/FN. | South Africa. Designed for public safety and crisis management. |
| Other Security and Intelligence Tasks | [49] | Semi-automatic risk assessment framework fusing OSINT and GNSS data/NLP;CV. | News, social media, and other public data. | Project in progress; no quantitative performance figures reported; datasets being collected for future training and evaluation. | EU missions in regions with border-related challenges; not operationally validated |
| Other Security and Intelligence Tasks | [50] | Cluster submunitions detection in border-conflict zones/CV (YOLO v5). | OSINT imagery, social media, augmented with synthetic data. | F1 = 0.98; mAP@0.5 ≈ 0.992. | Validated on Ukraine conflict imagery for humanitarian mine action and OSINT triage. |
| Other Security and Intelligence Tasks | [51] | Security intelligence, situational monitoring/NLP; KGs. | Online news articles (NewsAPI). | High tagging accuracy in human evaluation (Location: 96.03%). | India; focused on national security OSINT and situation monitoring. Potential adaptability to border contexts (e.g., localized incident tracking). |
| Other Security and Intelligence Tasks | [52] | Turn unstructured multimodal open-source data into actionable intelligence/MLLM–KG (agentic architecture). | Chinese open sources (official media, commercial news outlets, Weibo). | Entity recognition F1 = 0.79. Context matching Hit@1 = 0.83. | China-centric sources and contexts. No specific deployment. |
| Other Security and Intelligence Tasks | [53] | Classifying crisis-related (natural disaster) microblogs/LLMs (ChatGPT). | Crisis-related tweets. | Model trained on ChatGPT data performed ~3–6% worse than on human-annotated data. | Nepal earthquake 2015; Typhoon Hagupit. |
| Other Security and Intelligence Tasks | [54] | Localizing social network users and profiling movement/NLP; ML. | Twitter, Facebook, Instagram data. | Logistic regression achieved 77.72% location prediction accuracy. | Not border-specific. Potentially applicable to border risk analysis (e.g., route reconstruction, hotspot presence inference) via OSINT geolocation of public posts. |
| Other Security and Intelligence Tasks | [55] | Automated reporting for disaster management/Applied data mining and ML-enabled social media analytics. | Social media (Twitter, Facebook, Instagram); open mapping (OpenStreetMap, Google Maps/Earth); government open data. | Achieved score of up to 89% (AlignScore metric). | U.S. (Hurricane Harvey) and Turkey. Methods for situational awareness, early warning, misinformation management, and geo-targeted risk assessment from open sources could, in principle, be transferable to border contexts. |
| Other Security and Intelligence Tasks | [56] | Infer geolocation from images and social media posts/LLM-LMM (ChatGPT4-based LMM). | Google Maps imagery, Google Images, social media posts. | Outperforms Google/GPT-4; Landmark Acc: 94%, Street-view Acc: 54%. | General OSINT/geo-privacy demonstration. Global coverage examples. |
| Systematic and Conceptual Reviews | [57] | Systematic review of AI combined with OSINT. | 163 academic publications. | Bibliometric; notes common use of accuracy, precision, F1, but many studies omit full reporting. | Global literature (2011–2021). Spans security domains but does not isolate border deployments. |
| Systematic and Conceptual Reviews | [58] | Systematic review of CV for multimedia geolocation in trafficking investigations. | 123 peer-reviewed studies. | Synthesizes techniques; does not report a unified accuracy metric. | General law enforcement/digital forensics workflows. No specific border deployments evaluated. |
| Systematic and Conceptual Reviews | [59] | Systematic literature review on OSINT with AI. | 244 academic publications (1990–2019). | Bibliometric; 24% of papers explicitly mention AI. Main applications: cybersecurity (41%) and social media (19%). | Global scholarly literature. Does not isolate border-security deployments. |
| Systematic and Conceptual Reviews | [60] | Systematic review of DL in surveillance to combat human trafficking. | Social media, online ads, video surveillance, etc. | Case-based successes and proxy accuracy of ~89.5% for social media threat ID cited. No unified metrics. | Multi-context, including border checkpoints/airports. Examples from U.S., EU, Nigerian LEA collaborations. |
| Systematic and Conceptual Reviews | [61] | Systematic review of AI-powered OSINT literature concerning ethics, legal, and social implications (GELSI). | 571 Academic publications. | Bibliometric; GELSI literature is ~12% of the corpus but has higher search rank. No performance metrics. | Global scholarship (1992–2022). Does not evaluate specific border-security applications. |
| Systematic and Conceptual Reviews | [62] | Comprehensive survey on AI in counter-terrorism and cybersecurity. | Literature-derived. OSINT/SOCMINT central. | Cites results from literature (e.g., up to 96.6% accuracy for terrorist location prediction). | Spans national security, cyber ops, and EU border programs (automated border control, biometrics). |

| Limitation Category | Specific Limitation | Description of Impact on Reliability and Practical Utility | Study |
|---|---|---|---|
| Data Quality | Bias and Non-Representativeness | Social/open-source signals over-represent digitally connected groups, specific languages and routes; institutional and media accounts drive agenda bias. Model inferences overfit to these skews, misreading risk levels and diverting resources to the wrong corridors/times. Intersectional biases in facial/AV systems perpetuate stereotypes, e.g., higher error rates for women and trans people in speech/image fusion. Automatic gender recognition (AGR) fails non-binary people (20–30% errors) by enforcing binary models. | [25,26,31,63,64,65,66,67,68,69,70,71] |
| Data Quality | Veracity, Misinformation, and Noise | Rumors, coordinated campaigns, bots, and adversarial posts contaminate streams; weak source verification raises false positives/negatives and erodes operator trust in AI outputs for triage and deployment. | [26,31,61,63,66,67,72,73,74,75] |
| Data Quality | Data Scarcity and Obsolescence | Sparse/lagged ground truth; rapid concept drift during crises; API/policy changes throttle access; weak label pipelines for multilingual/multimodal data. Forecasts and alerts become brittle and non-transferable across borders/time. | [25,26,31,34,49,66,67,68,73,76] |
| Data Quality | Adversarial Manipulation and Synthetic Content | Generative AI enables the creation of photorealistic deepfakes and synthetic identities (human misclassification rates 38.7–50%); information laundering spreads false narratives across platforms, obscuring original sources; AI-generated smuggling ads evade detection systems trained on authentic content. | [77,78,79] |
| Technical | Scalability | End-to-end OSINT pipelines (ingest ⟶ filter ⟶ label ⟶ model ⟶ serve) falter when handling high-velocity, multilingual, multimodal streams. Compute/IO and annotation throughput introduce latency, narrowing geographic/linguistic coverage in live operations. | [25,26,31,34,49,61,67,73,74,75] |
| Technical | Model Opacity (“Black Box” Problem) | Complex/closed models limit interpretability, contestability, and Data Protection Impact Assessment (DPIA)/rights review for high-stakes border uses (screening, triage, profiling). Lack of explainability undermines acceptance and redress. | [61,65,66,67,68,72,75] |
| Technical | Difficulty of Troubleshooting | Multi-component, third-party dependent OSINT stacks are hard to reproduce/debug (API versions, filters, model updates). Failures are difficult to localize, slowing incident response and model correction. | [31,34,49,61,68,75] |
| Operational | Ethical and Human Rights Risks | Discriminatory profiling, function creep, and chilling effects in border contexts; error propagation into downstream decisions with weak mechanisms for challenge/redress. | [64,66,67,68,72,75] |
| Operational | Policy-Engineering Mismatch | Ambitious AI use outpaced legal/process readiness. Transparency, accuracy documentation, DPIAs, procurement and auditing are not embedded in engineering/contracting, limiting lawful/safe deployment. | [66,67,68,72] |
| Operational | Lack of Legal and Governance Frameworks | Fragmented/immature governance for AI + OSINT at borders (lawful basis, purpose limitation, accuracy duties, accountability chains). Unclear oversight reduces reliability claims and real-world utility. | [65,66,68,72,75] |
| Operational | Sandbox-to-Production Gap | Testing environments introduce performance overhead and evaluation biases; multi-turn interactions reveal risks missed in single-turn tests; heavy containment measures impose prohibitive computational costs that limit operational competitiveness. | [80,81,82,83] |

| Ethical/Legal Category | Specific Implication/Risk | Description of Impact on Rights and Society | In-Text Citations |
|---|---|---|---|
| Privacy, Data Protection, and Surveillance | Surveillance Overreach and “Chilling Effects” | The mass collection and automated analysis of public data can lead to disproportionate surveillance, creating a “chilling effect” (censorship pressure) on free expression and association for migrants and diaspora communities. | [56,61,62,64,65,76,84,85] |
| Privacy, Data Protection, and Surveillance | Violation of Data Protection Principles | AI systems often conflict with core data protection principles like purpose limitation and data minimization, especially when data is repurposed or fused across systems without a clear legal basis, consent, or adequate safeguards. | [9,66,86,87,88,89,90] |
| Privacy, Data Protection, and Surveillance | Risks from Sensitive Data and Re-identification | Processing publicly available data on vulnerable populations (e.g., refugees) carries inherent risks of re-identification and harm. The use of biometrics and inferred sensitive attributes requires strict necessity, proportionality, and security controls. | [6,11,39,40,91] |
| Bias, Discrimination, and Fairness | Algorithmic Bias and Discrimination | Models trained on historical or unrepresentative data can perpetuate and amplify existing biases, leading to discriminatory outcomes where certain nationalities, ethnicities, or groups are disproportionately flagged or scrutinized. Gender classifiers show 20–40× higher error rates for darker-skinned women and trans people; binary AGR misclassifies non-cisgender people. | [21,62,66,69,71,72,84,87,90,92,93] |
| Bias, Discrimination, and Fairness | Digital Exclusion and Unequal Representation | Reliance on digital data sources can systematically exclude or underrepresent vulnerable groups with limited digital access (e.g., the elderly, rural populations), leading to skewed analyses and inequitable policy responses. Misprofiling risk for women and LGBTQIA+ migrants at borders, in surveillance/trafficking detection, leading to wrongful detention, safety threats, or overlooked asylum claims. | [69,70,76] |
| Accountability, Transparency, and Governance | Lack of Transparency (“Black Box” Problem) | The opacity of complex AI models makes it difficult to scrutinize their logic, challenge erroneous decisions, or provide meaningful explanations to affected individuals, undermining due process and the right to an effective remedy. | [9,19,61,64,94] |
| Accountability, Transparency, and Governance | Deficits in Legal and Governance Frameworks | The rapid pace of AI deployment often outstrips the development of clear legal rules and oversight mechanisms, creating gaps in accountability, liability, and governance for systems used in high-stakes border decisions. | [62,65,72,86,88,95] |
| Accountability, Transparency, and Governance | Dehumanization and Pre-emptive Logic | Automated risk profiling can shift governance from evidence-based decisions to pre-emptive interventions based on statistical correlations, treating individuals as data profiles and potentially undermining the presumption of innocence. | [64,66,72,92] |

| Challenge Domain | Technical Dimension (RQ2) | Ethical/Legal Dimension (RQ3) | Operational Implication (Impact on RQ1) |
|---|---|---|---|
| Data Quality | Bias, non-representativeness, noise | Discriminatory profiling, digital exclusion | Misallocation of resources |
| Model Opacity | Difficult troubleshooting, limited explainability | Accountability deficits, contestability issues | Reduced operational trust |
| Scalability | Computational bottlenecks, latency | Surveillance overreach concerns | Geographic/linguistic coverage gaps |
| Governance Gaps | Policy-engineering mismatch | Fragmented legal frameworks | Deployment hesitancy, litigation risk |
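
The algorithmic bias entries in the tables above describe error rates that differ sharply between demographic groups. As a minimal sketch of how such a disparity could be quantified during an audit, the Python snippet below computes per-group misclassification rates and the ratio between the worst and best group. This is an illustration only, not a method taken from the reviewed studies: the function names, the group labels (`group_a`, `group_b`), and the toy records are all hypothetical.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Return the misclassification rate per group.

    `records` is an iterable of (group, true_label, predicted_label) tuples.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, y_true, y_pred in records:
        totals[group] += 1
        if y_true != y_pred:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


def disparity_ratio(rates):
    """Ratio of the highest to the lowest group error rate.

    Returns infinity when the lowest rate is zero but the highest is not,
    and 1.0 when every group has a zero error rate.
    """
    lowest, highest = min(rates.values()), max(rates.values())
    if lowest == 0:
        return float("inf") if highest > 0 else 1.0
    return highest / lowest


# Hypothetical toy data: 2 errors in 100 for group_a, 20 in 100 for group_b.
records = (
    [("group_a", 1, 1)] * 98 + [("group_a", 1, 0)] * 2
    + [("group_b", 1, 1)] * 80 + [("group_b", 1, 0)] * 20
)

rates = error_rates_by_group(records)
ratio = disparity_ratio(rates)
print(rates, ratio)
```

In this toy data the second group is misclassified roughly ten times as often as the first. A ratio of this kind says nothing about the cause of the gap, so in practice it would be one input to a broader fairness assessment rather than a verdict on its own.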
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Karakikes, A.; Kotis, K. AI-Assisted OSINT/SOCMINT for Safeguarding Borders: A Systematic Review. Information 2025, 16, 1095. https://doi.org/10.3390/info16121095

