Open-Source Intelligence Educational Resources: A Visual Perspective Analysis

Abstract: Given the growing application of open-source intelligence (OSINT), which has facilitated fast decision-making, this study explores how research and educational material production in OSINT has evolved. For this analysis, two kinds of OSINT material sources are examined: research dissemination databases and educational resource repositories. Considering that web information may or may not be publicly available, web scraping and web interface querying strategies are used for metadata extraction. Finally, we suggest a hierarchical classification of the findings for the metadata retrieval results. Our main results are: (1) Google Scholar and NewsBank are the centralizing axes of OSINT publications; (2) OSINT shows broad development in the areas of defense and security and thus presents a promising future; (3) it is necessary both to generate educational resources that complement OSINT training processes and to document existing resources with a metadata structure defined for this purpose; (4) increased attention should be paid to the last stages of the OSINT process, so that this knowledge can be used in more assertive ways. This study guides researchers to the current state of research and education in OSINT and promotes a useful metadata description to make resources accessible and reusable in the educational environment.


Introduction
Currently, there is a diversification of services offered on the web, which has led to an evolution of a growing mass of digital data [1]. These data can be accessed by Application Programming Interfaces (APIs), or different services, applications, etc. As an example, people who make use of services available on the web to register sites of interest, travel, political or religious affiliations, photos, among other sources of disclosure of a clearly public nature, can be found. However, not everyone is aware that a large proportion of this information is publicly exposed and can be used by individuals or organizations with different purposes [2]. This means that all information published on social networks, discussion forums, and group chats, among other sources, is free and accessible to anyone, considering the restrictions that may apply [3]. Nevertheless, even when large amounts of data are found, these are, themselves, considered as unevaluated material obtained from any source. Yet, when such data are elaborated and treated, acquiring meaning and utility, they are transformed into information. Furthermore, if experience, understanding, and codification are added to this, such information becomes knowledge. Once this is made available to a person interested in the purpose of helping the
• a set of indicators is proposed to support and justify the research and educational materials production in OSINT;
• the useful and complete metadata description and documentation are promoted to make resources more accessible and reusable in the educational environment;
• the design and use of educational material that supports OSINT training processes are promoted.
Briefly, our contribution allows stimulating research and advances within the OSINT ecosystem, both in application domains, as well as in the generation of resources, which can lead to supporting the growth that this strategy has been experiencing.
To conduct this study, a brief exploration of state-of-the-art OSINT is presented. Subsequently, a methodology to address the quantitative collection of information is proposed. This leads to a tabulation and analysis of the results obtained in order to evaluate the interest that the OSINT topic arouses in the academic and research world. Finally, some conclusions and future work are drawn. The collection of data required for this research, which originated from open sources, was carried out between 1 February and 25 April 2020.

State-of-the-Art OSINT
Considering that the main aim of our research is to evaluate how the research dissemination and production of educational material in OSINT have evolved, we see OSINT research dissemination as a vital input to evaluate the behavior of resource openness and availability. Additionally, reviewing the production of OSINT educational materials allows us to verify if educational resources have been produced in this topic and how educational resources are being described (metadata) in these open sources.
For the reasons outlined above, three essential aspects of OSINT are summarized in this section. Firstly, the concept of open-source is reviewed, as the base of the open-source intelligence process, to subsequently address both some tools used in the OSINT process and the availability of information on the web. Finally, some approaches using open-source intelligence will be briefly reviewed.

OSINT Sources
According to [16], OSINT sources are public sources, independent of whether their content is commercialized or free. They can be documents with any content, in any medium, under any means of transmission or mode of access. OSINT sources are distinguished from other forms of intelligence because they must be legally accessible to the public without violating any copyright or privacy laws [17]. These open sources include [18][19][20]:
• government data and public reports, budgets, hearings, telephone directories, press conferences, websites, and speeches;
• professional and academic publications, information acquired from journals, conferences, symposia, academic papers, dissertations, and theses;
• commercial data and images, financial and industrial evaluations, as well as databases;
• grey literature, technical reports, preprints, patents, working documents, commercial documents, unpublished works, and bulletins;
• photos and videos, including metadata;
• geospatial information (e.g., maps and commercial imaging products).
On the other hand, open-source information is not limited to what can be found using the main search engines [21]. In fact, a search on any engine returns a massive number of information sources, which are far from being the only sources available. On the web, there are multiple open sources for different types of searches: videos, images, texts, etc.

Tools for OSINT
As for tools or applications that support open-source intelligence, there are applications in different fields. Specialized software tools include big data software, video analysis, text analysis, visualization tools, cybersecurity, web analysis, and social network analysis. On the web, there are tools for multiple purposes, with valuable information resources and uses in decision-making and measures in different fields, such as [2,3,22-24]: Maltego (security and forensic investigation), Shodan (search engine for hackers), The Harvester (email and domain information), among others.

Availability of Information
Over the years, the improvement and specialization of the technologies and services available on the web have generated exponential growth in the amount of data available on the network. The increase in active users on social networks [25] is an example of this growth. Apart from promoting the strengthening and expansion of platforms such as TikTok or WhatsApp, social networks have generated large amounts of public data available on the web that can be queried through techniques, such as OSINT [26,27]. On the other hand, the maturity of open data in several countries has improved

Materials and Methods
A systematic mapping method was implemented to perform our research based on the construction of classifications and obtaining information on the existing knowledge on a specific topic [38]. This approach allowed us to analyze the source of information in order to identify findings both in the OSINT research dissemination and in the description of OSINT educational materials. Based on this approach, and following the method described by [38], the research design is explained below (Figure 1).

Definition of Research Questions
To develop this study, according to our approach, two macro-questions were raised to solve with the systematic mapping:

• How has the academic-research and dissemination behavior provided by OSINT evolved?
• How are the educational materials produced by OSINT being described?
Two strategies were developed to work on these questions.


Open-Sources Identification
Regarding the research questions, sources that allow identifying research dissemination and educational materials production in OSINT were selected: These sources were selected at random, taking into account their relevance in the academic and research world. The complete list of sources queried can be seen in Appendix A.

Research Criteria Definition
In order to perform the search processes, and considering the number of applications used in OSINT, the filtering of information in the selected sources was proposed in two stages. (a) For the search of resources, the following parameter was set: the title, summary, or keywords should contain the terms "OSINT" or "Open-Source Intelligence". (b) Subareas or sectors of OSINT application were identified in order to detail the findings on OSINT, which are provided by the OSINT body of knowledge:

Conduct the Search: Definition of Search Methods
In order to identify sources that provide open information about OSINT materials, the strategies described below were defined.
In the first step, and in order to identify sources, a general surface web search for videos, documents, sites, and any other OSINT materials (also called OSINT resources) was carried out by running a query on the Google search engine. For this search, a web scraper was used that allowed the results to be filtered by: (a) OSINT subareas, (b) sources that provide OSINT resources, and (c) type of OSINT material. This was done in order to establish the digital format in which OSINT resources are published.
In the second step, and in order to execute a basic deep web search on OSINT resources, a source scan was carried out including sources of educational resources. This exploration was based on the following information retrieval strategies:
a. information retrieval using APIs or web scraping techniques. These sources were chosen because of the availability of their API, as well as for being key sources in terms of provision and dissemination of academic and scientific works;
b. information retrieval using web interfaces from sources. These sources were selected because of their focus on the dissemination of academic and scientific works. However, they do not have an API available for consumption;
c. retrieval of information from Massive Open Online Courses (MOOCs). In addition to academic and scientific distribution, it was relevant to explore the production of educational resources such as MOOCs in the OSINT area, since these types of courses enhance knowledge dissemination by individual organizations with the spirit of sharing and collaboration [39]. This exploration was performed manually since the sources did not have an accessible API or query services to exploit their content;
d. retrieval of information from other repositories. Subsequently, the exploration of sources specifically oriented to educational resources was executed in order to identify the existence of OSINT resources catalogued as Open Educational Resources (OER).

Screening of Sources: Design of the Search Strategy
For the design of the web scraper, and since Google blocks robots, a user agent belonging to the Firefox browser running on Ubuntu Linux was used. This user agent renders web pages using the Gecko engine [40,41]. However, even in this case, after a certain number of pages, Google classifies these requests as those sent by a robot, so other strategies, such as a virtual private network (VPN), had to be used. For this agent, the following libraries were used: (a) BeautifulSoup for reading text in HTML format as an object, (b) requests for sending requests to the Google search engine, (c) os (operating system) for managing directory paths, (d) json (JavaScript Object Notation) for writing and reading files in JSON format, and (e) sys for handling controlled errors. The type of search used was "Term1" AND "Term2" (Figure 2), where the quotation marks force the search engine to use the exact word or term and the AND requires both terms to be present in the results. In our case, term 1 was always the word OSINT, whereas the second term was iterated over the set of predefined areas corresponding to the identified OSINT sub-domains (Figure 3).
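The query construction described above can be sketched as follows. This is a minimal illustration rather than the scraper used in the study: it relies only on the Python standard library instead of requests and BeautifulSoup so that it stays dependency-free, it prepares the requests without sending them, and the user-agent string, the list of subareas, and all function names are assumptions introduced here.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumed user-agent string for Firefox on Ubuntu Linux (illustrative value only).
USER_AGENT = (
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:75.0) "
    "Gecko/20100101 Firefox/75.0"
)

# Hypothetical subset of OSINT subareas; the study iterates over its own list.
SUBAREAS = ["big data", "video analysis", "text analysis"]


def build_query(term1: str, term2: str) -> str:
    """Return an exact-phrase query of the form "Term1" AND "Term2"."""
    return f'"{term1}" AND "{term2}"'


def build_request(query: str, start: int = 0) -> Request:
    """Prepare (but do not send) a GET request for one page of results."""
    params = urlencode({"q": query, "start": start})
    return Request(
        f"https://www.google.com/search?{params}",
        headers={"User-Agent": USER_AGENT},
    )


# Term 1 is fixed to OSINT; term 2 iterates over the subareas.
queries = [build_query("OSINT", area) for area in SUBAREAS]
prepared = [build_request(query) for query in queries]

print(json.dumps(queries, indent=2))
```

Keeping query construction separate from request preparation makes it straightforward to swap in another HTTP client, or to add delays between requests, without touching the search terms.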
For the query of the source APIs, the APIs are accessed through their endpoints. In some cases, an API key is required to make authenticated requests to the platform. Each API can be queried using the GET request method, and the usage instructions of each API must be read carefully. Below are a few examples of API endpoints:
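As a hedged illustration of this access pattern, the sketch below prepares a GET request against an endpoint, optionally with an API key. The endpoint URL, the parameter names `q` and `apiKey`, and the function names are placeholders invented for this example; they do not correspond to any of the sources queried in the study, and each real API defines its own parameters.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_api_request(endpoint: str, query: str,
                      api_key: Optional[str] = None) -> Request:
    """Prepare a GET request; the parameter names are hypothetical."""
    params = {"q": query}
    if api_key is not None:
        # Some platforms require a key for authenticated requests.
        params["apiKey"] = api_key
    return Request(f"{endpoint}?{urlencode(params)}", method="GET")


def fetch_json(request: Request):
    """Send the request and decode a JSON response body."""
    with urlopen(request, timeout=30) as response:
        return json.load(response)


# Placeholder endpoint and key, for illustration only.
request = build_api_request(
    "https://api.example.org/v1/search", "OSINT", api_key="MY_KEY"
)
print(request.get_method(), request.full_url)
```

The call to `fetch_json` is left out of the example run because the endpoint is fictitious; with a real endpoint it would return the decoded response.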

Classification and Data Extracting
In this stage, the sources were classified into three categories (Figure 4). The first category corresponds to the sources that do not have OSINT resources published. The second category corresponds to the sources that publish OSINT resources without metadata. Finally, the last category corresponds to the sources that publish OSINT resources with metadata.
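The three categories can be expressed as a small decision rule. The sketch below is one possible encoding, assuming that each source is summarized by the number of OSINT resources found and the list of metadata fields it exposes; the names `SourceCategory` and `classify_source` and the example field list are hypothetical.

```python
from enum import Enum
from typing import Sequence


class SourceCategory(Enum):
    NO_RESOURCES = "no OSINT resources"
    WITHOUT_METADATA = "OSINT resources without metadata"
    WITH_METADATA = "OSINT resources with metadata"


def classify_source(resource_count: int,
                    metadata_fields: Sequence[str]) -> SourceCategory:
    """Assign a source to one of the three categories."""
    if resource_count == 0:
        return SourceCategory.NO_RESOURCES
    if not metadata_fields:
        return SourceCategory.WITHOUT_METADATA
    return SourceCategory.WITH_METADATA


# A source with nine resources and no metadata structure.
print(classify_source(9, []).value)
# A hypothetical source exposing some descriptive fields.
print(classify_source(120, ["title", "author", "year"]).value)
```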

Sources Without OSINT Resources
A total of 27.4% of the sources consulted do not present any kind of OSINT resources (Appendix B). This finding is derived from the specificity of some of the sources consulted (music, varieties and entertainment, etc.). However, the absence of OSINT topics in electronic library projects such as Latindex or SciELO may be due to factors such as the low development of OSINT in the countries that comprise these projects. Another factor to take into account could be the low interest in, or limited scope of, this type of outlet for disseminating OSINT research results.

OSINT Resources Identified without Metadata
Sources such as: (a) Booklick, which identifies 230 OSINT resources (journals and book chapters) redirected to the universities supporting the documents; as well as (b) Dialnet, which identifies nine OSINT resources (seven journal articles, two book articles), do not provide a metadata structure to classify the results found. From this, the need to have an adequate metadata structure that allows not only the description of resources, but also the provision of timely results to the search processes, can be inferred.

Nowadays, the web contains both paywalled or subscription-based sources and sources that offer publicly available information. In either case, it is essential to share metadata so that resources can be described, enriched, found, shared, and reused. The metadata necessary for resource identification should be published regardless of the type of service used.

Results and Discussion
The results obtained from the exploitation of sources were analyzed using two different approaches: (a) the increase of available materials on OSINT specifically of a didactic or educational nature; and (b) the interest that OSINT arouses in the academic and research world.

Web Scraping Application Results
The following results were obtained from the surface web through web scraping. With regard to the OSINT subareas, the subarea with the most work corresponds to security analysis on topics such as web analysis (traffic analysis), video analysis (data, behavior, attitudes, etc.), and link analysis (criminal activity, security analysis, etc.) (Figure 5). To do this, products based on big data software are used, which allow the analysis of large amounts of disparate data, such as those provided by the aforementioned subareas. In general terms and seen from different perspectives, it is evident that security, as well as public and government environments, are the areas of greatest interest in terms of the work and application of OSINT. It is important to consider that intelligence allows anticipating both opportunities and risks, the latter being critical factors for the survival of an organization or country. In general terms, security focuses on preventing risks and threats. Therefore, the combination of intelligence and security becomes a key element, for example, in the different scenarios present in social, political, and economic environments worldwide.
With regard to the sources that provide OSINT resources, the sources with the highest participation rates correspond to the digitized book service offered by Google, as well as the ResearchGate Academic Collaborative Network. The latter carries out its searches in databases such as PubMed, CiteSeer, arXiv, and the National Aeronautics and Space Administration (NASA) Library, among others (Figure 6).
Regarding the type of OSINT material, images and videos are the most used formats, with 4447 and 1850 resources, respectively. These are followed by Portable Document Format (PDF) files, blogs, and wikis, with 113, 79, and 65 resources, respectively. These results show that in the vast majority of the sources queried, audio-visual media are used as one of the main strategies for transmitting knowledge on the different OSINT topics. This behavior stems from the fact that, in the process of managing open sources, the configuration and use of the tools are of great importance, for which audio-visual material becomes the preferred medium, since it involves skills such as attention, application of learning, and understanding.

Metadata Reported by Sources
In relation to the queries made through the APIs or using web scraping techniques, a wide dissimilarity is observed both in the metadata published by the sources and in the metadata that can be accessed through their services. The latter is more restricted, as can be seen in the metadata results provided by the platforms for the description of their resources (Figure 7).
The limited amount of data provided by the platforms' services restricts comparisons of, for instance, the data models and controlled vocabularies used to describe their resources, or their domain tagging. Although each data source needs specific metadata to describe its resources, there are common metadata that all resources have to manage.
Regarding the description of resources, most sources provide title, abstract, year, and author metadata, whereas few sources provide keyword, language, and document type metadata. These are examples of the mandatory metadata that all data sources should manage and share universally.
One of the most difficult sources to query was YouTube, since it limits access to resources through a request quota, which restricts the search processes. In addition, responses to key assignment requests for searches were not provided in a timely manner.
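The comparison of metadata across sources can be illustrated with a simple coverage calculation: for each metadata field, the share of sources that expose it. The sketch below is only an illustration; the source names and their field sets are invented placeholders, not the values reported in Figure 7.

```python
from collections import Counter
from typing import Dict, Set

# Hypothetical sources and the metadata fields each one exposes.
source_fields: Dict[str, Set[str]] = {
    "source_a": {"title", "abstract", "year", "author", "keyword"},
    "source_b": {"title", "abstract", "year", "author"},
    "source_c": {"title", "year", "author", "language"},
}


def field_coverage(fields_by_source: Dict[str, Set[str]]) -> Dict[str, float]:
    """Return, per field, the fraction of sources that provide it."""
    counts = Counter(
        field for fields in fields_by_source.values() for field in fields
    )
    total = len(fields_by_source)
    return {field: counts[field] / total for field in sorted(counts)}


coverage = field_coverage(source_fields)

# Fields provided by every source are candidates for a common core.
common = [field for field, share in coverage.items() if share == 1.0]
print(common)
```

A calculation of this kind makes explicit which fields could form a shared minimum description and which are provided by only a few sources.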

Findings on the Identified OSINT Resources
Given the fact that the APIs of the sources consulted do not provide a uniform metadata scheme, the findings identified according to the metadata provided by the source(s) consulted are, therefore, presented below.

OSINT Resources Published
As for the production of OSINT resources in the last 10 years, Google Scholar shows a growing interest in publishing of these types of resources, whereas ScienceDirect tends to decline as a source of such resources ( Figure 8). As for Twitter, a rate of 15,750 tweets related to OSINT has been observed so far in 2020. Regarding YouTube, it went from 279 videos in 2019 to 1092 videos in 2020, so far.

Findings on the Identified OSINT Resources
Given the fact that the APIs of the sources consulted do not provide a uniform metadata scheme, the findings identified according to the metadata provided by the source(s) consulted are, therefore, presented below.

OSINT Resources Published
As for the production of OSINT resources in the last 10 years, Google Scholar shows a growing interest in publishing of these types of resources, whereas ScienceDirect tends to decline as a source of such resources ( Figure 8). As for Twitter, a rate of 15,750 tweets related to OSINT has been observed so far in 2020. Regarding YouTube, it went from 279 videos in 2019 to 1092 videos in 2020, so far. These results make it possible to show what was presented by [42], research work that describes that Google Scholar has the advantage of constantly finding the highest percentage of citations in all areas (93%-96%), well ahead of Scopus (35%-77%) and Web of Science (27%-73%). Additionally, Google Scholar found almost all Web of Science (95%) and Scopus (92%) citations, and most of their citations come from non-journal sources (48%-65%), including theses, books, conference papers, and unpublished materials. For these reason, Google Scholar maintains its position over the use of other These results make it possible to show what was presented by [42], research work that describes that Google Scholar has the advantage of constantly finding the highest percentage of citations in all areas (93%-96%), well ahead of Scopus (35%-77%) and Web of Science (27%-73%). Additionally, Google Scholar found almost all Web of Science (95%) and Scopus (92%) citations, and most of their citations come from non-journal sources (48%-65%), including theses, books, conference papers, and unpublished materials. For these reason, Google Scholar maintains its position over the use of other tools, considering the fact that it is a free tool. Regarding the OSINT resources published in the Database, Figure 9 presents the evolution in the period 2011-2019.
Among the databases consulted, NewsBank is the fastest growing database in OSINT publication, followed by Scopus with 80.3% fewer OSINT resources published in the last year. NewsBank consolidates information from newspapers, cable news, web editions, blogs, videos, broadcast transcripts, business magazines, periodicals, government documents, and other publications. This breadth helps it bring together different types of publications in areas such as OSINT. Finally, each item in the NewsBank database has key metadata, such as headline, source, date, Lexile/readability, source type, author, standard title, and title as published.
All of the above shows that: (a) Google Scholar is preferred over specialized and paid databases, such as Scopus or Web of Science, for publishing OSINT resources; and (b) although NewsBank is a subscription service, it has consolidated itself as the central axis of the different OSINT publications made on the web.

OSINT Subareas Worked
With regard to the subtopics or areas on which the OSINT resources are developed (Figure 10), the accumulated sources (Google Scholar, Udemy, YouTube, and ScienceDirect) identify cybersecurity as the area of greatest interest in the work on OSINT. Among the areas mentioned in the tweets published so far in 2020, cybersecurity (1350), information and security (infosec) (1350), security (900), and cybercrime (900) are the subareas with the highest participation in publications about OSINT on Twitter.
In general, it is evident that the concern for security (digital, physical, national, or organizational) has become one of the most worked domains in OSINT. Within this area, the use of link, video, text, and data analyses, among others, can be identified. In certain cases, this allows the detection of people who may become potential threats to security, as well as some types of predictions about their behavior.

OSINT Publication Languages
With regard to the academic works on OSINT published in Google Scholar and YouTube, English prevails, with 619 resources, as the most widely used language in the publication of OSINT resources, followed by Spanish (28), British English (19), Korean (15), and Italian (14). As for YouTube, the most frequent publication languages of OSINT resources are English (532) and Italian (112). This leads to the conclusion that English is consolidated as the universal language for the transfer of knowledge regarding OSINT, followed by Italian and Spanish.

Country of Origin of OSINT Publications
The databases that report the country of origin of the published resources (NewsBank, Scopus, and EBSCO, the Elton B. Stephens Company) identify the United States as the country with the largest number of publications on OSINT; its publications consist mainly of cable news provided by NewsBank (Figure 11).

On the other hand, a lack of participation in OSINT publications from Latin American, Central, and East Asian countries (except Japan) is observed in the same queried sources.

Types of Publications of OSINT Resources
Within the databases consulted (NewsBank, Oxford University Press, Sage Journals, Sage Knowledge, ProQuest, Springer, Scopus, Web of Science, EBSCO, and the Directory of Open Access Journals, DOAJ), cable news (provided by NewsBank), research articles (provided by 8 of the 10 databases), and book chapters are the most published OSINT resource types (Figure 12).

Scope of Journals and Conferences in Which OSINT Is Published
Regarding the scope of journals and conferences, the metadata provided by three data sources (Taylor & Francis Group, IEEE Xplore from the Institute of Electrical and Electronics Engineers, and Oxford University Press) were reviewed. In general terms, the scopes with the greatest number of resources are intelligence (100), security (77), social networks (12), science (5), computing (4), economy (3), and medicine (1). Most of the scopes are oriented to issues and challenges that must be addressed by both government and private institutions, especially when making contemporary decisions and policies related to intelligence and security.

General Areas in Which OSINT Is Published
For the review of general categories or subareas in which OSINT resources are published, the metadata provided by seven different databases (Taylor & Francis Group, Sage Journals, Sage Knowledge, Springer, Scopus, Web of Science, and Redalyc) was reviewed. According to this review, the Taylor & Francis Group, Springer, Scopus, and Web of Science databases have the greatest coverage of subareas of OSINT resource publishing, and Scopus has the largest number of resources (360) published under different OSINT subareas. Figure 13 identifies Computer Science, and Politics and International Relations, as the areas with the main bibliographic production, showing that OSINT rests on two macro scenarios: (a) the systematic study of how to describe and transform information using computer systems; and (b) its use and application in a global context, considering the existing complex dynamics.
However, given the current conditions of increasing security and defense problems, as well as the usefulness that OSINT has shown in tackling them, as can be seen in [43], from the European Union Agency for Law Enforcement Training, it cannot be ruled out that many processes performed in other areas are not fully documented publicly.

Specific Subareas in Which OSINT Has Been Published
In terms of specific OSINT publication subareas, data protection and management, as well as security and defense (Figure 14), are identified as the main areas of work in OSINT, given that they comprise two major work fronts for performing political, military, scientific, and sociological intelligence, among others. As for the tweets registered in specific subtopics, there are 450 tweets registered under the dark web and military subareas. This confirms the impact that OSINT has had in the security area, extending to security in data management and to a defense approach against internal or external threats.

Educational Resources and MOOCs OSINT
With regard to the "educational" resources available on OSINT, search engine queries were made with combinations of keywords such as (OSINT) (Open source intelligence) and (OER) (Open Educational Resources) (training), using operators to refine the search, such as those defined in [10]. Each keyword combination returns more than 100,000 results per query. In broad terms, these can be identified as blogs, certifications, courses, tools, projects, videos, podcasts, books, etc.

However, even though no OSINT resources are clearly identified as "Educational Resources" within the exploration carried out, those that were identified can be used in training processes as support material.
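The keyword combinations described above can be generated programmatically before being submitted to a search engine. The following Python sketch pairs the topic terms with the educational terms quoted in the text; the quoting style and the `AND` operator are assumptions made for illustration and are not necessarily the operators defined in [10].

```python
from itertools import product

# Keyword groups quoted in the text.
TOPIC_TERMS = ["OSINT", "Open source intelligence"]
EDU_TERMS = ["OER", "Open Educational Resources", "training"]


def build_queries(topic_terms, edu_terms):
    """Return one query string per (topic term, educational term) pair."""
    queries = []
    for topic, edu in product(topic_terms, edu_terms):
        # Exact-phrase quoting and AND are assumed operators, not those of [10].
        queries.append(f'"{topic}" AND "{edu}"')
    return queries


queries = build_queries(TOPIC_TERMS, EDU_TERMS)
for query in queries:
    print(query)
```

Two topic terms and three educational terms yield six queries, each of which can then be issued separately and its results classified by type.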
With regard to search engines and MOOC repositories, the query raised about OSINT resources yielded the results shown in Table 1.

Repository            MOOCs on OSINT
The Open University   0
Iversity              0
As seen in these results, most MOOCs are redirected to Udemy. The MOOCs published on the Udemy platform focus on OSINT applications, tools, and techniques, as well as on cybersecurity and investigation. Udemy uses a simple set of metadata to describe its resources (title, author, date, language, description, requirements). Finally, all of these resources are paid courses. In the same query, MOOC search engines were also reviewed, giving the results shown in Table 2.
Although most MOOCs are described using metadata such as platform, provider, effort, length, language, credentials, and Uniform Resource Identifier (URI), some of these platforms do not manage metadata that describes essential data, such as their educational purpose and skills, or complete provenance and contributor schemas. Such metadata is important in the educational domain in order to classify resources adequately. Generalizing the results obtained in this section, it is evident that:
a. OSINT material, such as videos, images, etc., has been created, and these resources are used on different sites for different purposes;
b. although these materials do not have all the characteristics of educational resources, they can be used in this context, thus complementing aspects of a pedagogical and didactic nature;
c. there is a very low rate of resources framed within the educational and MOOC context, in relation to the number of courses, certifications, and trainings found on the web about OSINT;
d. as for the MOOCs obtained from the queries, most of them are centralized in Udemy and are oriented to cybersecurity and defense;
e. a common problem identified in this research is metadata. According to [44][45][46], people tend to regard metadata as one of the most boring topics and therefore do not care whether resources are correctly described and written. As a result, little information about the resources is available;
f. most OSINT "educational" resources published on the web focus on how the topic is defined, which tools and gadgets to use, how to set up these tools, and how to use them in a specific field. These topics correspond to the first stages of the OSINT process. However, there are not enough resources published on the web on how to analyze the data obtained from OSINT tools and how to make decisions based on these analyses.
Additionally, the results obtained make evident the need to generate educational resources and MOOCs that complement the training processes in which the OSINT topic is introduced. It is also important to document the existing general resources with a metadata structure defined for this purpose, which provides the necessary elements to make them accessible and reusable in the educational environment.
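As an illustration of what such a metadata structure could look like, the following Python sketch defines a record containing the fields that the text reports as commonly available (platform, provider, effort, length, language, credentials, URI) together with those it reports as often missing (educational purpose, skills, provenance, contributors). The class name `MoocMetadata`, the helper `missing_fields`, and the sample record are hypothetical; this is not the data model proposed by the authors, only a minimal sketch of how incomplete descriptions could be detected.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class MoocMetadata:
    # Fields the text reports as commonly available on MOOC platforms.
    platform: Optional[str] = None
    provider: Optional[str] = None
    effort: Optional[str] = None
    length: Optional[str] = None
    language: Optional[str] = None
    credentials: Optional[str] = None
    uri: Optional[str] = None
    # Fields the text reports as often missing.
    educational_purpose: Optional[str] = None
    skills: Optional[List[str]] = None
    provenance: Optional[str] = None
    contributors: Optional[List[str]] = None


def missing_fields(record: MoocMetadata) -> List[str]:
    """Return the names of fields that are empty (None, '' or [])."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# Hypothetical record, for illustration only.
example = MoocMetadata(
    platform="ExamplePlatform",
    language="English",
    uri="https://example.org/course",
)
print(missing_fields(example))
```

A report of this kind, run over the records retrieved from each repository, would make it possible to quantify how many resources lack the descriptive elements needed for classification and reuse.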

Other Queried Repositories
Among the sources consulted, those shown in Table 3 present OSINT resources. However, they do not offer APIs or any type of complementary metadata.
This query identifies Course Hero as a potential repository of OSINT resources classified into courses, study documents, study guides, videos, questionnaires, and troubleshooting books. However, Course Hero is a proprietary repository in which material from schools and universities around the world can be found. Therefore, it cannot be regarded specifically as an open-source tool.

Mapping the OSINT Application Fields
Briefly, concerning the approaches developed on OSINT, the following results are generalized:
• regarding the literature review, applications of OSINT in several knowledge domains are identified. One of the most worked areas focuses on security and cybercrime. The studies in [47][48][49] are examples of this type of application;
• according to the outcomes collected through web scraping, research in OSINT has been advancing in fields such as security analysis, focusing on topics such as web analysis, video analysis, and link analysis;
• as for OSINT publications, a higher level of publication was identified on topics related to security, data management, and defense;
• concerning the educational resources, Udemy and Course Hero are the major MOOC providers. MOOC production focuses on the OSINT life cycle, cybersecurity, hacking, and terrorism, among other fields;
• regarding OSINT trends, the security analytics segment is expected to garner a significant share by 2026. Numerous benefits provided by security analytics have been fueling the growth of this market [48].
These results show that both scientific publications and resource production are aligned with OSINT market trends. However, in addition to the need to convert OSINT materials into educational resources, there is a need to focus on transforming OSINT into a robust and self-managed solution [10].

Ratio of Total Resources vs. OSINT Resources
As shown in Figures 15 and 16, OSINT resources do not reach 1% of the publications in any of the resource repositories. These figures show that even in repositories with a high volume of published resources, such as NewsBank, OSINT resources account for only 0.00012% of the total. The repository with the highest proportion of published OSINT resources is Sage Knowledge, with 0.14% of its resources.
This shows that, although OSINT has been generating a great impact on subjects such as security and defense, the scope of its publications is not yet robust enough in academic and scientific media. This situation may derive, for example, from the criticality or confidentiality of the information handled, from the domains in which it is applied (terrorist profiling, military objectives, etc.), or from caution about sharing experiences of its application and use, which prevents making the process executed and the results obtained public.
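The ratio discussed above is the number of OSINT resources divided by the total number of resources in a repository, expressed as a percentage. The following Python sketch shows the calculation; the repository names and counts are hypothetical and are not the values behind Figures 15 and 16.

```python
def osint_share(osint_count, total_count):
    """Percentage of a repository's resources that are about OSINT."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return 100.0 * osint_count / total_count


# Hypothetical (OSINT resources, total resources) pairs, for illustration only.
repositories = {
    "repo_a": (25, 20_000),
    "repo_b": (6, 4_000_000),
}

shares = {name: osint_share(osint, total) for name, (osint, total) in repositories.items()}

# List repositories from the highest to the lowest OSINT share.
for name, pct in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {pct:.5f}%")
```

Expressing the result as a share rather than an absolute count makes repositories of very different sizes comparable, which is why a very large repository can hold many OSINT items and still show a very small percentage.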

Overall, the results of this study show an alignment with the factors contributing to the growth of the OSINT industry [50][51][52]. Thanks to the increasing accumulation of information and the opening of a growing number of sources, OSINT has been able to perform almost real-time analysis, showing remarkable development in the areas of defense, national security, and public security, in both virtual and physical environments. On the other hand, regarding the limitations foreseen for the growth of the open-source intelligence market, the lack of investment and experience in open-source analysis can be clearly identified, along with data quality problems.
Finally, it can also be recognized that, in the global market, North America has the highest share of OSINT products, and its growth rate is based on the demand for cloud-based security solutions. The trends in these OSINT market reports, projected to 2026, corroborate not only the results obtained in this research about the areas of growth and the approach given to OSINT, but also its growth projections, thanks to the benefits that this technology offers for security and defense processes.
However, despite the growing spread of OSINT as a training topic, and despite the existence of blogs, sites, videos, and talks that explain its application process, its uses, its supporting tools, and approaches to its professionalization, the review carried out did not identify any studies or statistics that would allow analyzing the production and growth of OSINT resources.

Conclusions
This study shows that interest in OSINT has been growing, given its benefits for intelligence processes that increasingly allow reactions to risks and threats in real time by making the most of the data and sources available on the web. Given existing world conditions, this type of intelligence has been playing a preponderant role in sensitive areas, such as security and defense in their different modalities and strategies. As a result, it is projected for 2026 as one of the markets with an increasing flow of money. Similarly, in addition to the growth of existing applications, OSINT is projected to continue supporting the analysis of markets, services, and products, along with other intelligence schemes.
Regarding the academic and scientific dissemination of OSINT resources, although it is currently growing, it does not reflect a high degree of participation within the repositories and databases; non-profit and free-to-use sources are the ones with the greatest presence of OSINT resources. On the other hand, both Latin American publications and repositories show an even lower production and publication rate, which reveals an open field of action for addressing this type of technology in private or public projects.
As for the shortcomings that the dissemination of resources on this type of technology reveals, the following can be considered:
• the need to document the metadata of existing resources more thoroughly, which would provide more timely information for those seeking to exploit such resources;
• given the growing foray of OSINT into training processes, either as a subject or as a global body of knowledge, it is necessary to design educational resources that not only provide a clear and timely metadata structure but also improve their accessibility and reuse, in addition to being didactically and pedagogically contextualized;
• along with creating these educational resources, it is also necessary to index them in educational resource repositories and to release them under an open license, since the few that discuss the subject are exclusive and linked to paid training processes;
• as for the information that people publish, often without knowledge of its public availability, it is essential to propose strategies that make people aware of the sensitivity of the information they publish;
• open data policies are another challenge to be addressed, given the tendency to open data merely to meet requirements rather than as a real commitment to openness;
• given the increasing openness of public data, it is necessary to design applications that allow ordinary citizens to exploit and analyze the information coming from this type of source;
• finally, considering the growing availability of both information and tools to capture it, it is important to train people who interact with OSINT in how to analyze data and provide better results for decision-making processes. Similarly, it is vital to strengthen competencies in refining the quality of information coming from multiple open sources in order to provide better quality analysis and intelligence.
In general terms, OSINT can be considered to have a promising future in different fields of action. However, a greater focus is required on the last stages of the open-source intelligence process in order to use this knowledge in more assertive ways and with better quality. As future work, the design of a data model that complements the description of educational resources can be projected. This could contribute to the accessibility and reuse of these resources, adding value to the results and intelligence processes carried out with this type of technology in domains such as education.

Glossary
Big data: extremely large data sets that require a scalable architecture for efficient storage, manipulation, and analysis (University of Wisconsin).
Deep web search: a search of content that is not indexed by popular search engines. Users require a login or credentials to discover and access a specific service.
Intelligence: the process of collecting, processing, and analyzing raw data from different sources and transforming them into information to address a need, make decisions, or be used in a context.
Intelligence processes: also called the Intelligence Cycle. It refers to the process of tasking, collecting, processing, analyzing, and disseminating intelligence.