1. Introduction
In today’s digital age, cyber-threats pose a constant challenge due to the increasing connectivity and complexity of digital systems. The move to digital technologies has brought about incredible advancements, but it has also made digital systems more complex, increasing risks and creating greater uncertainty in cyber-space [1,2]. Moreover, the increasing number and sophistication of cyber-attacks, such as advanced persistent threats (APTs), pose significant risks, including data theft, financial losses, and reputational damage [3,4]. The frequency and complexity of cyber-attacks require a strategic and collaborative response, with cybercrime threat intelligence (CTI) at its core. CTI enables organizations to shift from reactive to proactive measures and make data-driven security decisions to effectively combat threat actors.
The Arab world faces major challenges due to the lack of local IT initiative platforms, which leads to reliance on costly and less effective external solutions. Our approach combines security information and event management (SIEM), the Malware Information Sharing Platform (MISP), and geodistributed honeypots to address these issues. SIEM systems handle the integration and analysis of large amounts of data, using advanced analytics to uncover patterns and potential threats, while MISP facilitates the sharing and accurate classification of threat information between entities.
The spread of cyber-threats does not respect borders. Thus, our research takes on a global perspective, with a dedicated focus on addressing the unique cyber-security needs of the Arab world. Arab countries are becoming prime targets for ransomware gangs such as Lockbit [5]. The following describes the objectives and contributions of this research towards a CTI sharing platform tailored to the cyber-security needs of the Arab world.
Geodistributed honeypots improve real-time threat detection and provide region-specific insights, making cyber-security measures more relevant and effective. This combination improves predictive accuracy, reduces costs, and enhances regional cooperation, providing a powerful solution to efficiently manage big data and real-time threats, significantly enhancing cyber-security in the Arab region.
This study addresses the critical need for a CTI engagement platform tailored to meet the unique cyber-security needs of the Arab world. Although there are many CTI platforms worldwide, there is a clear absence of a dedicated CTI engagement solution for the Arab digital landscape. This research presents an MISP-based CTI engagement platform, which has been specifically adapted to improve the cyber-resilience of Arab organizations. By providing a secure ecosystem for the sharing of actionable threat information, this platform aims to enhance collaboration and improve security awareness, enabling proactive defense against evolving cyber-threats.
The novelty of this work lies in its focus on the Arab world, a region often overlooked in global cyber-security initiatives. By integrating honeypot systems, open-source intelligence, and machine learning techniques, the proposed platform provides a cost-effective solution to create and share Indicators of Compromise (IoC). This research contributes significantly to filling the critical gap in regional cyber-security infrastructure and strengthening collective defense against cyber-threats.
This research makes a distinctive contribution by addressing the unique cyber-security needs of the Arab world. Currently, there is a conspicuous absence of a dedicated CTI sharing platform tailored to the nuances of the Arab digital landscape. The MISP-based CTI sharing platform aspires to fill this void by introducing a solution specifically designed to meet the challenges and requirements of Arab organizations. This contribution is envisioned to significantly enhance the collective cyber-resilience of Arab entities, fostering a collaborative defense against cyber-threats. In essence, this research aims to bridge a critical gap in the cyber-security infrastructure of the Arab world, establishing a foundation for effective threat intelligence sharing and collaboration. By leveraging the MISP platform, the proposed CTI sharing platform is poised to not only enhance the security posture of individual organizations but also contribute to the collective defense against cyber-threats on a regional scale. The contributions of this paper include the following:
We proposed a scheme for organizations to collect information and build a system using open-source tools without using expensive commercial CTI systems provided by cyber-security companies;
We analyzed 1,013,033 records in total, comprising 1,006,156 alerts collected from honeypots and 6877 items from open-source intelligence (OSINT) sources, to identify patterns and trends that indicate potential threats;
The platform monitors multiple threat intelligence sources to enable security system integration to detect and analyze potential cyber-threats that are specific to the Arab region;
We applied machine learning based on data collected from the platform that we built, demonstrating an accuracy of 99.79% for country, IP, and TTPs.
The remainder of this paper is organized as follows. In Section 2, we conduct a review of CTI-related research, specifically exploring projects that use the MISP platform in a similar way to our study and examining the results obtained. In Section 3, we explain our platform architecture, which consists of real-time threat collection from honeypot and OSINT data using two types of collection methods (manual feed and automatic module feed) and a method for checking accuracy. We investigate the CTI analysis and statistics, assessing the nature of the research and the meaningful results generated. In Section 4, we describe the implementation of the platform developed in our research. In Section 5, we extend our focus to understanding the perspectives on CTI sharing, and we thoroughly examine the relevant research and results in this regard. We analyze the results and provide a brief summary of the statistics of the IoC collected from OSINT and the honeypots. In Section 6, we discuss this study’s limitations and directions for further research. Finally, in Section 7, we present our conclusions.
2. Literature Review
The need for effective CTI engagement platforms is well-documented in the cyber-security literature. Many researchers have emphasized the importance of collaboratively sharing threat information to enhance incident response capabilities.
Cynthia Wagner et al. [6] stated that the IT community is faced with various types of incidents, with new threats appearing every day, and that it is almost impossible to respond to these security incidents individually. Thus, sharing information about threats across communities has become a key element of incident response to identify attackers. Trusted intelligence resources that provide reliable information can be found within the IT community, the broader intelligence community, or fraud detection groups. In this regard, the authors presented the MISP and the Threat Sharing Project.
According to M. Mutemwa et al. [7], in developing countries such as South Africa, security and defense role players often lack the necessary capabilities to effectively defend their national cyber-space against fast-moving and persistent threats. The authors state that addressing this challenge requires improved security solutions and increased collaboration within the cyber-domain. They emphasized the importance of information sharing as a crucial element in detecting, defending against, and responding to constantly evolving cyber-threats and attacks. To address this need, the authors proposed a conceptual CTI sharing model and platform. This model aims to stimulate and enable various stakeholders to seamlessly and collaboratively aggregate, analyze, and share contextually actionable cyber-threat information in a timely manner.
Abdullahi et al. [8] conducted a systematic review of the literature on the AI methods used to detect cyber-security attacks in IoT environments, analyzing 80 studies from 2016 to 2021. In the literature review, they found that deep learning and machine learning techniques, especially SVM and RF, are very effective in solving security problems. They also aim to explore advanced methods such as XGBoost, NN, and RNN to improve detection accuracy. Kattamuri et al. [9] used 51,409 samples, including the SOMLAP dataset, for static malware detection for cyber-threat intelligence. They used machine learning-enhanced colony optimization algorithms, Ant Colony Optimization (ACO), Cuckoo Search Optimization (CSO), and Gray Wolf Optimization (GWO), and they achieved an accuracy of 99.37%, with 12 features optimized using ACO.
Maryam et al. [10] discussed the implementation and advantages of the MISP platform for the sharing of CTI. Their study emphasized the collaborative features of the platform, which facilitate the effective generation and sharing of IoC within the cyber-security community. They highlighted several improvements, such as better integration with other security systems and enhanced data analytics, that enhance threat detection and response capabilities.
Sakellariou et al. [11] defined the core concepts of the CTI framework and presented an eight-layer CTI reference model for advanced system design. The authors validated the proposed model through three case studies and created a CTI reference architecture based on them. Ramsdale et al. [12] examined standardized shared environments for cyber-threat intelligence, such as STIX, TAXII, and CybOX, and they evaluated their implementation. The authors highlighted various challenges that arise when analyzing threat feeds, identifying data types, and aggregating and sharing data. The study concluded that, although standardized shared environments are widely known, real-world adoption is low, and many providers often prefer customized or simple formats.
Melo e Silva et al. [13] showed that the cyber-security landscape has fundamentally changed over the past few years, and organizations are encouraged to develop the ability to respond to incidents in real time using sophisticated threat intelligence platforms. However, as the field is growing rapidly, the concept of CTI today lacks a consistent definition, and a heterogeneous market has emerged that includes a variety of systems and tools with different capabilities and goals, creating a need for threat intelligence standards. Therefore, the authors presented a comprehensive evaluation methodology for intelligence platforms.
Borce Stojkovski et al. [14] studied MISP mixed-method user experience investigations. An effective incident response in the realm of security relies on standardized CTI as vital threat information. Their study used a comprehensive approach to security incident response, presenting 18 key concepts that support the evaluation and establishment of standardized approaches. By analyzing six incident response formats, the authors identified their structural elements, highlighted characteristics, exposed format deficiencies, and showed how key concepts aid in selecting the appropriate format for specific use cases. In addition, the authors highlighted an ongoing research task aiming to fully harness the potential of incident response. CTI sharing platforms are becoming essential tools for collaborative and cooperative cyber-security; however, the focus is often on the technical aspects, incentives, or implications associated with CTI sharing rather than on examining the challenges experienced by platform users. MISP is an open-source CTI sharing platform used by more than 6000 organizations worldwide, and, as a technologically advanced CTI sharing platform, it aims to accommodate a diverse range of security information workers with distinct needs and goals.
As mentioned above, many researchers have emphasized the need for a CTI sharing platform, including the MISP platform, and they have developed systems that can be used by independent communities. In this context, this study aimed to develop a CTI sharing platform for the Arab world, as organizations in the Arab world do not have a CTI sharing platform. In addition to the research on building a CTI sharing platform, much research has been conducted on CTI collection and analysis sharing methods, and many notable research results have been published.
Abu [15] reviewed the existing research related to CTI, addressing the most basic question of what CTI is by comparing existing definitions to identify commonalities or inconsistencies. They argued that more research is needed to define CTI, as both organizations and vendors lack a complete understanding of what information is considered CTI. They explained that research institutes such as the Financial Services Information Sharing and Analysis Center (FS-ISAC) and MITRE Corporation are developing standard formats for intelligence sharing, such as Structured Threat Information eXpression (STIX) and Trusted Automated eXchange of Intelligence Information (TAXII).
Daniel Schlette et al. [16] studied CTI as threat information used for security purposes, requiring standardization in incident response. The authors reviewed a broader security incident response perspective, presenting 18 key concepts and analyzing 6 incident response formats. They identified format defects and explained how to choose the appropriate format using key concepts. The survey results consistently focused on incident response measures in all formats, with playbooks indicating procedures. Various use cases allowed organizations to combine formats. The authors also discussed ongoing research to maximize incident response potential.
In addition to the various studies on CTI sharing platforms, a lot of research is currently being conducted on intelligence analysis techniques and results. Therefore, we not only build a CTI sharing platform for the Arab world on top of the MISP platform, but also establish procedures and an environment that enable the collection and analysis of actual attack data and the provision of unique intelligence information.
Table 1 highlights the key features and limitations of existing CTI sharing platforms, demonstrating the novelty and significance of the proposed platform in addressing regional cyber-security needs.
Despite advances in CTI sharing platforms, several limitations remain. Many existing platforms focus on specific regions or sectors, often neglecting the unique needs of the Arab world. Additionally, a lack of standardized methods for integrating diverse threat intelligence sources leads to fragmented and inconsistent threat data. This study addresses these limitations by proposing an MISP-based CTI sharing platform, which is specifically adapted for the Arab region, integrating open-source honeypot and intelligence systems to create and share IoC.
3. Methodology
Abu outlined the process of generating CTI information, systematically categorizing it into five well-known steps: planning and direction, data collection, processing, analysis and production, and dissemination [17,18,19]. In alignment with this, our study also follows a five-step methodology; however, we propose the incorporation of an additional classification step specifically tailored for collecting IoC that are more suited to our unique research model. The CTI data flow lifecycle for fetched IoC typically involves five stages:
Research and Planning: Establish a repository of free IoC for cybercrime threat intelligence in the Arab World. Conduct initial research to comprehend the project scope and objectives, identifying potential data sources and determining specific project requirements;
Data Collection: Develop mechanisms to collect data, leveraging international OSINT IoC and security alerts from honeypot. Utilize advanced methods to ensure a comprehensive and diverse collection of relevant cyber-threat information;
Analysis: Process and analyze the collected data using appropriate techniques, such as data mining. Identify patterns, anomalies, and trends within the data to gain deeper insights into emerging cyber-threats in the Arab world;
Classification: Establish a database of OSINT IoC that require classification to derive the most effective IoC for public sharing. Implement an accurate classification process to enhance the quality and relevance of the shared IoC;
Dissemination: Publish the acquired IoC for free, aiming to become the leading free IoC provider in the Arab World. Present IoC with attractive graphics and in a user-friendly format to empower users to easily and effectively utilize IoC in order to prevent exposure to the risks of cyber-attacks.
Our entire design of the system architecture is described in Figure 1. We use two main methods to collect CTI information. First, we set up honeypot systems to gather data on real and upcoming cyber-attacks. Second, we use OSINT, which involves white papers and security alert reports created by trusted organizations. We also gather IoC shared by verified organizations through social media platforms. Additionally, MISP’s Feed Provider function is used to collect and store enhanced CTI information. In this research, we provide a brief explanation of how to generate and share CTI data by obtaining direct threat information using the honeypot system. We also learn about resources that can collect OSINT data and explain our two methods (manual collection and module collection) of collecting CTI information, including IoC. In addition, we analyze and explain statistics on data collected in a recent month, as well as statistical information such as attack type, threat actor, and the threat actor’s tactics.
3.1. Collecting Security Alerts from Honeypot
We implement six honeypot systems, as shown in Figure 2. These systems are on the cloud and run a virtual web server, which could attract potential attackers. The cloud server is physically located in the MENA region; however, presently, the data that we use may have some bias, as the attacks target the United States and Saudi Arabia due to the relocation of the honeypot server from the United States to the MENA region. Cyber-attacks can occur simultaneously across different regions, and similar attack patterns are widespread worldwide. Despite this, our future plan involves installing physical honeypots in seven locations worldwide to ensure even data collection. By gathering information from diverse global sources, we aim to identify unbiased and accurate trends, providing more precise CTI insights.
To ensure accurate security alerts, it is crucial to maintain a robust security policy. Most security information and event management (SIEM) solutions generate a multitude of security alerts, ranging from mild to severe attacks, and accurately identifying and tracking the meaningful alerts within this volume poses a distinct challenge. In this study, we formulate a rule policy for the Wazuh system, which gathers security alerts from the honeypot system, with the aim of retaining only the data that are significant as CTI. This initiative helps reduce extraneous data and allows for the monitoring of attack trends.
3.2. Collecting OSINT IoC and Parameters
We implemented MISP systems and the collection module, as shown in Figure 3. The fight against cybercrime is becoming more tangled every day, which demands collaborative efforts. In pursuit of shared goals, several OSINT providers have emerged to combat cybercrime. There are various potential sources for gathering IoC data, and they can be categorized as shown in Table 2. The collection parameters are listed in Table 3.
To gather IoC data, we employed two distinct methodologies, classifying data collection into module collection and manual collection.
Table 4 presents a comparison of the key steps involved in incorporating IoC into an MISP system, together with the level of automation associated with each type of feed. It is evident that manual feeds demand the highest degree of manual effort, whereas module collection feeds necessitate the least manual intervention.
4. Implementation
4.1. Honeypot System Deployment
We implement a honeypot server, as shown in Table 5. The honeypot server is located primarily in the MENA region, encompassing the GCC area and Africa. By focusing on this specific region, we gather targeted information that is relevant to our geographical location. This approach enables us to obtain more precise intelligence on cyber-threats affecting the MENA region, which are distinct from those in other regions.
The detection rules in the Wazuh system are assigned security levels ranging from 0 to 15, predefined based on the threat risk. Levels 0 to 5 represent a risk from 0 to relatively low, while levels 6 to 10 indicate an elevated risk, requiring a response. Levels 11 to 15 signify a high risk of attack, necessitating active response from the security analysis team through additional analyses. In this study, data collection commences from level 6 to discern genuine trends in security attacks.
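The level bands and the collection threshold described above can be sketched in a few lines of Python. This is a minimal illustration, not the rule policy used in the study: the band names ("low", "elevated", "high"), the helper names, and the sample alerts are hypothetical, and the alerts are assumed to carry the rule level in a nested "rule" object, as Wazuh alert JSON typically does.

```python
from typing import Dict, Iterable, List


def risk_band(level: int) -> str:
    """Map a Wazuh rule level (0-15) to the risk band described in the text."""
    if not 0 <= level <= 15:
        raise ValueError(f"Wazuh rule levels range from 0 to 15, got {level}")
    if level <= 5:
        return "low"        # levels 0-5: no risk to relatively low risk
    if level <= 10:
        return "elevated"   # levels 6-10: elevated risk, response required
    return "high"           # levels 11-15: high risk, active analysis needed


def select_alerts(alerts: Iterable[Dict], min_level: int = 6) -> List[Dict]:
    """Keep only alerts at or above the collection threshold (level 6 here)."""
    return [a for a in alerts if a["rule"]["level"] >= min_level]


# Hypothetical alerts for illustration only.
sample_alerts = [
    {"rule": {"level": 3, "description": "Login session opened"}},
    {"rule": {"level": 6, "description": "Multiple failed logins"}},
    {"rule": {"level": 12, "description": "Exploit attempt on web server"}},
]

kept = select_alerts(sample_alerts)
bands = [risk_band(a["rule"]["level"]) for a in kept]
```

Filtering at the point of collection keeps the low-level noise out of the CTI dataset, while the band label can still be stored with each alert for later prioritization.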
4.2. OSINT IoC Collection
4.2.1. Manual Collection
Gathering information from open sources using various collection methods can serve multiple purposes, including competitive intelligence, research, and security assessments. The following are some common methods for collecting information from open sources:
Web Crawling: Software can be utilized for automated browsing and information collection from websites. This method is efficient for quickly gathering data from numerous websites, although it may not capture all relevant information;
Search Engines: Platforms such as Google and Bing are valuable for finding information on specific topics or entities, providing a quick and easy way to access publicly available information;
Social Media Monitoring: Platforms such as X (Twitter), Facebook, and LinkedIn offer valuable insights into individuals or organizations. Social media monitoring tools can track mentions, keywords, and hashtags that are related to specific topics or entities;
Public Records Requests: Requests can be made to government agencies for information related to specific topics or individuals. While time-consuming, this method can provide access to information not available through other sources;
Online Forums: Platforms such as Reddit and Quora offer insights into specific topics or industries, helping to identify emerging trends and issues;
News Aggregators: Services such as Google News and Feeds collect news articles related to specific topics or entities, aiding in tracking news and updates over time;
Data Scraping: Data can be extracted from web pages using software. This method is efficient for quickly collecting large amounts of structured data, though it may not be legal or ethical in all cases.
The methodologies that we use to collect OSINT data include search engines, public records, online forums, and news aggregators, among the methods previously mentioned. This comprehensive approach not only ensures the systematic collection and analysis of CTI information but also emphasizes the importance of providing valuable, freely accessible IoC to enhance defenses against cyber-threats across the Arab world. The data mainly collected in this project can be seen in Table 6 and Table 7.
To collect data, we joined and researched open-source intelligence communities. These communities provide various sites or tools where security experts share events and indicators that they have encountered. These platforms are very useful for searching for threat information and security alert reports.
AlienVault: AlienVault Open Threat Exchange (OTX) is a crowd-sourced cyber-security platform. It has more than 180,000 participants in 140 countries who share more than 19 million potential threats daily. Furthermore, after integration with this platform, we are directly alerted if there are any new attacks or IoC;
Google Dorks: The Google Dorks OSINT data-gathering method uses clever Google search queries with advanced arguments [23].
4.2.2. Collection Module
Peter Amthor et al. stated that effective cyber-security management requires a timely and cost-effective response to all threat alerts [24]. Using automated cyber-threat detection and incident response is an efficient approach to quickly address real threats. Thus, there is a need for automated tools for threat detection, such as threat intelligence sharing platforms and security policy control systems. These tools encompass various technologies, methods, and instruments that aim to respond to occurrences and events of threats. In our study, we implemented the automatic collection of OSINT data and are currently exploring methods to automatically incorporate the gathered IoC into security policies and equipment.
The process involved setting up an MISP instance and generating an MISP API key. Then, we created a Twitter developer account and obtained API credentials. Based on this, we developed methods by employing a Python script for integration using an API key that uses the sntwitter library to retrieve feeds from Twitter (X.com). In other words, data were transmitted to MISP through API, and the confirmation process for the retrieved data was implemented using a Python script. Overall, the data flow process in this script can be represented as follows:
Input node (imported modules);
Processing node (classify_iocs function);
Output node (MISP instance);
Transformation node (get_query_date_range function);
Processing node (Iterate through tweets and classify IoC);
Output node (MISP instance).
Algorithm 1 shows, in detail, the pseudocode that we used to collect the data from Twitter (X.com).
Algorithm 1 Class CyberThreatMonitor
Define class CyberThreatMonitor
  Define method __init__
    Set twitter_api to method call setup_twitter_api
    Set misp_api to new PyMISP instance with URL, key, and verify parameters
    Define tags list with “#phishing”, “#malware”, “#infosec”, “#cybersecurity”, “#ransomware”, “#APT”, “#zeroDay”, “#dataBreach”, “#hacking”, “#cybercrime”
  Define method setup_twitter_api
    Create auth with Twitter consumer key and secret
    Set access token and secret on auth
    return new tweepy API instance with auth, set to wait on rate limit
  Define method fetch_tweets with parameter query
    Create a cursor with tweepy.Cursor to search tweets using API with query, tweet mode extended, and language English
    return cursor items up to 100
  Define method extract_iocs with parameter tweet
    if tweet has ’retweeted_status’ then
      Use that text
    else
      Use tweet’s full text
    end if
    Extract URLs from text using iocextract with refang True
    Extract IPs from text using iocextract
    Extract SHA256 hashes from text using iocextract
    Extract MD5 hashes from text using iocextract
    return dictionary with lists of URLs, IPs, SHA256s, and MD5s
  Define method report_to_misp with parameters iocs and tweet_info
    for each ioc_type and iocs_list in iocs do
      for each ioc in iocs_list do
        Create a new MISP event with info “Twitter-based IOC alert”
        Add attribute to event with ioc_type, ioc value, and a comment with tweet’s user
        Submit event to MISP API
      end for
    end for
  Define method monitor_tweets
    Join tags with “OR” and append “-filter:retweets -filter:replies” to form query
    Set tweets to result of method call fetch_tweets with query
    for each tweet in tweets do
      Set iocs to result of method call extract_iocs with tweet
      Set tweet_info to dictionary with user and date from tweet
      Call method report_to_misp with iocs and tweet_info
    end for
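The query-building and extraction steps of Algorithm 1 can be illustrated without network access. The sketch below is a simplified stand-in under stated assumptions: it uses standard-library regular expressions in place of the iocextract library, it covers only the most common defanging styles, and the dictionary keys ("url", "ip-dst", "sha256", "md5") are chosen here to resemble MISP attribute types. The sample post and all indicators in it are made up.

```python
import re

# Simplified patterns (assumption); dedicated extractors cover many more cases.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")
IPV4_RE = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
)
SHA256_RE = re.compile(r"\b[a-fA-F0-9]{64}\b")
MD5_RE = re.compile(r"\b[a-fA-F0-9]{32}\b")


def build_query(tags):
    """Join hashtags with OR and exclude retweets and replies."""
    return " OR ".join(tags) + " -filter:retweets -filter:replies"


def refang(text):
    """Undo common defanging (hxxp, [.], [:]) so the patterns can match."""
    text = re.sub(r"hxxp", "http", text, flags=re.IGNORECASE)
    return text.replace("[.]", ".").replace("[:]", ":")


def unique(values):
    """De-duplicate while preserving order of first appearance."""
    return list(dict.fromkeys(values))


def extract_iocs(text):
    """Return IoC lists keyed by type for one post's text."""
    clean = refang(text)
    return {
        "url": unique(URL_RE.findall(clean)),
        "ip-dst": unique(IPV4_RE.findall(clean)),
        "sha256": unique(SHA256_RE.findall(clean)),
        "md5": unique(MD5_RE.findall(clean)),
    }


# Made-up post text using documentation-reserved address space.
post = (
    "New #phishing kit at hxxps://login-example[.]com/verify "
    "C2 203.0.113[.]10 md5 d41d8cd98f00b204e9800998ecf8427e"
)
query = build_query(["#phishing", "#malware"])
iocs = extract_iocs(post)
```

Each non-empty list in the returned dictionary would then be handed to the reporting step, which creates the MISP event and attaches the attributes.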
The results of the conversion of the data collected by the collection module are shown in Figure 4.
Figure 5 shows an example of a phishing site detected by our platform.
5. Analysis
The platform focuses on optimizing threat response mechanisms to minimize the impact of cyber-incidents, and it promotes interoperability by ensuring compatibility with existing cyber-security infrastructures, encouraging widespread adoption in diverse organizational settings.
The gathered IoC are subsequently archived in a dedicated section within the database established explicitly for this research initiative. The overarching objective of these five steps is to methodically classify and share IoC, thereby augmenting our capacity to respond promptly and efficiently to the most pressing threats facing the Arab world. This strategic approach improves the overall cyber-security posture and resilience of the region.
Gong conducted research and found that CTI information can also be applied to security systems in internal IT and OT infrastructures, such as IoT (Internet of Things) and Supervisory Control and Data Acquisition (SCADA) networks. Furthermore, the performance of a security system depends on the accuracy of the data, and they provided data accuracy results for four CTI feeds using approximately 40,000 datasets [25]. Likewise, there is a need to confirm the accuracy of the collected IoC. In general, three methods can be used to check data accuracy, namely Cross-Verification, Check for Reliable Sources, and Evaluation of Data Quality and Consistency:
Cross-Verification: We meticulously compare the gathered OSINT information with data acquired from various independent sources. Using the MISP platform’s functionality, we establish connections when identical IoC are found across events in the standard data format. When identical IoC are identified, the information is considered more accurate, having been corroborated by multiple sources;
Check for Reliable Sources: We conduct thorough checks to ascertain the credibility of the information, verifying its origin from reputable and reliable sources. Information sourced from certified organizations, government agencies, or trusted experts is considered to be more accurate;
Evaluation of Data Quality and Consistency: After the distribution of CTI data to members of the MISP community, the members assess the quality and consistency of the collected data. In the cases of inconsistencies, contradictions, or inappropriate data, concerns about the reliability of the information are raised, prompting suggestions for modification and revision through the functionalities provided by the MISP platform.
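The cross-verification idea in the first item can be sketched as a simple corroboration count. This is an illustrative approximation rather than MISP's own correlation mechanism: the event structure, the source names, and the threshold of two independent sources are assumptions made for the example.

```python
from collections import defaultdict


def cross_verify(events, min_sources=2):
    """Return IoC values reported by at least `min_sources` distinct sources."""
    sources_by_ioc = defaultdict(set)
    for event in events:
        for value in event["iocs"]:
            # Normalize case and whitespace so trivially different spellings match.
            sources_by_ioc[value.strip().lower()].add(event["source"])
    return {
        ioc: sorted(sources)
        for ioc, sources in sources_by_ioc.items()
        if len(sources) >= min_sources
    }


# Hypothetical events; addresses come from documentation-reserved ranges.
events = [
    {"source": "honeypot", "iocs": ["203.0.113.10", "198.51.100.7"]},
    {"source": "osint-feed-a", "iocs": ["203.0.113.10", "Evil.example.com"]},
    {"source": "osint-feed-b", "iocs": ["evil.example.com "]},
]
corroborated = cross_verify(events)
```

Indicators that appear in the result have been seen by more than one independent source and can be given higher confidence; single-source indicators remain candidates for the other two checks.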
5.1. Collected Data
The data collected during the month of January 2024 totaled 1,013,033 threats, which were obtained from honeypot and OSINT records. As observed in Table 8, the data collected from the honeypot show a monthly total of 1,006,156 attacks, with a concerning increase every week, and 323,895 attacks in the final week.
The data provided include counts for a total of 140 countries, each associated with the corresponding sum of counts representing cyber-attack metrics. However, we only describe the top 10 countries in Table 9 due to space constraints. China emerges as the leading contributor, highlighting the global nature of cyber-security threats. The data further delineate the significant roles that the United States, Japan, and other countries play in the landscape of cyber-threats.
The attackers’ tactics and techniques are shown in Table 10. Credential access stands out as a prevalent method, emphasizing the importance of securing user credentials. The data reveal the multifaceted nature of attacks, covering defense evasion, initial access, lateral movement, persistence, privilege escalation, and reconnaissance. Understanding these tactics is crucial for developing effective defense strategies. Regarding the techniques, password guessing has a staggering count of 3,229,147, indicating a substantial threat to password security. Brute-force attacks, exploits on public-facing applications, and SSH-based intrusions highlight the diversity of the techniques employed. This information is crucial for cyber-security professionals to implement targeted countermeasures against specific attack vectors.
As shown in Table 11, a total of 6877 pieces of OSINT information were obtained through X, most of which concerned phishing and malware. Malware entries include Qakbot, Njrat, GootLoader, RedLine, Remcos, Dcrat, AsyncRAT, AgentTesla, IcedID, SocGholish, BazarLoader, and Lazarus, as shown in Table 12. This intelligence can help security teams identify active malware, develop customized response strategies, and respond immediately. The following is a brief description of the threat group and malware that we identified in this study. Since these entries represent a threat group or malware that is currently active, the intelligence offers security organizations insights that can aid in the effective allocation of limited resources for preemptive action.
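To illustrate how OSINT posts could be grouped by the malware names listed above, the following is a minimal sketch using simple keyword matching. It is not the collection module used in this study; the function names and the sample posts are hypothetical, and a production pipeline would need alias handling and false-positive filtering.

```python
import re
from collections import Counter

# Family and group names as listed in the text (Table 12).
MALWARE_FAMILIES = [
    "Qakbot", "Njrat", "GootLoader", "RedLine", "Remcos", "Dcrat",
    "AsyncRAT", "AgentTesla", "IcedID", "SocGholish", "BazarLoader", "Lazarus",
]


def tag_families(text):
    """Return the listed names that occur in `text` as whole tokens (case-insensitive)."""
    found = []
    for name in MALWARE_FAMILIES:
        # Lookarounds prevent matches inside longer alphanumeric words.
        pattern = r"(?<![A-Za-z0-9])" + re.escape(name) + r"(?![A-Za-z0-9])"
        if re.search(pattern, text, re.IGNORECASE):
            found.append(name)
    return found


def count_families(posts):
    """Count how many posts mention each listed name."""
    counts = Counter()
    for post in posts:
        counts.update(tag_families(post))
    return counts


# Hypothetical sample posts, for illustration only.
sample_posts = [
    "New #AsyncRAT sample observed, also drops remcos",
    "Phishing campaign delivering IcedID via ZIP attachments",
    "Unrelated post about patch management",
]
family_counts = count_families(sample_posts)
```

Counting posts rather than raw keyword hits keeps a single post that repeats a name from inflating the totals.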
5.2. Machine Learning Analysis
In this study, machine learning was applied to the data collected from the implemented platform. A dataset was prepared for machine learning, and data pre-processing was performed as follows: (i) Dataset Preparation: The dataset contained network attack logs from our system with fields such as source IP addresses, destination ports, MITRE ATT&CK technique IDs, and the country of origin; (ii) Data Pre-Processing and Model Training: Algorithm 2 was used; and (iii) Analysis of Model Predictions: First, the encoded predictions were transformed back to the original country names. Then, the occurrences of each predicted country were counted, and the top 10 most-predicted countries were identified.
Algorithm 2 CTI data pre-processing and model training.
Import Libraries
    Import pandas as pd
    Import train_test_split from sklearn.model_selection
    Import LabelEncoder from sklearn.preprocessing
    Import DecisionTreeClassifier from sklearn.tree
    Import accuracy_score from sklearn.metrics
Load Dataset
    Load data from ‘csvfile.csv’ into a DataFrame
Clean the Data
    Drop rows with missing or empty values in specific columns
    Keep only rows where ‘data.srcip’ and ‘GeoLocation.country_name’ are not empty
Encode Categorical Variables
    Initialize LabelEncoders for country, IP, port, and MITRE ID
    Encode ‘data.srcip’, ‘data.dest_port’, ‘data.parameters.alert.rule.mitre.id’, and ‘GeoLocation.country_name’ columns
Prepare Dataset for Training
    Define features X as [‘data.srcip’, ‘agent.id’, ‘data.dest_port’, ‘data.parameters.alert.rule.mitre.id’]
    Define target variable y as ‘GeoLocation.country_name’
    Split data into training and testing sets (X_train, X_test, y_train, y_test)
Train the Model
    Initialize DecisionTreeClassifier with a random state
    Fit the model using X_train and y_train
Evaluate the Model
    Predict y values using X_test
    Calculate accuracy score using y_test and predicted y values
    Print model accuracy
Analyze Predictions
    Transform encoded predictions back to original country names
    Count occurrences of each predicted country and print the top 10
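The steps of Algorithm 2 can be sketched in Python as follows. This is an illustrative reading of the pseudocode, not the exact script used in the study. The column names come from Algorithm 2; the test split ratio, the random seed, the numeric coercion of ‘agent.id’, and the `demo_frame` helper with its documentation-range IP addresses and placeholder country labels are assumptions introduced so that the example runs without the original ‘csvfile.csv’.

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

SRC_IP = "data.srcip"
AGENT = "agent.id"
PORT = "data.dest_port"
MITRE = "data.parameters.alert.rule.mitre.id"
COUNTRY = "GeoLocation.country_name"
FEATURES = [SRC_IP, AGENT, PORT, MITRE]


def train_country_model(df, test_size=0.2, random_state=42):
    """Clean, encode, train, and evaluate; test_size and random_state are assumed values."""
    # Clean: drop missing values, then keep rows with a non-empty IP and country.
    df = df.dropna(subset=FEATURES + [COUNTRY])
    keep = (df[SRC_IP].astype(str).str.strip() != "") & (
        df[COUNTRY].astype(str).str.strip() != ""
    )
    df = df[keep].copy()

    # Assumption: agent IDs are numeric strings such as "001".
    df[AGENT] = pd.to_numeric(df[AGENT])

    # Encode the categorical columns named in Algorithm 2, one encoder per column.
    encoders = {}
    for col in [SRC_IP, PORT, MITRE, COUNTRY]:
        encoders[col] = LabelEncoder()
        df[col] = encoders[col].fit_transform(df[col].astype(str))

    X_train, X_test, y_train, y_test = train_test_split(
        df[FEATURES], df[COUNTRY], test_size=test_size, random_state=random_state
    )

    model = DecisionTreeClassifier(random_state=random_state)
    model.fit(X_train, y_train)

    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)

    # Decode predictions back to country names and count the ten most frequent.
    predicted = pd.Series(encoders[COUNTRY].inverse_transform(y_pred))
    top10 = predicted.value_counts().head(10)
    return model, accuracy, top10


def demo_frame():
    """Hypothetical stand-in for 'csvfile.csv' (placeholder IPs and country labels)."""
    samples = [
        ("192.0.2.10", "001", 22, "T1110", "Country A"),
        ("198.51.100.7", "002", 80, "T1190", "Country B"),
        ("203.0.113.5", "001", 22, "T1595", "Country C"),
    ]
    rows = [
        {SRC_IP: ip, AGENT: agent, PORT: port, MITRE: mitre, COUNTRY: country}
        for _ in range(20)
        for ip, agent, port, mitre, country in samples
    ]
    # Two deliberately bad rows that the cleaning step should remove.
    rows.append({SRC_IP: "", AGENT: "001", PORT: 22, MITRE: "T1110", COUNTRY: "Country A"})
    rows.append({SRC_IP: "192.0.2.10", AGENT: "001", PORT: 22, MITRE: "T1110", COUNTRY: None})
    return pd.DataFrame(rows)


if __name__ == "__main__":
    _, acc, top = train_country_model(demo_frame())
    print(f"Model accuracy: {acc:.2%}")
    print(top)
```

In the toy data, each source IP maps to exactly one country, so the accuracy printed by this sketch says nothing about performance on real logs; it only demonstrates the flow of the algorithm.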
The country with the highest number of predicted attacks was the United States with 3348 occurrences, followed by Japan with 3176 occurrences and China with 2452 occurrences. Other notable countries included Canada with 689 occurrences, the Netherlands with 652 occurrences, the United Kingdom with 413 occurrences, Germany with 396 occurrences, Russia with 289 occurrences, Denmark with 269 occurrences, and Bulgaria with 266 occurrences.
For the most-predicted IP attacks, the IP address 43.139.213.40 led with 13,088 occurrences. This was followed by 218.92.0.123 with 3946 occurrences, 38.54.116.204 with 2984 occurrences, and 180.101.88.245 with 1054 occurrences. Other significant IP addresses included 218.92.0.95 with 780 occurrences, 119.45.1.197 with 502 occurrences, 183.81.169.238 with 421 occurrences, 139.59.142.247 with 406 occurrences, 194.50.16.26 with 321 occurrences, and 159.89.235.169 with 304 occurrences.
Regarding the most predicted MITRE IDs used in the attacks, T1566 was the most frequent with 6374 occurrences. Following this, the combinations “T1595 and T1110” and “T1595 and T1082” appeared 2445 and 2411 times, respectively. T1133 was noted 1862 times, while the combination “T1595, T1189, and T1071” appeared 696 times. The combination “T1566 and T1071” was seen 358 times; T1595 alone was seen 42 times; the combination “T1078, T1046, and T1036” appeared 38 times; and the combination “T1046, T1021, and T1071” was recorded once.
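Because several of the entries above are combinations of technique IDs, it can be useful to also view the totals per individual technique. The sketch below splits the combinations reported in this paragraph and sums the occurrences for each ID; the dictionary simply restates the counts given in the text, and the function name is illustrative.

```python
from collections import Counter

# Occurrence counts for each predicted MITRE ID combination, as reported in the text.
combo_counts = {
    ("T1566",): 6374,
    ("T1595", "T1110"): 2445,
    ("T1595", "T1082"): 2411,
    ("T1133",): 1862,
    ("T1595", "T1189", "T1071"): 696,
    ("T1566", "T1071"): 358,
    ("T1595",): 42,
    ("T1078", "T1046", "T1036"): 38,
    ("T1046", "T1021", "T1071"): 1,
}


def per_technique_totals(counts):
    """Credit each combination's count to every technique ID it contains."""
    totals = Counter()
    for combo, occurrences in counts.items():
        for technique in combo:
            totals[technique] += occurrences
    return totals


totals = per_technique_totals(combo_counts)
for technique, occurrences in totals.most_common():
    print(technique, occurrences)
```

Under this view, T1566 accounts for 6732 occurrences and T1595 for 5594, so T1595 appears far more often than its stand-alone count of 42 suggests.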
The decision tree classifier achieved an accuracy of 99.79% on the test set. The model predictions were then used to analyze the most frequent attack origins, identifying the primary sources and patterns of the attacks.
6. Limitations and Future Work
The real data collection and analysis methodology for practical cybercrime threat intelligence focuses on putting a threat intelligence project into action. It calls for continuous monitoring and iterative improvement to keep pace with changing cybercrime threats. The honeypot system is currently located in the MENA region; future research will establish honeypots physically distributed throughout the world to gather real-time information on cyber-threats, expanding the platform to cover direct international cyber-attacks. We also plan to develop an evaluation methodology that yields more precise and accurate intelligence from the currently collected data, considering criteria such as attacker identity, attacker goals, execution plans and methods, and indicators for tracing execution. Through this approach, our goal is to provide not only refined intelligence, but also visibility into attack patterns and trends targeting the Arab world.
Kris Oosthoek and Christian Doerr [26] highlighted the need for a CTI framework. They investigated the application and impact of such frameworks on the reporting of analysis results, particularly in the context of reproducing CTI reports for APT malware, and they aimed to ensure accurate CTI sharing and distribution given the rapid daily increase in new malware samples. They also emphasized the importance of behavioral labeling. Likewise, our future research will analyze the data collected from the honeypot and aim to share information through accurate labeling on the CTI sharing platform.
When testing this approach for a cybercrime threat intelligence project, we experimented with different ways to collect, analyze, and visually show data in order to obtain useful insights from real-world information. We continue to refine and adjust the methodology based on feedback and how the threat landscape is changing. This method lets everyone involved test ideas, check results, and make improvements using real evidence and practical observations. Finally, our goal is to generate and share direct CTI information related to cybercrime, allowing Arab world members to quickly integrate it into their security systems and policies.
We are also considering the promotion of the implemented CTI platform, currently in temporary operation, in the Arab world. The intention is to use it as a platform to launch the tentatively named Arab CTI Community (ACC). To achieve this, it is crucial to encourage the participation of major organizations in Arab countries and to establish a relationship and an atmosphere conducive to the free sharing of CTI information obtained by each organization. Above all, for the platform to become active, CTI data that can serve as training material must be shared consistently. We are willing to take on that role, and, when a consensus is reached among the major organizations in Arab countries, the plan is to officially launch the community. Regarding the analysis, machine learning was applied to predict countries, IPs, and TTPs, but only a single technique was used, which is a limitation. In future research, we will prepare datasets more elaborately, prepare benchmark datasets, and conduct comparative evaluations.
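One possible shape for such a comparative evaluation is sketched below: several classifiers, including a majority-class baseline, are scored with the same cross-validation folds. This is only an illustration of the idea, not a planned design; the choice of models, the number of folds, and the synthetic data from `make_classification` are all assumptions standing in for a future benchmark dataset.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier


def compare_models(X, y, folds=5):
    """Return the mean cross-validated accuracy of each candidate model."""
    candidates = {
        # Baseline that always predicts the most frequent class.
        "majority_baseline": DummyClassifier(strategy="most_frequent"),
        "decision_tree": DecisionTreeClassifier(random_state=0),
        "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    }
    return {
        name: float(cross_val_score(model, X, y, cv=folds, scoring="accuracy").mean())
        for name, model in candidates.items()
    }


# Synthetic three-class data used purely as a placeholder for real CTI records.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, n_classes=3, random_state=0
)
scores = compare_models(X, y)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Including a trivial baseline makes it easier to judge how much of a reported accuracy is due to class imbalance rather than to the model itself.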
7. Conclusions
In this study, we examined the lifecycle of CTI and a CTI sharing platform that was practically implemented based on the MISP platform. Additionally, we outlined a platform scheme based on an open-source security system without a commercial CTI platform and the methods used to collect data in order to enhance the CTI platform, and we highlighted the acquisition of meaningful intelligence through a concise statistical analysis of the gathered data. In particular, the analysis of real threat alerts and the automatic collection module implemented to gather OSINT data distinguish this research from other research.
In conclusion, this research has played a crucial role in providing intelligence information to protect against potential future cyber-attacks or mitigate their risks. It is significant that our research is the first step towards freely sharing CTI information in the Arab world. By anticipating a potential threat and motivating the proactive implementation of appropriate measures, the initiative will significantly contribute to decision making in the future. Notably, it is expected to stand as the first free intelligence information provider within the MENA region, and we want to extend its services to member states and the international community.