A Methodology to Evaluate Standards and Platforms within Cyber Threat Intelligence

Abstract: The cyber security landscape has been fundamentally changing over the past years. While technology evolves and new sophisticated applications are developed, a new threat scenario is emerging in alarming proportions. Sophisticated threats with multi-vectored, multi-staged and polymorphic characteristics are performing complex attacks, making the processes of detection and mitigation far more complicated. Organizations have thus been encouraged to change their traditional defense models and to use and develop new systems with a proactive approach. Such changes are necessary because the old approaches are no longer effective at detecting advanced attacks. Organizations are also encouraged to develop the ability to respond to incidents in real-time using complex threat intelligence platforms. However, since the field is growing rapidly, the Cyber Threat Intelligence concept still lacks a consistent definition and a heterogeneous market has emerged, including diverse systems and tools with different capabilities and goals. This work aims to provide a comprehensive methodology for evaluating threat intelligence standards and cyber threat intelligence platforms. The proposed methodology is based on the selection of the most relevant candidates to establish the evaluation criteria. In addition, this work studies the Cyber Threat Intelligence ecosystem and the Threat Intelligence standards and platforms existing in the state of the art.


Introduction
In recent years, with the relevant increase in computational power and communication technologies, a new trend of diverse network devices and different technological systems has quickly emerged, delivering a wider range of exploitable vulnerabilities [1]. Consequently, the number of cyber attacks and their costs have also increased [2]. Moreover, these new cyber exploits are more complex and targeted [3], generating sophisticated and improved attacks. Such facts indicate that the cybersecurity spectrum is fundamentally changing and becoming increasingly challenging.
The progressive evolution of current cyber attacks arises from a cascade of new sophisticated applications being developed by attackers and security experts, and the more complex a system gets, the more insecure it becomes [4]. Another reason for the improvement of the attacks is that they are better planned and applied in a more specific way [5], which makes them more complex. Most of them are designed to evade first-level defenses and persist on the system [6]. Besides that, these new threats are in a constant process of modification and improvement, making their detection and defense more complicated [5]. The advances and modifications in the cyber attack ecosystem have encouraged changes in the traditional defense model and the search for more efficient and proactive methods [1,6].
Considering the presented scenario, the idea of Cyber Threat Intelligence (CTI) has rapidly been popularized and is often posed as a new solution for applying effective security to enterprises [7]. Any valuable information that can be used to identify, characterize or assist in the response to cyber threats is commonly referred to as cyber threat information, and the analysis of this type of information can produce intelligence to inform users about threats to their systems [8]. Among the limitations of the CTI approach are the heterogeneity of the data involved [9,10] and the massive amount of data to be collected [11]. So, in order to effectively use cyber threat information, mechanisms capable of consuming, analyzing, evaluating and classifying the information are highly needed [12].
Thus, new automated systems with the ability to consume a vast amount of data, provide sophisticated defense capabilities and respond to incidents in real-time are being developed and commonly referred to as Threat Intelligence (TI) platforms [13,14]. These platforms should include automatic processes of data transformation and intelligence production to ensure a more efficient, proactive and timely defense model [15]. Besides, due to the heterogeneity of the data inserted in the CTI context, considerable efforts have been made in order to standardize the data [13] and make it compatible among different systems [12]. The interoperability of the data is important to facilitate the automatic gathering and analysis of the data and the sharing of cyber threat intelligence [9].
However, since the field is growing rapidly [15], the CTI concept still lacks a consistent definition [11,16] and a heterogeneous market of CTI platforms has emerged. Diverse systems and tools, which accomplish different goals, are implemented as threat intelligence platforms [14]. Besides, capabilities, performance levels, and applicable use cases vary greatly among platforms, and this is not always transparent to the user [14]. Therefore, there is a notable research gap involving the analysis of the available CTI systems and tools, in order to describe their features in detail and evaluate the quality of the information that can be produced and disseminated with them.
The goal of this research is to provide a methodology for the evaluation of the standards and platforms of Cyber Threat Intelligence. The research includes a review of the state-of-the-art regarding the cyber threat intelligence ecosystem and existing TI standards and platforms, by presenting the directions that the theme has been following in recent years and the TI initiatives that have been consolidated or have great potential to be consolidated in this area. To achieve that, first, a review of the existing CTI standards and platforms is made to identify potential and relevant research opportunities. Then, a selection strategy is proposed to define the most popular standards and platforms. The selected ones had their features and usability analyzed, in a practical way. Finally, they were evaluated based on holistic and comprehensive evaluation criteria.
Previous research has focused on comparing a large number of platforms while applying few or trivial criteria in the evaluation process. Our work complements the related research by providing a significant and comprehensive evaluation of TI standards and platforms simultaneously. It presents an adequate strategy for selecting relevant platforms and standards and an integrated methodology covering a wide range of aspects for the evaluation criteria. Finally, the results and discussion about the evaluation process provide a valuable overview of the CTI landscape.
The remainder of this paper is structured as follows. Section 2 brings an overview of related work and some definitions relevant to understanding the cyber threat intelligence landscape. Section 3 presents a methodology proposal for evaluating CTI standards and platforms. Sections 4 and 5 present the results of the evaluation. Section 6 gives a discussion. Finally, Section 7 concludes the paper.

Related Work and Definitions
In this section, a brief background on the CTI panorama is provided. Relevant related work to our research topic is presented along with some definitions and concepts.
Even though a lot of work has been done in the area of Cyber Threat Intelligence in recent years, most of it is not focused on analyzing and evaluating the state-of-the-art of TI standards and platforms. Besides, considering that this type of technology evolves at a rapid pace, some work and results can become out of date. To get a detailed picture of available work in the area and research gaps, a literature review was made.
Much work has been carried out into understanding and presenting the Cyber Threat Intelligence landscape. A survey [17] provides a broad description of the CTI topic and briefly mentions some platforms and standards. In Reference [18], the work is more focused on TI platforms and presents a general overview of the threat intelligence platforms landscape, including open source and commercial platforms. Other work [19] describes some selected open source threat intelligence platforms and standards, but no evaluation is done. A recent survey [20] discusses research opportunities regarding exchange standards and mentions, without any type of analysis, some of the most popular languages for CTI description and sharing, which are: Structured Threat Information eXpression (STIX), Trusted Automated Exchange of Intelligence Information (TAXII), Open Threat Partner Exchange (OpenTPX), Malware Attribute Enumeration and Characterization (MAEC), Incident Object Description Exchange Format (IODEF) and Vocabulary for Event Recording and Incident Sharing (VERIS).
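For concreteness, the kind of structured data such standards define can be illustrated with STIX: the sketch below assembles a minimal STIX 2.1-style indicator object as plain JSON using only the Python standard library. The IP address and the indicator name are hypothetical, and a production system would typically rely on an official STIX library rather than hand-built dictionaries.

```python
import json
import uuid
from datetime import datetime, timezone

def make_stix_indicator(pattern: str, name: str) -> dict:
    """Build a minimal STIX 2.1-style indicator object as a plain dict."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z")
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "name": name,
        "pattern": pattern,
        "pattern_type": "stix",
        "valid_from": now,
    }

indicator = make_stix_indicator(
    pattern="[ipv4-addr:value = '203.0.113.7']",  # hypothetical C2 address
    name="Suspected C2 server",
)
print(json.dumps(indicator, indent=2))
```

The `id`, `created`/`modified` timestamps and `pattern` fields shown here are the core pieces that make an indicator machine-readable and shareable between tools.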
Considering the existing limitations to fully implementing CTI mechanisms as a defense model, some work has been done to propose frameworks and platforms that could overcome these obstacles. An important limitation is the quality of the CTI data produced and shared [4], which has been addressed in many works with the use of machine learning techniques. For CTI to be effective for defense models, it should be sensitive to the context in which it is applied [4,13]. Thereby, initiatives that use machine learning techniques for predicting personalized and context-aware data, such as those presented in References [21,22], could bring great advances to CTI by assisting in data enrichment processes. Also, as proposed in Reference [23], intrusion detection systems can benefit from artificial intelligence mechanisms, such as machine learning, to build an intelligent data-driven system. Following this idea, Reference [24] presents good prospects for the use of artificial intelligence in cyber security. The research discusses the massive amount of threat data available and puts forward the powerful automation and data analysis capabilities of machine learning techniques as a solution to handle the volume of data. In Reference [25], using machine learning algorithms, a framework was developed to collect and analyze data and attribute threat incidents to their threat actors. Another work, Reference [26], proposed a threat intelligence platform with an architecture based on state-of-the-art systems like the Malware Information Sharing Platform (MISP) and Collaborative Research into Threats (CRITs). The platform applies machine learning algorithms to analyze and classify email content and actively defends against social engineering. To improve the maliciousness estimation of threat indicators, Kazato et al. [27] use a method based on graph convolutional networks.
The method improved the maliciousness estimation accuracy and reduced the time spent analyzing indicators by hand. Also in terms of improving the quality of threat indicators, Reference [9] proposes a novel method to automatically extract indicators from social media and apply domain tags to them. The method includes a convolutional neural network to recognize CTI domains and correctly classify threat data into these domains. The amount of related work that explores the use of artificial intelligence, mainly machine learning techniques, in cyber security methods and frameworks shows the importance of its utilization and indicates some research opportunities.
Regarding another limitation, Reference [8] addresses the difficulty related to the lack of confidence from organizations in sharing sensitive CTI data. Chadwick et al. [8] introduce a trust model among organizations combined with a data sharing and analysis framework allowing a confidential exchange of data. Despite some limitations, the presented results met the expected goals. Along the same lines, Riesco et al. [11] work to reduce the reluctance to share CTI data. The work provides an extensive list of the open challenges of existing information sharing solutions and proposes a single solution to cover all of these challenges at the same time.
The solution uses a combination of STIX and the Ethereum blockchain to achieve its goal. The results showed an improvement over other available solutions on important points such as the identification of trusted sources, availability and economic cost. The approach presented in Reference [28] also addresses the topic of sharing CTI data securely. The method not only provides trust levels among organizations but also combines the trust model with encryption mechanisms for the data, bringing more confidence to the sharing process. In the same way, Wu et al. [29] propose a framework for decentralized sharing of data using blockchain. The work considers the trust between participants, the trust in TI quality and the trust in the platforms, and uses data encryption as one of the mechanisms to ensure confidentiality. Thus, it is evident that the use of encryption systems like the ones presented in References [30,31] could bring improvements to CTI processes. Even though encrypting CTI data before sharing could improve organizations' confidence in exchanging their sensitive data, the majority of TI standards use common formats, like JSON and XML, to provide threat data, delegating security to the transport mechanism used. Since most standards follow this perspective, the available TI platforms usually also do not support encrypted data as import or export formats. However, considering the benefits that the combination of encryption systems and CTI platforms could bring, this should be considered a productive research gap.
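As a rough illustration of how a security layer could be added on top of the plain JSON payloads most standards exchange, the sketch below attaches an HMAC-SHA256 integrity tag to a CTI record before sharing. The pre-shared key and record are invented for the example; this provides integrity and authenticity only, not the confidentiality that the full encryption schemes discussed above would offer.

```python
import hashlib
import hmac
import json

# Hypothetical pre-shared key between two sharing partners; a real
# deployment would negotiate keys properly and also encrypt the payload.
SHARED_KEY = b"example-community-key"

def tag_payload(payload: dict) -> dict:
    """Wrap a CTI record with an HMAC-SHA256 integrity tag."""
    body = json.dumps(payload, sort_keys=True).encode()
    tag = hmac.new(SHARED_KEY, body, hashlib.sha256).hexdigest()
    return {"payload": payload, "hmac": tag}

def verify_payload(wrapped: dict) -> bool:
    """Recompute the tag on receipt and compare in constant time."""
    body = json.dumps(wrapped["payload"], sort_keys=True).encode()
    expected = hmac.new(SHARED_KEY, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, wrapped["hmac"])

record = {"type": "indicator", "value": "203.0.113.7"}
wrapped = tag_payload(record)
assert verify_payload(wrapped)
```

A recipient that recomputes a different tag knows the record was altered in transit, which is one small step toward the trust mechanisms the works above pursue.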
In another perspective of analysis and comparison, some research was made regarding CTI standards, taxonomies and ontologies. The work presented in Reference [32] aimed to analyze different CTI exchange ontologies. A layered model is proposed and two sharing protocols, together with their respective transport protocols (STIX/TAXII and IODEF/RID), are examined using the model. The work provides a great overview of the analyzed protocols, presenting a detailed schema of each one and leaving research opportunities on the topic. In Reference [33], taxonomies, sharing standards and ontologies relevant to the CTI scope are evaluated. These topics are analyzed based on data and concepts defined by two different CTI models presented, relationships with other taxonomies and ontologies, and descriptions provided by their documentation and source files. The sharing standards evaluated were STIX, MAEC and OpenIOC; however, they were not the main topic of discussion since the focus of the work lay on the ontologies subject. Reference [34] focuses on semantic ontologies for sharing standards. In this work, STIX and IODEF were mapped into RDF/OWL ontologies and the mappings were analyzed, providing an overview of their characteristics and showing differences and similarities between the standards. Some similar research interested in the evaluation or comparison of standards or platforms was conducted. One of the first works on this subject [35] introduces 8 different exchange formats: Common Intrusion Detection Framework (CIDF), IODEF, Common Announcement Interchange Format (CAIF), Intrusion Detection Message Exchange Format (IDMEF), Abuse Reporting Format (ARF), Common Event Expression (CEE), Extended Abuse Reporting Format (X-ARF) and Syslog. These formats are assessed with a proposed evaluation methodology consisting of 10 different criteria, such as interoperability, confidentiality and practical application.
The evaluation provides a significant methodology and good results, but some important formats that are currently relevant were not addressed, showing that some results are out of date. In Reference [36], the complete CTI panorama is considered and some CTI standards are presented. Besides that, a good evaluation of some open source threat intelligence tools is done in order to compare them with the tool proposed in the work. In Reference [16], 23 threat intelligence platforms are classified and analyzed based on the licensing model, supported standards, type of platform and type of information shared. The result of the analysis presents some interesting facts about the CTI panorama, such as the finding that most threat intelligence platforms are closed source, the description of STIX as the de-facto standard for describing threat intelligence, and the discovery that most platforms prioritize sharing over analyzing the information. However, the results are consolidated into eight key findings, which does not allow an in-depth understanding of the features and operation of the platforms. In Reference [37], a satisfactory comparative analysis of some important incident reporting formats, including different versions, is performed, revealing strengths, weaknesses and additional information about the formats.
In order to explore the existing interoperability challenges when using specific sharing solutions, Rantos et al. [12] analyze semantic, syntactic and technical aspects of the standards the work considers most prominent. Thus, some characteristics, including the type of data and supporting sources, were described and can be used as a base for future research in the area.
A recent work [13] provides a comparative analysis of cyber threat intelligence sources, formats and languages. Several CTI sources are presented and examined and, based on the results of the examination together with literature research, some CTI standards were selected for further analysis. Many criteria and features were considered in the comparison, providing a great and detailed description of the capabilities of some relevant CTI standards. Some specific CTI standards, like STIX and MISP, are analyzed in the work, but common formats like CSV and RSS are also included in the comparison, which differs from the standards selected in this paper, which were designed specifically to represent threat intelligence data. With a similar goal to this work, Bauer et al. [14] present a framework capable of analyzing and comparing threat intelligence sharing platforms. Based on a systematic literature review, 40 different publications containing characteristics or requirements for Threat Intelligence Sharing Platforms (TISPs) were studied. Therewith, 62 essential evaluation criteria were determined and divided into six main categories, which were used by the proposed framework to evaluate the platforms. The work mentions that the framework was applied to ten different TISPs, but the results are described for only three of them. The results revealed interesting information about the described platforms, including some similarities and differences. However, due to those limitations, only a small set of platforms was considered.
Most of the research developed in the field focuses on comparing a large number of platforms or standards and does not provide a critical analysis. On the other hand, a few works present thorough evaluations or comparisons, but only of a few platforms or standards. Besides, some works focus their efforts on TI initiatives that have not been consolidated in the area or are out of date. So, to the best of our knowledge, no prior research has simultaneously analyzed and evaluated standards and platforms relevant to the TI scenario based on a methodology that covers a wide range of evaluation criteria.
Some fundamental concepts must be presented to facilitate the understanding of the adopted methodology and the obtained results. These definitions are presented as follows.

New Threat Landscape
The great evolution of computing in recent years largely stems from the appearance of multiple and heterogeneous devices [38] that can interact with other objects and applications over the internet [39]. However, the heterogeneity and interconnectedness of these devices lead to a significant increase in the number of security attacks [40], and the threat environment is expanding in alarming proportions. This growth comes together with more complex attack scenarios and sophisticated threats. Nowadays, adversary behavior is more focused on the target and considers multi-staged attacks that aim to persist on the host or system and cause ongoing damage [41]. Most of these attacks do not generate noise or substantial changes in the environment, making them harder to detect.
Some of these new-generation threats are denominated Advanced Persistent Threats (APTs). They perform a sophisticated type of attack characterized by establishing a persistent foothold in the target and staying undetected for a long period of time [5]. There are also polymorphic threats with the capability of constant modification, making detection a complex task. Additionally, another largely exploited type of threat is the zero-day vulnerability. Since zero-days exploit unknown software vulnerabilities, it is easy for them to stay undetected for long periods until the flaw is discovered and patched [4].

Threat Intelligence
The term intelligence has the most diverse definitions. This can be explained by the fact that intelligence is a concept strongly dependent on the context in which it is inserted. A generalized definition of the term was presented in Reference [42] and is considered widely applicable. It describes intelligence as the process of transforming a topic from a completely unknown stage to a state of complete understanding. In order to achieve this goal, random and generic data must be filtered into a smaller, relevant data set based on the intended context, which is then processed and transformed into information.
In this sense, information, when analyzed and contextualized, becomes intelligence [7]. Considering such assumptions, a generic intelligence production process is commonly composed of three main stages: collecting data, processing the data to transform it into information, and analyzing the information to produce intelligence.
The intelligence concept can be divided into different strands, one of which is actionable intelligence. For intelligence to be considered actionable, it must meet the requirements of being timely, accurate and relevant [43].
Following this perspective, threat intelligence, which is commonly defined as a type of actionable intelligence, should satisfy these characteristics to assist in developing efficient mechanisms to respond to threats. Thereby, in addition to the generic intelligence production flow aforementioned, the stages of deploying and disseminating the intelligence are also contemplated as essential to the generation process of threat intelligence. So, in the context of this work, the intelligence process flow is composed of five main stages, as presented in Figure 1: 1. Collection: the gathering of data, which are simple indicators or facts. 2. Processing: combines the data, aiming to answer specific questions and provide information. 3. Analysis: evaluates data and information together, helping to uncover patterns and produce actionable intelligence. 4. Deployment: uses the produced intelligence to make decisions. 5. Dissemination: expands the intelligence by sharing it with interested parties.
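As a minimal sketch, the five stages of the intelligence process flow can be chained as simple functions; the sample feed and the per-stage logic below are illustrative placeholders, not part of any real platform.

```python
import json

# Illustrative raw feed; values are invented for demonstration.
raw_feed = ["203.0.113.7", "198.51.100.9", "203.0.113.7", "not-an-ip"]

def collect(feed):
    """1. Collection: gather raw indicators (simple facts)."""
    return list(feed)

def process(data):
    """2. Processing: deduplicate and keep well-formed IPv4 strings."""
    return sorted({d for d in data
                   if d.count(".") == 3 and d.replace(".", "").isdigit()})

def analyze(info):
    """3. Analysis: attach a toy confidence score to each indicator."""
    return [{"indicator": i, "confidence": 0.5} for i in info]

def deploy(intel):
    """4. Deployment: act on the intelligence, e.g., feed a blocklist."""
    return {item["indicator"] for item in intel}

def disseminate(intel):
    """5. Dissemination: serialize the intelligence for sharing."""
    return json.dumps(intel)

intel = analyze(process(collect(raw_feed)))
blocklist = deploy(intel)
shared = disseminate(intel)
```

Real platforms replace each toy stage with far richer logic (enrichment, correlation, standard formats), but the overall collect-process-analyze-deploy-disseminate shape is the same.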

Cyber Threat Intelligence
Within the spectrum of threat intelligence lies the concept of Cyber Threat Intelligence (CTI), a relatively new approach that has become highly encompassing and is used to define different types of offered services. It can be considered actionable intelligence generated based on evidence of mechanisms, indicators, implications and context concerning threats or incidents in the cyber domain. It provides knowledge about adversaries and their methods that can assist in the decision-making process of responding to threats [43].
For CTI to be applied correctly and have effective results, it is necessary to establish a process flow for its production [1]. First, it is important to understand the needs of the users of the intelligence being developed and the context in which it is inserted [44]; thus, the requirements must be properly defined.
Once the requirements for the CTI are defined, the data collection stage starts. It is known that data and information without treatment and context are not considered intelligence, but they are the basic materials for its production. Then, mechanisms consume this information and perform the processing and analysis to generate structured information and find patterns. The treated information can be integrated with other defense mechanisms and then used to perform and develop methods of defense and threat mitigation [43].
Finally, as organizations lack the ability to understand the cyber threat landscape holistically, the stage of sharing and disseminating threat information between organizations is of utmost importance [41].

Threat Intelligence Standards
A crucial aspect of the entire threat intelligence process is the format of the shared data. First, for adequate and automated processing of the collected data, it is important that they are formatted in a structured model and outlined in a common language. In addition, the establishment of standards provides a prior definition of the type of information that will be shared and the density of that information [37]. As a result, a variety of initiatives have emerged with the aim of standardizing the information collected, consumed and disseminated within the CTI ecosystem [34].

Threat Intelligence Platforms
The establishment of a new threat landscape encouraged the change of traditional defense models. New systems with proactive action and real-time incident response capabilities are being developed and commonly referred to as Threat Intelligence Platforms (TIPs) [34]. They are specialized software systems that implement the processes of collecting, processing, analyzing, producing and deploying threat intelligence, and that integrate internal and external threat intelligence. The main goal of this type of platform is to serve as an assistant to decision makers in incident response [18].

Proposed Evaluation Methodology
In this section, the approach proposed to evaluate CTI standards and platforms is described. First, a method is developed to restrict and define which standards and platforms will be analyzed. Then, the criteria for evaluation are introduced and explained.

Selection Strategy
The threat intelligence scenario is very extensive and includes a wide range of systems, platforms, tools, standards and formats. In order to perform an accurate evaluation, it is necessary to define a strategy of inclusion and exclusion to select some of them. Thereby, in this work, we focus on the most popular and open source standards and platforms.
A literature review was conducted aiming to obtain a complete overview of the CTI panorama. Several search engines were used for this task, including Google Scholar, IEEE Digital Library, Springer and ScienceDirect. The following terms were combined to find suitable results: (Threat Intelligence OR Cyber Threat Intelligence) AND (platform OR tools OR standards). The search resulted in a large number of works, which were then filtered based on their relevance to the topic and number of citations.
With the filtering methods applied to the search, a considerable number of TI standard and platform initiatives was collected. Thus, to reduce the number of results found, and based on brief readings of the official web sites and documentation, standards or platforms that could not address two or more stages of the threat intelligence process flow presented in Figure 1 were excluded from the selection process.
Then, in order to select for evaluation the most relevant standards and platforms in the TI field, the results found with the searching process were described in terms of popularity and license model:

• Popularity criteria: In the context of this work, popularity was estimated according to the number of times the standard or platform was mentioned in reliable works and sources, combined with collected statistics about the percentage of utilization among organizations.
• License model criteria: The type of license was analyzed, aiming to limit the work to standards or platforms available to all interested communities by including only free or open source solutions and initiatives.
Finally, based on the aforementioned criteria, a smaller set of standards and platforms, composed of the most popular ones and those with permissive licenses or free versions, was chosen for evaluation.
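The selection steps described above can be summarized as a small filter; the candidate records, stage counts, mention counts and thresholds below are hypothetical placeholders used only to make the strategy concrete.

```python
# Hypothetical candidates; names, stage counts and mention counts are invented.
candidates = [
    {"name": "Standard A", "stages_covered": 4, "mentions": 35, "open": True},
    {"name": "Platform B", "stages_covered": 1, "mentions": 50, "open": True},
    {"name": "Platform C", "stages_covered": 3, "mentions": 12, "open": False},
    {"name": "Standard D", "stages_covered": 5, "mentions": 28, "open": True},
]

def select(cands, min_stages=2, min_mentions=20):
    """Exclude candidates covering fewer than two process-flow stages,
    then keep the popular, free or open source ones."""
    eligible = [c for c in cands if c["stages_covered"] >= min_stages]
    return [c["name"] for c in eligible
            if c["mentions"] >= min_mentions and c["open"]]

selected = select(candidates)
```

Here "Platform B" is dropped for covering too few stages and "Platform C" for its closed license, mirroring the exclusion logic of the strategy.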

Evaluation Criteria
In order to define the criteria for evaluating TI standards and platforms, two main aspects were taken into consideration. First, we analyzed the applicability in different use cases by defining a holistic architecture model. Then, some general evaluation criteria were inferred from the intelligence process flow.

Data Model Architecture
In terms of architecture, to define the applicability in different use cases, four main entities were used to represent the cyber threat intelligence scenario in a generic manner. In this approach, those entities indirectly derive from the 5W3H (what, who, why, when, where, how, how much and how long) method, which aims to answer the questions presented in Table 1. This is a generic method, applied in different areas, with the main objective of clarifying a topic in its completeness. The method was used for representation because, as an effective mechanism to obtain the complete description of a topic, it facilitates any necessary decision making about the subject addressed. Thus, within the threat intelligence ecosystem, the method can be used not only to identify and characterize threats and incidents: when correlating data from indicators and the generated information with the questions raised by the 5W3H method, it is simpler to implement effective mechanisms for detecting and mitigating threats.
First, what is used to directly define the topic being analyzed. In the cyber threat intelligence context, it can usually be summarized by the terms threats and incidents, which range from evidence of probable attacks to actual malicious occurrences. For an adequate definition, these terms should be accompanied by other information, like the type of threat or incident and the context in which it is inserted. Once defined, the topic can be characterized using where, when and how. Where can refer to the geographic location where it started and parts of the path it took until reaching the target. When provides a time frame, specifying the date and time of occurrence. How describes the way the threat or incident took place and the tactics and techniques applied. It is important to note that the granularity of the information describing these entities varies depending on the use case.
Another essential point is to associate the threat or incident with its threat actor, which can be described by who and why. Who can be an organization or an individual that is responsible for the threat or incident. Why is important to better characterize the threat actor by understanding the motivations behind the event.
Some detailed characteristics of the threat or incident can be discovered using how long and how much. How long indicates the effective durability of the threat or incident if no action is taken. How much is used to measure the intensity of the attack and analyze its damage capacity and defense cost. The information gathered with the how long and how much statements, together with all the characteristics described with the how statement, can also be used to analyze and measure the capacity of action of the threat actor.
Further, by correlating all the information raised about the threat, incident or threat actor using the 5W3H method, actionable intelligence can be produced and then used to define defense mechanisms and specify courses of action.
Based on the above, the four main entities used to delineate a holistic representation of the cyber threat intelligence scenario are threat, incident, threat actor and defense. To illustrate the context in which these entities exist and the relationships between them, a diagram is shown in Figure 2.

Intelligence Process
In order to evaluate general criteria, essential features for achieving a complete threat intelligence process were delineated, including some criteria proposed in References [35,37]. Considering the threat intelligence flow presented in Section 2.2, for the collection stage it is important to provide the data in a common format to facilitate gathering. Next, to process and normalize the data, a structured format and machine readability are essential; in addition, low overhead enables more efficient processing. The analysis step requires an unambiguous data model to perform correlations and classify the information, along with relationship mechanisms to represent those correlations. With the analyzed information accessible, interoperability between formats, systems and platforms is necessary so that the actionable intelligence can be deployed correctly and automatically. Finally, to disseminate intelligence and information, along with some of the above-mentioned aspects, it is relevant to have a specific transport mechanism and broad practical adoption in the community.
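To make the stage-by-stage checklist concrete, it can be sketched as a simple coverage computation; the criterion labels and the example profile below are illustrative abbreviations, not scores taken from the evaluation tables:

```python
# Essential criteria per intelligence-process stage, as described above.
CRITERIA = {
    "collection":    ["common format"],
    "processing":    ["structured format", "machine readability", "low overhead"],
    "analysis":      ["unambiguous data model", "relationship mechanisms"],
    "deployment":    ["interoperability"],
    "dissemination": ["transport mechanism", "community adoption"],
}

def coverage(supported):
    """Fraction of all criteria met by a candidate standard or platform."""
    all_criteria = [c for stage in CRITERIA.values() for c in stage]
    return sum(c in supported for c in all_criteria) / len(all_criteria)

# Hypothetical profile for illustration only ("low overhead" is not met).
example = {"common format", "structured format", "machine readability",
           "unambiguous data model", "relationship mechanisms",
           "interoperability", "transport mechanism", "community adoption"}
print(f"{coverage(example):.2f}")  # 8 of 9 criteria met -> 0.89
```

A real evaluation would, of course, grade criteria on a scale rather than as booleans, but the structure makes the mapping from process stages to evaluation criteria explicit.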

Additional
Regarding the TI platforms, considering that ease of use and flexibility for implementing new features are relevant aspects, some additional criteria were applied: the quantity and quality of the documentation and the permissions declared in their licenses were evaluated.
Based on the above, all evaluation criteria for TI standards and platforms have been defined. Tables 2 and 3 summarize all the criteria explained in this section.

Standards Evaluation Results
As a first step of the evaluation, the standards were selected based on the strategy described in Section 3.1. Then, the results of the evaluation were obtained based on the evaluation criteria proposed in Section 3.2.
After gathering the most suitable results from the search process for TI standards, some relevant initiatives were found. References [33,36] mention projects that aim to standardize threat intelligence data, such as STIX, TAXII, CybOX, OpenIOC, CAPEC, MAEC and ATT&CK, with STIX considered the most widely used.
Reference [37] mentions other standards, such as VERIS, STATL, ARF and X-ARF, and evaluates some of them. Other works [16,19] mention only the standards considered consolidated: OpenIOC, CybOX, STIX, TAXII and IODEF. A survey [45] presents, in statistical terms, the most used standards: STIX, OpenIOC, CybOX and IODEF. Reference [32] compares IODEF/RID and STIX/TAXII, considered the most popular standards. Finally, some recent works [9,12] present the standards considered prominent nowadays.
Since all the standards found are released for community use, popularity was the key criterion for selecting them. Considering the results obtained from the literature research, complemented by a review of the official web sites and documentation of most standards, the standards were ranked in terms of popularity and the results are presented in Table 4 [35,37]. Legend: very high (++++), high (+++), medium (++), low (+).

Standards Selected for Analysis
Given the presented results, the standards selected for further analysis and evaluation are STIX, TAXII, IODEF, RID, CybOX and OpenIOC. A succinct presentation of the selected standards follows.

Structured Threat Information eXpression
STIX is a language created by MITRE and developed to capture, specify, characterize and communicate information in the context of cyber threat intelligence [41]. It provides mechanisms to represent structured information in different scenarios of the cyber threat ecosystem.
The language was designed around principles such as interoperability, extensibility, focus on automation and machine readability. In its first version, STIX was modeled in the XML format and composed of eight core constructs. The second version was developed using serialization in the JSON format [46].
Its structural architecture has been significantly modified and is currently composed of twelve main objects that correspond directly to concepts embedded in the CTI context [47]. With its holistic architecture, STIX is able to present information in a standardized, comprehensive and structured manner while allowing application in different use cases. In addition, it is directly integrable with other languages in the TI context [17].
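As a sketch of the JSON serialization used by the second version of STIX, a minimal STIX 2.1 Indicator object can be assembled with the standard library alone; the name, pattern and generated identifier below are illustrative, not taken from any real feed:

```python
import json
import uuid
from datetime import datetime, timezone

# STIX 2.1 timestamps are UTC in RFC 3339 format with a trailing "Z".
now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")

indicator = {
    "type": "indicator",
    "spec_version": "2.1",
    "id": f"indicator--{uuid.uuid4()}",  # STIX IDs follow the type--UUID form
    "created": now,
    "modified": now,
    "name": "Malicious download URL",
    "pattern": "[url:value = 'http://example.com/malware.exe']",
    "pattern_type": "stix",
    "valid_from": now,
}
print(json.dumps(indicator, indent=2))
```

In practice a library such as the OASIS-maintained `stix2` Python package would generate and validate these objects, but the plain-dict form shows how lightweight the serialization is.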

Cyber Observable eXpression
CybOX is a language created by MITRE and developed to specify, characterize and communicate information about cyber observables in a standardized way. With the release of the second version of STIX, CybOX is no longer used as an independent language: it was integrated into STIX 2, which defines a structured representation for observable objects in the cyber domain, called Cyber Observable Objects [48].
This standardization can be used to describe different types of data, from a host characterization to information about digital forensics. The objects are represented using serialization in JSON format [49].
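A Cyber Observable Object is serialized the same way as other STIX 2.1 content; the minimal `file` object below is a sketch (the file name is invented and the digest is merely a placeholder value):

```python
import json
import uuid

file_observable = {
    "type": "file",
    "spec_version": "2.1",
    "id": f"file--{uuid.uuid4()}",
    "name": "dropper.exe",          # hypothetical file name
    "hashes": {
        # Placeholder digest (SHA-256 of the empty string), for shape only.
        "SHA-256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    },
}
print(json.dumps(file_observable, indent=2))
```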

Trusted Automated Exchange of Intelligence Information
TAXII is an application-layer protocol created by MITRE that defines a set of services for exchanging TI information messages between organizations [50]. It was designed specifically for the transport of information formatted in the STIX language but is not limited to it. TAXII uses Hyper Text Transfer Protocol Secure (HTTPS) as the transport protocol and supports different sharing models, including hub-and-spoke, peer-to-peer and publish-subscribe.
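In TAXII 2.x, a client interaction reduces to HTTPS requests carrying a specific media type. The sketch below only builds a discovery request; the server URL is hypothetical and the request is deliberately not sent:

```python
import urllib.request

TAXII_MEDIA_TYPE = "application/taxii+json;version=2.1"  # TAXII 2.1 media type
BASE = "https://taxii.example.com"                       # hypothetical server

# Discovery request: the /taxii2/ endpoint lists the API roots a server offers.
req = urllib.request.Request(
    f"{BASE}/taxii2/",
    headers={"Accept": TAXII_MEDIA_TYPE},
    method="GET",
)
print(req.full_url, req.get_header("Accept"))
# A real client would now call urllib.request.urlopen(req) over HTTPS,
# then poll collections under an API root to retrieve STIX content.
```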

Incident Object Description Exchange Format
IODEF is an Internet Engineering Task Force (IETF) initiative that aims to facilitate information sharing between organizations and increase the possibility of mitigating cyber threats [51]. In its first version, the data model focused on representing incidents. Bringing a more holistic approach, the second version of IODEF significantly evolved this structure, which now includes elements for describing indicators, attackers and incident response methodologies [52]. Both versions use the XML format.
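A minimal IODEF v2 document has the following general shape. This is a simplified sketch after RFC 7970: the incident ID, times and description are invented, and a conformant document carries more mandatory elements (e.g., Contact) than shown here:

```python
import xml.etree.ElementTree as ET

IODEF_NS = "urn:ietf:params:xml:ns:iodef-2.0"

# Simplified IODEF 2.0 skeleton with illustrative values.
doc = f"""
<IODEF-Document version="2.00" xmlns="{IODEF_NS}">
  <Incident purpose="reporting">
    <IncidentID name="csirt.example.com">492382</IncidentID>
    <GenerationTime>2020-03-14T08:19:01+00:00</GenerationTime>
    <Description>Phishing campaign against example.com users</Description>
  </Incident>
</IODEF-Document>
"""

root = ET.fromstring(doc)
incident = root.find(f"{{{IODEF_NS}}}Incident")
print(incident.attrib["purpose"])  # reporting
```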

Real-Time Inter-Network Defense
RID is an IETF initiative created to facilitate the sharing of security incident data, mainly structured in the IODEF format. It outlines a proactive inter-network communication scheme, capable of integration with mechanisms for detecting, identifying, mitigating and responding to incidents, aiming to compose a complete solution for the treatment of security incidents [53]. RID messages are transported over the HTTPS protocol and, to provide more security, the protocol adds another layer to manage sessions.

Open Indicator of Compromise
OpenIOC is a framework that offers a standardized, structured and machine-readable format used to record, define and share information in the context of cyber attacks and incidents [54]. The format is written in XML, with a relatively lightweight and compact design. Its architecture comprises more than 500 specific data types for representing indicators.
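The general shape of an OpenIOC definition is an XML tree of indicator items combined with logical operators. The sketch below is simplified (the OpenIOC namespace is omitted for readability) and the hash value is only a placeholder:

```python
import xml.etree.ElementTree as ET

# Simplified OpenIOC-style definition: two indicator items under an OR.
openioc = """
<ioc>
  <short_description>Example dropper</short_description>
  <definition>
    <Indicator operator="OR">
      <IndicatorItem condition="is">
        <Context document="FileItem" search="FileItem/Md5sum"/>
        <Content type="md5">d41d8cd98f00b204e9800998ecf8427e</Content>
      </IndicatorItem>
      <IndicatorItem condition="contains">
        <Context document="FileItem" search="FileItem/FileName"/>
        <Content type="string">dropper</Content>
      </IndicatorItem>
    </Indicator>
  </definition>
</ioc>
"""

root = ET.fromstring(openioc)
top = root.find("definition/Indicator")
print(top.get("operator"), len(top.findall("IndicatorItem")))  # OR 2
```

The AND/OR operator attribute on nested Indicator elements is what gives OpenIOC its relationship mechanism, as discussed in the evaluation below.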

Evaluation of the Standards
After the selection and definition of the standards, an evaluation was made based on academic literature, the study of official documentation and practical demonstrations.
Taking into account the aforementioned criteria, the evaluation of the standards was made and is summarized in Table 5. Regarding the results illustrated in Table 5, two considerations must be highlighted. First, as explained in Section 4.1.1, since CybOX and STIX were usually used together and both standards are maintained by the same organization, CybOX was integrated into the second version of STIX and is no longer used as an independent language. Thus, as CybOX is now part of the STIX structure, it was considered more plausible to evaluate them as a single standard. Second, it was noticed that even though TAXII and RID are autonomous protocols, they are mostly used in combination with STIX and IODEF, respectively. This stems from the fact that TAXII and RID are protocols designed specifically to facilitate the transport of STIX and IODEF. Hence, we chose to evaluate these protocols as pairs (STIX/TAXII and IODEF/RID), considering that their functions are complementary.

Data Model Architecture
From an architectural perspective, STIX is the language with the most holistic architecture and it is applicable in different use cases. The four entities considered essential to delineate a holistic contextualization of the cyber threat intelligence scenario can be fully represented and characterized by the classes that compose the STIX schema. IODEF and OpenIOC also have satisfactory architectures but with some weaknesses. Neither standard has the necessary resources for an adequate definition of defense mechanisms or courses of action. Besides, OpenIOC has some shortcomings in characterizing a threat actor in a more specific way.

Intelligence Process
From the process perspective, STIX has the capability to meet most of the proposed requirements. The use of serialization in JSON format provides a common and structured format, with relatively low overhead and machine readability. The twelve objects that compose the STIX architecture are well described and documented, providing an unambiguous data model with coupled relationship mechanisms. When used together with TAXII, it offers a reliable transport mechanism. Finally, since STIX has significant practical adoption, most TI platforms and tools provide integration methods for this standard.
IODEF and OpenIOC are based on the XML format, so they also provide a common and structured format with machine readability. However, IODEF can present some problems due to the free-text fields that compose its data model. Regarding relationship mechanisms, OpenIOC provides logical operators (AND/OR) to create connections between indicators, whereas IODEF, apart from the interconnections in its data model, does not provide specific mechanisms to relate information.
IODEF and OpenIOC are supported in many platforms and tools and can be integrated with different systems. IODEF can be used together with the RID protocol, providing an efficient and secure transport, while OpenIOC does not focus on implementing transport mechanisms.

Synthesis
As a result of the evaluation, it can be said that STIX is the de facto standard in the threat intelligence context. First, STIX is the most popular and compatible one, being supported by many platforms and tools and used by most organizations. Second, due to its holistic architecture and its capability to address a wide range of scenarios in the threat intelligence scope, it can be considered the most complete standard.
Even though the other standards are still supported by some platforms and have satisfactory applicability, the features offered by STIX stand out. So, considering the characteristics of STIX and the results it can generate, it can be said that this standard offers the most satisfying performance.

Platforms Evaluation Results
This section presents and explains the results regarding the selection and evaluation of the platforms. The search process for TI platforms identified a massive number of projects; the most relevant results count more than 30 different platforms. References [16,55] analyzed a significant number of platforms, totaling 30 and 23, respectively. Reference [20] mentions a smaller number of platforms, considered consolidated in the area.
In more specific studies [19,36,56], only open source and popular platforms are evaluated. Another work [14] proposed a framework to evaluate platforms and described the results for three of them. Some reliable and relevant sources also mention emerging platforms with great potential [57,58]. A considerable part of the platforms found was excluded according to the exclusion method applied, which considered adherence to the intelligence flow. Thereby, a total of 16 platforms were ranked in terms of popularity and the results are presented in Table 6, which describes the TI platforms by popularity and license model.
Considering that the scope of this work is restricted to open source or free platforms, about half of the platforms were excluded. Next, the popularity criterion was applied to select the platforms.

Platforms Selected for Analysis
Given the presented results, the platforms selected for further analysis and evaluation are MISP, CIF, CRITs, OpenCTI and Anomali STAXX. A succinct presentation of the platforms follows.

Malware Information Sharing Platform:
MISP is an open source TI platform that allows the sharing, storage and correlation of indicators of compromise (IOCs) [59]. The tool provides a database of indicators, including technical and general information about cyber threats, which are stored in a structured format with a flexible data model. The stored data is automatically correlated to describe the relationships between attributes and indicators.

Collaborative Research into Threats:
CRITs is an open source web-based tool that integrates a repository of malware and cyber threats with other software capable of offering mechanisms for analyzing and correlating the information [60]. This initiative was developed by MITRE with the main objective of assisting the cybersecurity community in the process of analyzing and sharing threat data [61].

OpenCTI:
OpenCTI is an open source framework whose main objective is to aggregate, in a comprehensive way, general and technical information from the CTI context [62]. It assists organizations in managing their content about cyber threats, allowing the structuring, storage, organization and graphic visualization of this information.

Collective Intelligence Framework:
CIF is a system focused on speed, performance and integration used in threat information management [63]. It assists users in formatting, normalizing, processing, storing, sharing and building threat data sets. The system extracts information about cyber threats from a wide range of sources and creates a sequential and chronological grouping about a specific threat [64].

Anomali STAXX:
Anomali STAXX is a tool that provides bidirectional sharing between sources of threat intelligence that use the STIX and TAXII standards [65], facilitating access to information about cyber threats. The platform provides an intuitive dashboard to present data obtained from different sources.

Evaluation of the Platforms
After the definition of the platforms, an evaluation was made based on academic literature, the study of official documentation and practical demonstrations. According to the aforementioned criteria, the evaluation of the platforms was made and is summarized in Table 7.

Data Model Architecture
From an architectural perspective, MISP and OpenCTI have the most holistic approaches and are applicable in diverse scenarios. Additionally, when used correctly, both platforms have the capacity to address the 5W3H method efficiently and provide full support to the decision-making process. The other platforms have some points of failure regarding the representation of the entities incorporated in the cyber threat spectrum. Besides, those platforms are not focused on classification and correlation mechanisms. As a result, not all aspects of the 5W3H method are supported.

Intelligence Process
The evaluation from the process perspective provided some significant findings. First, all the platforms support the import and export of information in at least one of the most common formats, such as XML, CSV and JSON. Also, with the exception of CIF, the platforms are compatible with STIX, considered the consolidated standard in the TI ecosystem, along with other standards.
MISP can be considered the most flexible platform in terms of compatibility with different formats. Next, regarding the collection process, all the platforms have the capability to perform automatic gathering of information.
For CIF and Anomali STAXX, this feature is built-in, while for the other platforms some integration might be necessary. Another important point to analyze is the correlation and classification mechanisms, which are well implemented by MISP and OpenCTI. The other platforms are not focused on this stage of the CTI process and provide only some filtering or aggregation mechanisms.
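The kind of automatic correlation performed by platforms such as MISP can be sketched as matching shared attribute values across events; the event identifiers and indicator values below are invented for illustration:

```python
from collections import defaultdict

# Hypothetical events, each holding a set of attribute values (IOCs).
events = {
    "event-1": {"198.51.100.7", "evil.example.com"},
    "event-2": {"198.51.100.7", "dropper.exe"},
    "event-3": {"203.0.113.9"},
}

def correlate(events):
    """Map each attribute value shared by two or more events
    to the set of events that contain it."""
    index = defaultdict(set)
    for event_id, attrs in events.items():
        for value in attrs:
            index[value].add(event_id)
    return {v: ids for v, ids in index.items() if len(ids) > 1}

print(correlate(events))  # the shared IP links event-1 and event-2
```

Production platforms refine this idea with typed attributes, fuzzy matching and classification taxonomies, but the value-intersection core is the same.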
Regarding information visualization, except for CIF, which is command-line based, the platforms use dashboards to present the information. MISP, CRITs and Anomali STAXX provide a generic dashboard for all the information in the platform, while OpenCTI builds personalized dashboards for the different kinds of information it holds. OpenCTI also provides intuitive and complete relationship graphs based on STIXv2; MISP and CRITs offer services for relationship visualization as well.
Considering the integration between platforms, systems and tools, MISP and OpenCTI are the most adaptable. CIF offers some extension code for integration with some Intrusion Detection Systems (IDSs), while the other platforms do not have specific mechanisms for this kind of integration. Regarding the sharing criteria, Anomali STAXX can communicate with any platform using TAXII, while MISP, CIF and CRITs focus on establishing a reliable group of instances.
Finally, all the platforms have documentation available; for MISP, OpenCTI and Anomali STAXX it is very extensive and elaborate. For CIF, there were some difficulties in finding and navigating the documentation, which is limited in detail and presents only succinct descriptions. CRITs has a satisfactory amount of information and detail.

Synthesis
As a result of the evaluation, it can be said that, currently, there are some satisfactory TI platforms. Each of them has some differentials that can optimize the process of creating threat intelligence. Therefore, it is important to analyze the context in which they will be applied and the goals that must be achieved in order to decide which platform to use.

Discussion
To achieve strong cyber threat detection and prevention capabilities, most organizations need to rely on available open source TI platforms. Similarly, these platforms need consolidated standards to provide an automated, shareable and reliable service. Thus, it is essential to analyze the features and operation modes of these two strands of the threat intelligence domain.
By evaluating common TI standards, it becomes evident that STIX, combined with the features of TAXII, can be considered the most holistic and applicable one. In addition, statistics show that it is also the most used standard among organizations [45]. Therefore, an important step is the definitive consolidation of this standard, so that the goal of establishing broad integration and interoperability between organizations can be accomplished. Besides, the adoption of an accepted standard can optimize the processing, analysis and sharing tasks performed by TI platforms, because it focuses efforts on a predefined data model; a data model based on STIX would certainly be holistic and widely applicable.
Regarding the TI platform analysis, several interesting solutions were found. Some of them focus on providing speed and performance, others put great effort into the visualization of the information, while a few implement a little of each feature. As a matter of fact, diverse types of systems, with different goals, are labeled as threat intelligence platforms. This probably derives from the fact that there is currently no standardized definition for the concept or process of cyber threat intelligence. As CTI is a very extensive domain, it would be relevant to establish scopes to better characterize the available platforms, making it easier to decide which platforms are best applied in each use case.
Taking into consideration that TI platforms have different goals, it can be said that currently there is no fully complete platform with the capacity to support the whole CTI process adopted in this work. Thus, one possibility to expand and optimize the results obtained with the application of the CTI process would be the integration of different TI platforms with complementary objectives. From this perspective, it is possible to reconcile different aspects, such as performance and visual mechanisms, achieving a fully developed CTI process that provides everything from data collection to the transformation of data into actionable intelligence.
For the reasons discussed, it is still necessary to carry out research in order to characterize the concept of CTI in a more specific way. Not only should a definition be established, but also the processes involved in this domain. Thereby, the wide range of available systems denominated TI platforms could be better used and applied, and new systems could be better designed, adopting either specific, more optimized processes or a complete approach that fulfills all predefined requirements for creating threat intelligence.

Conclusions and Future Work
As the cyber security landscape is fundamentally changing and a new threat scenario is emerging, the development and investigation of more efficient defense mechanisms has become a necessary task. In this work, an overview of the cyber threat intelligence scenario and of the existing standards and platforms in the threat intelligence spectrum was provided. Based on academic literature, official sites and documentation, a group of relevant standards and platforms was defined. Considering the scope of the research, a selection strategy was proposed and applied in order to determine the most popular and efficient standards and platforms that are free or open source. Then, the selected standards and platforms were evaluated based on a developed methodology that contemplates architectural and processual criteria.
From the evaluation of TI standards, we concluded that STIX is the most consolidated standard in the area, mainly due to its holistic approach, which makes it applicable in a wide range of scenarios, and compliance with fundamental requirements for a standard, such as interoperability and machine readability. Concerning TI platforms, MISP and OpenCTI were considered the most complete and flexible platforms. Although there are sophisticated solutions available, there is none that addresses the entire CTI process.
To conclude, even though some great solutions are available in the market, it is still a challenge to find a thorough and complete solution for a defense based on threat intelligence, since the platforms have divergent focuses and consequently cover only a few stages of the threat intelligence production flow.
Future work will address assessing and validating the methodology and results presented here by executing an experimental evaluation and running tests using cyber threat data sets. New research will focus on evaluating, in a practical way, the completeness of the CTI process that can be supplied by the available platforms, using the benefits of interoperability among platforms. Along the same lines, research will focus on the integration of complementary platforms in order to provide a more complete solution to manage and use threat intelligence. Finally, the delineation of a standardized definition for the CTI concept and process, to assist in the design of new and optimized threat intelligence systems capable of establishing an efficient defense model, remains a research gap.