Article

A New AI-Based Semantic Cyber Intelligence Agent

School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC 3004, Australia
Future Internet 2023, 15(7), 231; https://doi.org/10.3390/fi15070231
Submission received: 22 May 2023 / Revised: 15 June 2023 / Accepted: 27 June 2023 / Published: 29 June 2023
(This article belongs to the Special Issue Semantic Web Services for Multi-Agent Systems)

Abstract

The surge in cybercrime has emerged as a pressing concern in contemporary society due to its far-reaching financial, social, and psychological repercussions on individuals. Beyond inflicting monetary losses, cyber-attacks exert adverse effects on the social fabric and psychological well-being of the affected individuals. In order to mitigate the deleterious consequences of cyber threats, adoption of an intelligent agent-based solution to enhance the speed and comprehensiveness of cyber intelligence is advocated. In this paper, a novel cyber intelligence solution is proposed, employing four semantic agents that interact autonomously to acquire crucial cyber intelligence pertaining to any given country. The solution leverages a combination of techniques, including a convolutional neural network (CNN), sentiment analysis, exponential smoothing, latent Dirichlet allocation (LDA), term frequency-inverse document frequency (TF-IDF), Porter stemming, and others, to analyse data from both social media and web sources. The proposed method underwent evaluation from 13 October 2022 to 6 April 2023, utilizing a dataset comprising 37,386 tweets generated by 30,706 users across 54 languages. To address non-English content, a total of 8199 HTTP requests were made to facilitate translation. Additionally, the system processed 238,220 cyber threat data from the web. Within a remarkably brief duration of 6 s, the system autonomously generated a comprehensive cyber intelligence report encompassing 7 critical dimensions of cyber intelligence for countries such as Russia, Ukraine, China, Iran, India, and Australia.

1. Introduction

By the year 2020, the global cost incurred as a result of cyber-attacks had surpassed $1 trillion USD [1]. Projections indicate that by 2025, cybercrimes will have an annual cost of up to $10.5 trillion USD [2]. Notably, larger economies, including China, Brazil, the United States, India, Mexico, France, Australia, and the United Arab Emirates, face billions of dollars in consumer losses due to cybercrime [3]. For instance, in 2017, Chinese consumers experienced a financial loss of $66.3 billion USD attributed to cybercrime [3]. Moreover, apart from financial ramifications, cyber-attacks have profound social and psychological implications for individuals [4]. A recent example is the cyber-attack on Optus, a prominent Australian telecommunications company, which resulted in widespread stress and anger among the affected individuals [5,6].
As a consequence of this cyber-attack, sensitive information, such as names, dates of birth, email addresses, driver’s license details, Medicare cards, passport numbers, and more, may have been exposed, impacting 2.1 million people [6,7]. More recently, cybercriminals managed to obtain confidential patient data from Australia’s leading health insurance provider, which included medical diagnoses and procedures [8]. Thus, cybercrime stands as a paramount challenge for contemporary countries, states, corporations, and individuals.
To mitigate the effects of cybercrime, two crucial prerequisites are the availability of cyberspace-related data and advanced analytical algorithms capable of detecting and preventing threats. Recent studies [1,9] have emphasized the importance of cyber data. Such data can be acquired through actual network traffic analysis [10,11,12,13], simulation methods [14], surveys [15], open-source antivirus intelligence [16], and even social media data [17,18,19,20,21,22,23]. Among these various sources of cyber intelligence, open-source data and social media data from platforms such as Twitter have been identified as the most effective means of obtaining cyber data, as supported by recent research findings [16,17,18,19,20,21,22,23]. This paper introduces a novel multi-agent system designed to achieve the generation of multi-dimensional cyber intelligence through semantic communication among its constituent agents. The proposed agent effectively utilizes both social media data and web-based data to deliver a rigorous and comprehensive intelligence output to cyber analysts and strategists. Notably, the generation of cyber intelligence capitalizes on cutting-edge advancements in artificial intelligence (AI) and machine learning (ML) algorithms.
Figure 1 illustrates the schematic representation of the system wherein a user submits a request for cyber intelligence acquisition to the aggregation agent. The aggregation agent facilitates communication with two distinct agents, namely, the social media agent and the web media agent. The social media agent leverages various algorithms, including sentiment analysis, translation, term frequency analysis, topic modelling, and anomaly detection, to deliver cyber intelligence based on social media platforms, such as Twitter. These algorithms are implemented through both application programming interfaces (APIs) and non-API methods. The execution of API-based natural language processing (NLP) algorithms, such as sentiment analysis and translations, is carried out by a separate agent called the cognitive service agent.
In addition to acquiring cyber intelligence from social media sources, the web media agent also provides cyber threat intelligence by encompassing the spectrum of threats specific to selected countries, which is sourced from antivirus vendor websites, such as Kaspersky. Finally, the user is furnished with a comprehensive cyber intelligence report that amalgamates knowledge obtained through AI-driven acquisition from both social media and web media sources.
Table 1 presents a comprehensive overview of the distinctive features of the four unique designs proposed in this paper along with their corresponding meaningful outputs. Of particular significance is the semantic output generated for the user, encompassing various elements, such as country name, threat level, threat spectrum, geopolitical and socioeconomic factors, physiological and societal aspects, impacted targets, national concerns, and victimization. This comprehensive cyber intelligence output aligns with the prevailing literature on the subject matter [24,25,26,27,28,29,30].
In order to assess the efficacy of the newly devised social media-based cyber threat intelligence system, a comprehensive analysis was conducted on a dataset comprising 37,386 Tweets authored by 30,706 unique users. The data collection period spanned from 13 October 2022 to 6 April 2023. Throughout this timeframe, a diverse range of 54 languages was captured, processed, and subjected to in-depth analysis. The resulting outcome yielded multi-dimensional cyber threat intelligence specifically tailored for the countries of China, Russia, Ukraine, Australia, Iran, and India.

2. Background and Literature Review

This section lays the foundations of multi-dimensional cyber intelligence and reviews the existing literature on NLP-based cyber intelligence.

2.1. Contextual Information on Multi-Dimensional Cyber Intelligence

Ref. [26] presents a comprehensive framework for categorizing and evaluating the impacts of cyber-attacks on individuals, organizations, and society while also highlighting current limitations and challenges in cyber harm research and offering recommendations for future studies. The paper examines four case studies to establish connections between various forms of harm and the potential spread of cyber violence, emphasizing the need for analytical tools for organizational cyber harm based on a proposed taxonomy. Ref. [27] identifies and analyses common cyber security vulnerabilities and their recurrence rates, including the affected publication venues, countries, and targeted infrastructures and applications. The report emphasizes the necessity for further research in identifying critical vulnerabilities and developing effective mitigation strategies, emphasizing the need for empirical validation and practical implementation. Ref. [28] underscores the significance of employee and organizational education, training, and awareness in preventing and minimizing cyber security incidents, outlining major obstacles, and offering a research framework in this area. Ref. [29] focuses on phishing attacks, examining previous strategies, assessing the current landscape, and proposing a comprehensive anatomy of phishing that covers attack phases, attacker types, weaknesses, threats, targets, attack media, and strategies with the aim of raising awareness and aiding the development of an anti-phishing system. Ref. [30] explores various perspectives on cyber security threats, including technical, human, organizational, and environmental factors, and proposes a methodology to examine these concerns and their interactions. The article suggests a multi-dimensional approach to identify underlying causes and develop more robust defences. Table 2 provides an overview of the research works (i.e., [24,25,26,27,28,29,30]) and the strategic questions they address along with the dimension of cyber threat.

2.2. NLP-Based Cyber Intelligence from Social Media

Analysing cyber-related social media posts on Twitter began nearly a decade ago [31], but these early studies did not leverage the potential of machine learning (ML) and deep learning (DL) techniques for automated critical analysis. Instead, they relied on the manual application of general statistical techniques to gain insights into cyberbullying and related issues. In [17], a methodology utilizing sentiment analysis of tweets is described for predicting cyber-attacks. The authors employed machine learning algorithms to categorize tweets as security-related, positive, or neutral and then examined the relationship between sentiment scores and cyber-attacks reported by Google News using a regularised regression model. Their findings suggest that this methodology can serve as a warning system for identifying potential cyber-attacks, particularly when the coefficient of determination is high. They also discuss the response of hacktivists to candidates’ comments and actions in the 2016 US presidential elections, providing examples. Ref. [21] utilizes the term frequency-inverse document frequency (TF-IDF) to extract features from a dataset of 2000 tweets, comparing the performance of five classifiers and determining that logistic regression is the best classifier for detecting bullying tweets. In [22], TF-IDF and LGBM algorithms are employed to identify cyber-attacks in darknet traffic, achieving a high accuracy of 98.97% compared to other algorithms. Ref. [19] employs two machine learning-based classifiers to analyse a large-scale Twitter dataset, revealing negative sentiment across various themes. Ref. [20] identifies sensitive keywords that could lead to vulnerability when combined with benchmarked cyber keywords using LDA, while [23] analyses COVID-19 misinformation spread on Twitter through sentiment, emotion, topic, and user attributes, highlighting the denial of the pandemic and dissemination of false information. Table 3 showcases the diverse NLP algorithms used in these studies, with the proposed system presenting the most comprehensive use of NLP-based algorithms for generating seven-dimensional cyber intelligence.

3. Materials and Methods

The proposed methodology incorporates the development of four distinct agents that interact with each other in a complex manner to fulfil the user’s goal of acquiring AI-based cyber intelligence across various countries. Figure 2 illustrates the integrated semantic multi-agent communication within this framework. Figure 2, Figure 3 and Figure 4 illustrate the building blocks and detailed sub-processes of the four agents. These sub-processes (i.e., language detection and translation, sentiment analysis, anomaly detection, term frequency generation, topic generation, and threat prediction) are explored in further detail in Sections 3.1–3.6.
As seen in Figure 3, the social media agent and the cognitive service agent perform several NLP activities (e.g., translation, sentiment analysis, term frequency generation, and topic generation) and a deep learning activity (i.e., anomaly detection) on Twitter data [16]. On the other hand, the web media agent performs deep learning tasks, such as anomaly detection, and statistical modelling tasks, such as threat prediction, on web media data covering cyber-attacks [18], as seen in Figure 4. Hence, the proposed architecture of multi-agent communication in a semantic manner realizes the first multi-source cyber intelligence with NLP and AI via the amalgamation of both social media [16] and web data [18].

3.1. Language Detection and Translation Process

Microsoft’s Azure Cognitive Services empowers developers to build intelligent applications without the requirement of developing the underlying AI technology. It offers language translation and detection functionalities. Language detection is the process of identifying the language of a given text, and Azure Cognitive Services provides a Language Detection API capable of automatically determining the language of text inputs. This API supports various programming languages, such as C#, Java, Python, and more, and demonstrates remarkable accuracy in detecting over 120 different languages [32]. As seen from Algorithm 1 (line 1 to line 5), this process converts a multilingual stream of social media messages xi to yi, where i = 0 to N. Appendix A provides sample codes on the detailed usage of language detection and translation process.
Algorithm 1: Language Processing on Cyber-Related Social Media Messages
1: For each xi in N, Multilingual Social Media Messages
2:   If Language(xi) <> ‘English’
3:     yi = Translate(xi)
4:   Else
5:     yi = xi
6: For each yi in N, English Social Media Messages
7:   si = Sentiment(yi)
8:   If yi Contains ‘Country Name’
9:     {cr, yi, tcr} = yi
10: For each cr in C, Countries
11:   {Yes/No, tr} = AnomalyDetection(CountofMessagesonTimeUnit(tr), tr)
12:   {{wj, fwj}, …} = TermFrequency(Tokenize(yi))
13:   {{wk, fwk}, …} = Stemming(Tokenize(yi))
14:   {{w1 ∪ … ∪ wl, fo}, …} = n_gram(Tokenize(yi))
15:   {{vp, {{wq, fwq}, …}}, …} = Topic(Tokenize(yi))
16: Generate Interactive Visualization
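To make the control flow of Algorithm 1 concrete, the following is a minimal, runnable Python sketch of its loop structure. The analytics are deliberately replaced by simple stand-ins (a Counter-based term frequency in place of the services), and the run_algorithm_1 function, the naive country-matching rule, and the demo messages are illustrative assumptions; in the deployed system, each step instead calls the Azure services described in Sections 3.1–3.6.

# Minimal, runnable sketch of Algorithm 1's control flow. Translation, sentiment,
# anomaly detection, and topic modelling are replaced by simple stand-ins here;
# the deployed agents call the Azure services of Sections 3.1-3.6 instead.
from collections import Counter, defaultdict

def run_algorithm_1(messages, countries):
    # Lines 1-5: keep English messages (translation is stubbed out in this sketch)
    english = [m["text"] for m in messages if m.get("lang", "en") == "en"]
    # Lines 6-9: route each message to every country it mentions
    per_country = defaultdict(list)
    for text in english:
        for country in countries:
            if country.lower() in text.lower():
                per_country[country].append(text)
    # Lines 10-15: per-country analytics (term frequency shown; others are analogous)
    report = {}
    for country, texts in per_country.items():
        tokens = " ".join(texts).lower().split()
        report[country] = {
            "message_count": len(texts),
            "term_frequency": Counter(tokens).most_common(5),
        }
    return report  # Line 16: this feeds the interactive visualization layer

demo = [{"text": "Cyber attack reported in Australia", "lang": "en"},
        {"text": "Russia-linked hack disrupts banks", "lang": "en"}]
print(run_algorithm_1(demo, ["Australia", "Russia", "China"]))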

3.2. Sentiment Analysis Process

Sentiment analysis, the process of determining the emotional tone conveyed in a text, can be effectively performed using the Sentiment Analysis API provided by Azure Cognitive Services [32]. This API utilizes natural language processing techniques to analyse sentiments associated with phrases, entities, and linguistic constructions present in the input text. With a machine learning model trained on extensive text data, the API offers highly accurate sentiment analysis for various text inputs, including social media posts, customer reviews, and support issues. Users can access the API by registering for an Azure account and subscribing to the Sentiment Analysis API and integrating it into their applications through SDKs or REST APIs available for multiple programming languages. Alongside an overall sentiment score, the API provides detailed information, such as sentiment scores for individual sentences and key phrases/entities associated with each sentiment score. This information can be utilized in customer service, marketing, and social media monitoring to make informed decisions and gain deeper insights into expressed sentiments [17,19,33]. Appendix A provides sample codes on the detailed usage of sentiment analysis process.
$$s_i = \mathrm{Sentiment}(y_i) \quad (1)$$
$$\text{Sentiment Classification} = \begin{cases} \text{Positive}, & \text{if } s_i \ge 0.7 \\ \text{Negative}, & \text{if } s_i \le 0.3 \\ \text{Neutral}, & \text{if } 0.3 < s_i < 0.7 \end{cases} \quad (2)$$
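As an illustration only, the threshold rule in Equation (2) can be written as a few lines of Python; the function name and the example scores below are assumptions, and in the deployed system the score si is produced by the cognitive service agent (see Appendix A.2).

def classify_sentiment(s_i: float) -> str:
    """Map a sentiment score in [0, 1] to a class label, following Equation (2)."""
    if s_i >= 0.7:
        return "Positive"
    if s_i <= 0.3:
        return "Negative"
    return "Neutral"

# Hypothetical scores for three tweets
print([classify_sentiment(s) for s in (0.92, 0.15, 0.55)])  # ['Positive', 'Negative', 'Neutral']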

3.3. Anomaly Detection Process

The anomaly detector enhances line charts by automatically identifying anomalies in time-series data. It employs NLP-based root cause analysis to dynamically explain the detected anomalies [32,33,34]. In this section, the problem definition is discussed first. Given a sequence of real values $x = \langle x_1, x_2, x_3, \ldots, x_n \rangle$, the goal of time-series anomaly detection is to produce an output sequence $y = \langle y_1, y_2, y_3, \ldots, y_n \rangle$, where $y_i \in \{0, 1\}$ denotes whether $x_i$ is an anomaly point.
Research in [35] demonstrated the process of spectral residual (SR) from the visual saliency detection domain, followed by the application of a CNN to the output of the SR model. This study implements a similar process as prescribed in [35] with the following three core tasks:
Apply Fourier transform for generating log amplitude spectrum.
Compute the SR.
Apply inverse Fourier transform for transforming the sequence back to the spatial domain.
$$A(f) = \mathrm{Amplitude}\left(\mathcal{F}(x)\right) \quad (3)$$
$$P(f) = \mathrm{Phase}\left(\mathcal{F}(x)\right) \quad (4)$$
$$L(f) = \log\left(A(f)\right) \quad (5)$$
$$AL(f) = h_q(f) \cdot L(f) \quad (6)$$
$$R(f) = L(f) - AL(f) \quad (7)$$
$$S(x) = \left|\mathcal{F}^{-1}\left(\exp\left(R(f) + iP(f)\right)\right)\right| \quad (8)$$
Here, the Fourier transform and the inverse Fourier transform are represented by $\mathcal{F}$ and $\mathcal{F}^{-1}$, respectively. Moreover, x represents the input sequence with shape n × 1, and A(f) represents the amplitude spectrum of sequence x. Furthermore, the phase spectrum of sequence x is denoted by P(f). The log representation of A(f) is denoted by L(f), and the average spectrum of L(f) is denoted by AL(f), which can be estimated by convolving L(f) with hq(f). Here, hq(f) can be represented as a q × q matrix, as shown in Equation (9).
$$h_q(f) = \frac{1}{q^2}\begin{bmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 1 \end{bmatrix} \quad (9)$$
As shown in Equation (7), R(f) is calculated by subtracting the averaged log spectrum AL(f) from the log spectrum L(f); here, R(f) denotes the spectral residual (SR). Finally, as shown in Equation (8), the sequence is transformed back to the spatial domain by applying the inverse Fourier transform. The resulting output sequence S(x) is called the saliency map [36]. The anomaly points are computed with Equation (10).
$$x = \left(\bar{x} + \mathrm{mean}\right)\left(1 + \mathrm{var}\right) \cdot r + x \quad (10)$$
Within Equation (10), the local average of the preceding points is represented by x̄, while mean and var are the mean and variance of all points within the current sliding window, and r is randomly sampled from N(0, 1). In this manner, the CNN is applied to the saliency map (i.e., not to the raw input). The anomaly detection procedure thereby maximizes efficacy and efficiency [35,36]. As previously indicated, the anomaly identification method used in this study makes use of NLP to provide plain-English explanations of the origins of all the anomalies [32].
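The following NumPy sketch illustrates the SR transform of Equations (3)–(8) on a toy series; it is not the production implementation, which relies on the Azure Anomaly Detector service (Appendix A.3). The window size q, the small epsilon guard inside the logarithm, and the one-dimensional form of hq(f) are assumptions made for this illustration.

import numpy as np

def saliency_map(x, q=3):
    """Spectral residual (SR) saliency map of a 1-D series, following Equations (3)-(8)."""
    x = np.asarray(x, dtype=float)
    spec = np.fft.fft(x)
    amplitude = np.abs(spec)                    # A(f), Equation (3)
    phase = np.angle(spec)                      # P(f), Equation (4)
    log_amp = np.log(amplitude + 1e-8)          # L(f), Equation (5); epsilon avoids log(0)
    kernel = np.ones(q) / q                     # h_q(f) in 1-D form, cf. Equation (9)
    avg_log_amp = np.convolve(log_amp, kernel, mode="same")  # AL(f), Equation (6)
    residual = log_amp - avg_log_amp            # R(f), Equation (7)
    return np.abs(np.fft.ifft(np.exp(residual + 1j * phase)))  # S(x), Equation (8)

# A flat series with one injected spike: the saliency map is expected to peak at (or near) the spike
series = np.ones(64)
series[40] = 10.0
print(int(np.argmax(saliency_map(series))))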

3.4. Term Frequency Generation Process

The text analysis method known as term frequency-inverse document frequency (TF-IDF) is used to determine the relative value of words in a document [17,19,20,21,22]. A word’s frequency in a document is multiplied by the word’s inverse frequency over the entire corpus (the inverse document frequency) to arrive at this score. The resulting score gives greater weight to terms that are significant in a specific document but uncommon across the entire corpus. As seen from line 12 of Algorithm 1, the TF-IDF process generates a vector of words along with the frequency of each word as {{wj, fwj}, …}. Here, wj is a word, and fwj is the corresponding frequency.
Porter stemming is a method for condensing words to their root or fundamental form [17]. It normalizes words by deleting frequent suffixes from them. For instance, the roots of the words “running”, “runs”, and “run” would all be the same. By lowering the number of unique terms and combining similar words, this method can increase the accuracy of text analysis. Line 13 of Algorithm 1 demonstrates the creation of the vector {{wk, fwk}, …}, where wk is the root word and fwk is the corresponding frequency.
In a text, n-grams are contiguous groups of n words [17,19,20]. They are frequently employed in text analysis to record linguistic context and structure. For instance, bigrams (n = 2) can capture pairs of words that frequently occur together, such as “machine learning”, while trigrams (n = 3) can capture more complicated word sequences, such as “deep learning algorithms”. For text classification, clustering, and other text analysis tasks, n-grams can be utilized to produce features. Line 14 of Algorithm 1 shows unions of consecutive words (i.e., w1 ∪ … ∪ wl) and their corresponding frequency, fo. For bigrams, this is w1 ∪ w2, and for trigrams, w1 ∪ w2 ∪ w3. TF-IDF, Porter stemming, and n-grams can all be used in conjunction to preprocess and analyse text data, which can enhance the precision and efficiency of text analysis operations.
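A brief sketch of these three preprocessing steps using scikit-learn and NLTK is given below; the two example tweets are invented for illustration, and the deployed pipeline applies the equivalent operations to the tokenised messages yi in lines 12–14 of Algorithm 1.

from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

tweets = [
    "Massive cyber attack hits banking network",        # illustrative examples only
    "Hackers launch cyber attack on government network",
]

# TF-IDF weights (line 12 of Algorithm 1): {w_j: f_wj} for the first document
tfidf = TfidfVectorizer()
weights = tfidf.fit_transform(tweets)
print(dict(zip(tfidf.get_feature_names_out(), weights.toarray()[0].round(2))))

# Porter stemming (line 13): reduce words to their roots before counting
stemmer = PorterStemmer()
print([stemmer.stem(w) for w in "hackers hacking hacked".split()])

# Bigrams (line 14): pairs of adjacent words and their frequencies
bigrams = CountVectorizer(ngram_range=(2, 2))
counts = bigrams.fit_transform(tweets)
print(dict(zip(bigrams.get_feature_names_out(), counts.toarray().sum(axis=0))))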

3.5. Topic Generation Process

A method for locating the key themes or subjects in a group of texts is called topic modelling. Latent Dirichlet allocation (LDA) is one of the most widely used topic modelling algorithms [19,20]. According to the LDA statistical model, each document in a collection is composed of a few different topics, and each word is produced by one of those topics. The LDA algorithm starts by randomly assigning a topic to each word in each document. The topic assignments are then incrementally updated based on the words found in the texts until a set of assignments is found that most effectively explains the observed words. Once the LDA method has determined the topics in the document collection, it can be used to study and investigate the content of the documents in a variety of ways [19,20]. It can be used, for instance, to find the topics that appear most frequently in the collection, to investigate the relationships between the topics, and to find the documents that are most closely related to each topic. LDA has a wide range of applications in areas including natural language processing, text mining, and information retrieval and is a potent tool for revealing latent topics in a collection of texts. As shown in line 15 of Algorithm 1, the LDA algorithm produces topics vp and, for each topic, a list of words along with their corresponding frequencies {wq, fwq}.
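The sketch below runs LDA on a four-tweet toy corpus with scikit-learn; the corpus, the choice of two topics, and the three words shown per topic are illustrative assumptions, whereas the production system applies topic modelling to the full stream of English tweets (line 15 of Algorithm 1).

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

tweets = [
    "ransomware attack encrypts hospital records",
    "phishing email steals banking passwords",
    "ransomware gang demands bitcoin payment",
    "phishing campaign targets banking customers",
]

vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(tweets)

# Fit LDA with two latent topics (the v_p of line 15 in Algorithm 1)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(doc_term)

# For each topic, list its highest-weighted words {w_q, f_wq}
words = vectorizer.get_feature_names_out()
for topic_id, topic_weights in enumerate(lda.components_):
    top = topic_weights.argsort()[::-1][:3]
    print(f"Topic {topic_id}:", [(words[i], round(topic_weights[i], 2)) for i in top])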

3.6. Threat Prediction Process

The study in [37] utilizes exponential smoothing, a statistical method, to predict malware attack frequency. It employs mean squared error (MSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) to evaluate single and double exponential smoothing models. The results indicate that single exponential smoothing is effective in predicting malware attacks, and therefore this technique is also adopted here. The only smoothing parameter used in the single exponential model is α. The fundamental premise guiding single exponential smoothing is that the data’s mean is steady while the actual data fluctuate around it. Thus, the single exponential smoothing model is represented in Equation (11). Double exponential smoothing is an extension of single exponential smoothing and uses two parameter constants, α and β: the smoothing constant α is used to smooth the level estimate, and β is used to smooth the trend estimate. Double exponential smoothing is employed when there is evidence of a trend in the data but no seasonality effect. It is similar to single exponential smoothing, except that both α and β, the level and trend smoothing parameters, must be updated at every step. Equation (12) represents the level estimate of double exponential smoothing, Equation (13) shows the trend estimate, and Equation (14) represents the forecast m steps ahead.
$$F_{t+1} = \alpha y_t + (1 - \alpha) F_t \quad (11)$$
$$L_t = \alpha y_t + (1 - \alpha)(L_{t-1} + b_{t-1}) \quad (12)$$
$$b_t = \beta (L_t - L_{t-1}) + (1 - \beta) b_{t-1} \quad (13)$$
$$F_{t+m} = L_t + m b_t \quad (14)$$
Here, Ft+1 (in Equation (11)) is the forecast for the next period, Ft is the previous forecast for period t, and yt is the actual value at time t. Within Equations (12)–(14), Lt is the estimate of the level of the time series at time t, α is the smoothing constant for the data, β is the smoothing constant for the trend estimate, bt is the estimate of the slope of the series at time t, and m represents the number of periods to be forecast into the future.
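A direct Python transcription of Equations (11)–(14) is sketched below; the smoothing constants α = 0.3 and β = 0.1, the initialisation choices, and the example attack counts are assumptions for illustration, whereas the deployed web media agent works on the daily threat statistics described in Section 4.

def single_exponential_smoothing(y, alpha=0.3):
    """Equation (11): F_{t+1} = alpha * y_t + (1 - alpha) * F_t."""
    forecast = y[0]  # initialise the first forecast with the first observation (an assumption)
    for y_t in y:
        forecast = alpha * y_t + (1 - alpha) * forecast
    return forecast  # one-step-ahead forecast after the last observation

def double_exponential_smoothing(y, alpha=0.3, beta=0.1, m=1):
    """Equations (12)-(14): level L_t, trend b_t, and the m-step-ahead forecast."""
    level, trend = y[0], y[1] - y[0]  # common initialisation (an assumption)
    for y_t in y[1:]:
        prev_level = level
        level = alpha * y_t + (1 - alpha) * (level + trend)       # Equation (12)
        trend = beta * (level - prev_level) + (1 - beta) * trend  # Equation (13)
    return level + m * trend                                      # Equation (14)

# Example: weekly attack counts with an upward trend (illustrative values)
attacks = [120, 135, 150, 170, 185, 205]
print(round(single_exponential_smoothing(attacks), 1))
print(round(double_exponential_smoothing(attacks, m=3), 1))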
Earlier in this section, Figure 2, Figure 3 and Figure 4 presented the building blocks of the proposed system. These building blocks are subsequently described and summarized in Table 4. Each of the building blocks presented within this section was implemented with Algorithm 1.

4. Results

The proposed method was evaluated from 13 October 2022 to 6 April 2023 using a dataset of 37,386 tweets from 30,706 users in 54 languages. These tweets were obtained using Microsoft Power Automate as demonstrated in Figure 5. Acquiring tweets through Power Automate involves leveraging its capabilities to integrate with external platforms, such as Twitter’s Application Programming Interface (API), as demonstrated in our research [16,33,34]. Power Automate, a cloud-based service provided by Microsoft, allows users to automate workflows and create custom applications without requiring extensive coding knowledge. To obtain tweets, one must first authenticate their Power Automate connection with the Twitter API by generating API credentials and obtaining an access token. This process involves registering a Twitter developer account, creating an application, and obtaining the necessary keys and tokens. Once authenticated, users can utilize Power Automate’s “HTTP” action to send requests to the Twitter API’s endpoints, such as the “search/tweets” endpoint. By specifying relevant parameters, such as search keywords, date range, or user handles, users can retrieve specific tweets or perform comprehensive searches. Power Automate can then process and manipulate the received tweet data, enabling users to store them in a database, send notifications, perform sentiment analysis, or integrate them into other applications, thereby empowering users to streamline their workflows and leverage Twitter’s vast data resources. It should be mentioned that keywords such as “Cyber” and “Hack” were used to obtain the real-time tweets from 13 October 2022 to 6 April 2023. All tweets (regardless of their relevancy) were analysed in the proposed system.
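For readers who prefer code to a low-code flow, the snippet below sketches an equivalent of the Power Automate “HTTP” action against the standard search/tweets endpoint using Python’s requests library; the bearer token placeholder and the helper function are assumptions, and the actual flow in this study was configured graphically in Power Automate.

import requests

BEARER_TOKEN = "<YOUR-TWITTER-BEARER-TOKEN>"  # placeholder; generated from a Twitter developer account

def search_tweets(query, count=100):
    """Call the standard search/tweets endpoint, mirroring the Power Automate HTTP action."""
    url = "https://api.twitter.com/1.1/search/tweets.json"
    headers = {"Authorization": "Bearer {}".format(BEARER_TOKEN)}
    params = {"q": query, "count": count, "result_type": "recent"}
    response = requests.get(url, headers=headers, params=params, timeout=30)
    response.raise_for_status()
    return response.json().get("statuses", [])

# Example: the keywords used in this study ("Cyber" and "Hack")
for tweet in search_tweets("Cyber OR Hack"):
    print(tweet["id_str"], tweet["user"]["screen_name"], tweet["text"][:80])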
A total of 8199 HTTP requests were made to translate non-English tweets. These data were processed by the social media agent. Apart from the social media data, 238,220 cyber threat records were obtained and processed by the web media agent during the same timeframe (13 October 2022 to 6 April 2023). As seen from Table 5, the web media agent collected about 30K attack statistics for each of the eight different attack types (e.g., exploits, local infections, malicious mail, network attacks, on-demand scans, ransomware, spam, and web threats). Kaspersky, a trusted provider of cyber threat statistics, was the source of the multi-dimensional cyber-attack data obtained by the web media agent using web scraping techniques.
It should be highlighted that all these multi-dimensional cyber threat statistics from Kaspersky’s attack statistics (i.e., https://statistics.securelist.com/) (accessed on 3 March 2023) site provided a daily dump of threat statistics. The cloud-based web-media agent built with Microsoft Power Automate downloaded these statistics on a daily schedule and saved these statistics within Microsoft Dataverse, as demonstrated in Figure 5. The low-code implementation of the technique is also demonstrated in [18].
During the evaluation, the deployment structure portrayed in Figure 5 was used. The cloud-based Microsoft Power Platform and the Microsoft Azure ecosystem were used for deploying the agents. Hence, industry-standard configurations were adopted for agent deployment. Agents in Microsoft Azure are deployed through Azure Automation, a cloud-based service facilitating the automation and orchestration of tasks across Azure resources and external systems. To initiate agent deployment, an Azure Automation account is created as the central management hub. Subsequently, the Azure Automation agent is installed on the target machine or virtual machine, enabling the execution of automation runbooks and establishing secure communication between the agent and the automation service. Configuration of agent settings, such as defining runbook worker groups, proxy settings, network access control rules, and resource management, follows the agent’s successful connection to the Azure Automation account. By assigning runbooks to the deployed agent within the Azure Automation account, desired automation tasks or workflows can be executed on the target machine. This deployment process empowers users to streamline their Azure resources through effective task automation and orchestration. As seen from Figure 5, technology building blocks, such as Microsoft SQL Server, Microsoft Dataverse, Microsoft Power Automate, and Microsoft Power BI, were used to generate AI-based cyber intelligence from both Twitter data and Kaspersky’s attack statistics (i.e., https://statistics.securelist.com/) (accessed on 3 March 2023).
Table 6 presents comprehensive information on the Twitter data collected over a 7-month period. It includes the time frame, number of tweets, unique users, unique locations, unique languages, total retweets, average confidence scores for negative, neutral, and positive sentiment analyses, and the number of translated tweets. The table reveals an increase in the quantity of tweets and users over time, while the number of languages remained stable. Retweet counts varied each month, with November 2022 having the highest total. The average confidence scores for sentiment analyses remained stable, with negative sentiment showing higher confidence than neutral and positive sentiment. Only a small portion of tweets were translated, with December 2022 having the highest percentage. Figure 6 displays average daily sentiment score fluctuations on time-series data as produced by the cognitive service agent. Figure 7 focuses solely on average monthly negative sentiments for Russia, China, Australia, Ukraine, India, and Iran. As highlighted in Figure 7, the monthly average of negative sentiment for Russia was at its peak in January 2023 with an average negative sentiment confidence of 0.74. Negative sentiment cyber-related posts are considered alerts [16]. These alerts or negative sentiments produced by the cognitive service agent were used to generate the psychological effect (i.e., number 5 semantic output of the aggregation agent shown in Table 1) of the cyber threat dimension.
The CNN-based anomaly detection algorithm identified 7 anomalies for Russia, 5 anomalies for India, 5 anomalies for Iran, 4 anomalies for Ukraine, 4 anomalies for China, and finally, 12 anomalies for Australia, as depicted in Figure 8. These anomalies were identified by the social media agent by analysing live tweets during the monitored period. It should be mentioned that the number of tweets for each of the countries varied every month depending on the number of cyber-related issues and how the general public reacted psychologically to those issues. Figure 9 demonstrates the ranking of the number of tweets for each of the monitored countries between 13 October 2022 and 6 April 2023. As seen from Figure 9, in January 2023, the number of tweets for Russia was highest followed by Ukraine, India, Iran, China, and Australia. These statistics were slightly different in other months. However, cyber-related posts for Russia remained the focal point of discussion among Twitter users.
These tweets were analysed subsequently using term frequency (as shown in Table 7) and topic modelling (as shown in Table 8 and Table 9).
While Figure 6, Figure 7, Figure 8 and Figure 9 represent the analysis of the social media agent and cognitive service agent from tweets, Figure 10 shows the actual deployment of the web media agent within a deployed mobile app (on a Samsung Galaxy Note 10 Lite phone) analysing actual cyber-attack statistics from the web. As seen in Figure 10, the web media agent provided a ranking of the most cyber-attacked countries, and a user selected China (with an average rank factor of 67.74). Using this agent, it was identified that China was the most threatened country followed by Russia, Ukraine, India, Iran, and Australia (as shown in Figure 10). This information (of threat ranking) is used for the final assessment of “Threat Level” (second semantic output of the aggregation agent, as shown in Table 1).
Similarly, Figure 11 shows another deployment of a web media agent showing the cyber threat spectrum of the selected countries. As seen in Figure 11, the major cyber threat types for China, Russia, and India were spam. For Ukraine, Iran, and Australia, the threat types were deemed to be local infection, on-demand scans, and web threats, respectively. This information (i.e., shown in Figure 11) is subsequently used in explaining the “Threat Spectrum” by the aggregation agent (i.e., third semantic output of the aggregation agent, as shown in Table 1). Finally, Figure 12 shows the web media agent analysing global cyber threats during the monitored period. Figure 12a shows the cyber threat prediction for Australia using exponential smoothing (described earlier in Section 3.6). Figure 12b demonstrates a detailed threat spectrum analysis for Russia. As seen in Figure 12b, in terms of spam, Russia appeared to be the third most attacked country in the world during the evaluation period. This information (i.e., shown in Figure 12b) is also used in explaining the “Threat Spectrum” by the aggregation agent (i.e., third semantic output of the aggregation agent, as shown in Table 1).

5. Discussion and Concluding Remarks

Despite multiple research mandates on developing innovative cyber threat agents, few significant cyber agents have been reported in the existing literature [38]. This research developed a new agent-based cyber intelligence solution by deploying four independent agents. In this section, the performance of the solution is evaluated in terms of how fast it is and how comprehensive the autonomously generated cyber report is.
The evaluation of the “fast” dimension is provided in Table 10. As seen in Table 10, the proposed solution with the aggregation agent, social media agent, cognitive service agent, and web media agent delivered the fastest response time.
Table 11 provides an evaluation with respect to a comprehensive set of dimensions, such as threat level, threat spectrum, geopolitical/socioeconomic, psychological, impacted target, national concern, and victimization. Obtaining AI-driven information on these comprehensive lists of dimensions was the goal of the aggregation agent, as outlined in the semantic output (in Table 1). Interestingly, with the proposed configuration of agents, it is possible to obtain comprehensive cyber intelligence on any country within just 6 s (on average). It is imperative to emphasize that none of the extant solutions, namely, references [17,18,19,20,21,22,23], have achieved the level of comprehensive cyber intelligence by country as exemplified in this study. Table 11 serves as a demonstration of the proposed agent-based cyber intelligence solution’s capacity and capability to generate thorough cyber intelligence pertaining to various countries, such as China, India, Russia, Ukraine, Australia, and Iran. The selection of these countries was random in nature, as the proposed system possesses the inherent ability to generate cyber intelligence for any country across the globe, contingent upon the availability of social media users discussing said country.
In spite of proposing a novel and autonomous AI-driven semantic cyber agent, this study encounters several limitations and potential drawbacks.
  • Firstly, the proposed approach assumed that all 37,386 cyber-related tweets were relevant. However, it is evident from the data presented in Table 12 that not all 37,386 tweets could be classified as cyber-related. Employing the confusion matrix depicted in Table 12, performance evaluation criteria encompassing precision, recall, sensitivity, specificity, F1-score, accuracy, and others were computed and documented in Table 13 (an illustrative computation of these metrics is sketched after this list). Upon comparing the performance of the proposed approach with existing research in the realm of social media-based cyber intelligence, it becomes apparent, as indicated in Table 14, that a few extant studies, specifically [17] and [21], outperformed the proposed approach in certain instances. Nonetheless, the proposed approach exhibits superior performance compared to the majority of existing solutions documented in the literature. On average, the F1-score achieved by the prevailing methodologies was 0.83, whereas the proposed solution achieved a higher F1-score of 0.88.
  • Secondly, the proposed approach disregarded the possibility that these tweets could have originated from counterfeit accounts [39] or that genuine Twitter users may disseminate false information [40].
  • Thirdly, this study relies on real-time tweet API, Microsoft Power Platform, and Microsoft Azure, all of which necessitate regular payment through credit cards. For instance, access to the basic Twitter API with a monthly limit of reading only 10K Tweets incurs a cost of $100 USD per month [41]. Increasing this limit to read 1 million tweets could result in a financial commitment of $5000 USD per month [41]. Consequently, in order to minimize expenses, this research examined only a limited number of tweets. Researchers interested in working with real-time tweets must possess access to credit cards and sufficient research funds to sustain the ongoing subscription costs.
  • Fourthly, this research extensively employed “black box” cloud-based services and tools, such as Microsoft Cognitive Services, which poses substantial challenges in investigating algorithmic biases and potential enhancements.
  • Lastly, this investigation employed industry standard tools and cutting-edge cloud services, including Microsoft Power Platform and Microsoft Azure. Therefore, conducting this research necessitates expertise and certifications in these technologies and standards.
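As a point of reference for the metrics mentioned in the first limitation above, the short sketch below derives precision, recall, specificity, F1-score, and accuracy from a binary confusion matrix; the counts used here are hypothetical and do not reproduce Table 12.

def classification_metrics(tp, fp, fn, tn):
    """Standard metrics derived from a binary confusion matrix."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # also called sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": round(precision, 2), "recall": round(recall, 2),
            "specificity": round(specificity, 2), "f1": round(f1, 2),
            "accuracy": round(accuracy, 2)}

# Hypothetical counts (not the values of Table 12), purely to show the arithmetic
print(classification_metrics(tp=900, fp=120, fn=130, tn=850))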
In the future, our research endeavours will focus on the integration of advanced algorithms designed to identify and discern fake users, as well as counterfeit information, thereby facilitating the acquisition of more robust and validated cyber intelligence. Furthermore, we shall dedicate our efforts to further exploration of open-source tools and algorithms, aiming to mitigate cost obligations, enhance performance metrics (specifically in terms of F1-score), and foster a greater sense of technological autonomy.

Funding

This research received no external funding.

Data Availability Statement

Data will be made available upon request.

Acknowledgments

The author would like to acknowledge the feedback provided by Edris Alam of Disaster Resilience and Emergency Management, Rabdan Academy, Abu Dhabi, UAE.

Conflicts of Interest

The author declares no conflict of interest.

Appendix A

This section provides the Python code for implementing language detection and translation, sentiment analysis, and anomaly detection using the Microsoft Cognitive Services API.

Appendix A.1. Language Detection and Translation

Appendix A.1.1. Python Code Sample

import requests, uuid, json

# Add your key and endpoint
key = "<YOUR-TRANSLATOR-KEY>"
endpoint = "https://api.cognitive.microsofttranslator.com"

# location, also known as region.
# required if you're using a multi-service or regional (not global) resource.
# It can be found in the Azure portal on the Keys and Endpoint page.
location = "<YOUR-RESOURCE-LOCATION>"

path = '/translate'
constructed_url = endpoint + path

params = {
    'api-version': '3.0',
    'to': ['en', 'it']
}

headers = {
    'Ocp-Apim-Subscription-Key': key,
    # location required if you're using a multi-service or regional (not global) resource.
    'Ocp-Apim-Subscription-Region': location,
    'Content-type': 'application/json',
    'X-ClientTraceId': str(uuid.uuid4())
}

# You can pass more than one object in body.
body = [{
    'text': 'Halo, rafiki! Ulifanya nini leo?'
}]

request = requests.post(constructed_url, params=params, headers=headers, json=body)
response = request.json()

print(json.dumps(response, sort_keys=True, ensure_ascii=False, indent=4, separators=(',', ': ')))

Appendix A.1.2. Sample Output

[
    {
        "detectedLanguage": {
            "language": "sw",
            "score": 0.8
        },
        "translations": [
            {
                "text": "Hello friend! What did you do today?",
                "to": "en"
            },
            {
                "text": "Ciao amico! Cosa hai fatto oggi?",
                "to": "it"
            }
        ]
    }
]

Appendix A.2. Sentiment Analysis

Appendix A.2.1. Python Code Sample

import os

# This example requires environment variables named "LANGUAGE_KEY" and "LANGUAGE_ENDPOINT"
language_key = os.environ.get('LANGUAGE_KEY')
language_endpoint = os.environ.get('LANGUAGE_ENDPOINT')

from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential
  
# Authenticate the client using your key and endpoint
def authenticate_client():
    ta_credential = AzureKeyCredential(language_key)
    text_analytics_client = TextAnalyticsClient(
            endpoint=language_endpoint,
            credential=ta_credential)
    return text_analytics_client
client = authenticate_client()
# Example method for detecting sentiment and opinions in text
def sentiment_analysis_with_opinion_mining_example(client):
    documents = [
        "The food and service were unacceptable. The concierge was nice, however."
    ]
    result = client.analyze_sentiment(documents, show_opinion_mining=True)
    doc_result = [doc for doc in result if not doc.is_error]
    positive_reviews = [doc for doc in doc_result if doc.sentiment == “positive”]
    negative_reviews = [doc for doc in doc_result if doc.sentiment == “negative”]
    positive_mined_opinions = []
    mixed_mined_opinions = []
    negative_mined_opinions = []
    for document in doc_result:
        print(“Document Sentiment: {}”.format(document.sentiment))
        print(“Overall scores: positive={0:.2f}; neutral={1:.2f}; negative={2:.2f} \n”.format(
            document.confidence_scores.positive,
            document.confidence_scores.neutral,
            document.confidence_scores.negative,
        ))
        for sentence in document.sentences:
            print(“Sentence: {}”.format(sentence.text))
            print(“Sentence sentiment: {}”.format(sentence.sentiment))
            print(“Sentence score:\nPositive={0:.2f}\nNeutral={1:.2f}\nNegative={2:.2f}\n”.format(
                sentence.confidence_scores.positive,
                sentence.confidence_scores.neutral,
                sentence.confidence_scores.negative,
            ))
            for mined_opinion in sentence.mined_opinions:
                target = mined_opinion.target
                print(“......’{}’ target ‘{}’”.format(target.sentiment, target.text))
                print(“......Target score:\n......Positive={0:.2f}\n......Negative={1:.2f}\n”.format(
                    target.confidence_scores.positive,
                    target.confidence_scores.negative,
                ))
                for assessment in mined_opinion.assessments:
                    print(“......’{}’ assessment ‘{}’”.format(assessment.sentiment, assessment.text))
                    print(“......Assessment score:\n......Positive={0:.2f}\n......Negative={1:.2f}\n”.format(
                        assessment.confidence_scores.positive,
                        assessment.confidence_scores.negative,
                         ))
            print(“\n”)
        print(“\n”)
        
sentiment_analysis_with_opinion_mining_example(client)

Appendix A.2.2. Sample Output

Document Sentiment: mixed
Overall scores: positive=0.47; neutral=0.00; negative=0.52
Sentence: The food and service were unacceptable.
Sentence sentiment: negative
Sentence score:
Positive=0.00
Neutral=0.00
Negative=0.99
......’negative’ target ‘food’
......Target score:
......Positive=0.00
......Negative=1.00
......’negative’ assessment ‘unacceptable’
......Assessment score:
......Positive=0.00
......Negative=1.00
......’negative’ target ‘service’
......Target score:
......Positive=0.00
......Negative=1.00
......’negative’ assessment ‘unacceptable’
......Assessment score:
......Positive=0.00
......Negative=1.00
Sentence: The concierge was nice, however.
Sentence sentiment: positive
Sentence score:
Positive=0.94
Neutral=0.01
Negative=0.05
......’positive’ target ‘concierge’
......Target score:
......Positive=1.00
......Negative=0.00
......’positive’ assessment ‘nice’
......Assessment score:
......Positive=1.00
......Negative=0.00

Appendix A.3. Anomaly Detection

Appendix A.3.1. Python Code

import os
import time
from datetime import datetime, timezone
from azure.ai.anomalydetector import AnomalyDetectorClient
from azure.core.credentials import AzureKeyCredential
from azure.ai.anomalydetector.models import *

SUBSCRIPTION_KEY = os.environ['ANOMALY_DETECTOR_API_KEY']
ANOMALY_DETECTOR_ENDPOINT = os.environ['ANOMALY_DETECTOR_ENDPOINT']
ad_client = AnomalyDetectorClient(ANOMALY_DETECTOR_ENDPOINT, AzureKeyCredential(SUBSCRIPTION_KEY))
time_format = "%Y-%m-%dT%H:%M:%SZ"
blob_url = "Path-to-sample-file-in-your-storage-account"  # example path: https://docstest001.blob.core.windows.net/test/sample_data_5_3000.csv
train_body = ModelInfo(
    data_source=blob_url,
    start_time=datetime.strptime(“2021-01-02T00:00:00Z”, time_format),
    end_time=datetime.strptime(“2021-01-02T05:00:00Z”, time_format),
    data_schema=“OneTable”,
    display_name=“sample”,
    sliding_window=200,
    align_policy=AlignPolicy(
        align_mode=AlignMode.OUTER,
        fill_n_a_method=FillNAMethod.LINEAR,
        padding_value=0,
    ),
)
batch_inference_body = MultivariateBatchDetectionOptions(
       data_source=blob_url,
       top_contributor_count=10,
       start_time=datetime.strptime(“2021-01-02T00:00:00Z”, time_format),
       end_time=datetime.strptime(“2021-01-02T05:00:00Z”, time_format),
   )
print(“Training new model...(it may take a few minutes)”)
model = ad_client.train_multivariate_model(train_body)
model_id = model.model_id
print(“Training model id is {}”.format(model_id))
  
## Wait until the model is ready. It usually takes several minutes
model_status = None
model = None
while model_status != ModelStatus.READY and model_status != ModelStatus.FAILED:
    model = ad_client.get_multivariate_model(model_id)
    print(model)
    model_status = model.model_info.status
    print(“Model is {}”.format(model_status))
    time.sleep(30)
if model_status == ModelStatus.READY:
    print(“Done.\n--------------------”)
    # Return the latest model id
# Detect anomaly in the same data source (but a different interval)
result = ad_client.detect_multivariate_batch_anomaly(model_id, batch_inference_body)
result_id = result.result_id
# Get results (may need a few seconds)
r = ad_client.get_multivariate_batch_detection_result(result_id)
print(“Get detection result...(it may take a few seconds)”)
while r.summary.status != MultivariateBatchDetectionStatus.READY and r.summary.status != MultivariateBatchDetectionStatus.FAILED:
    # Poll until the batch detection job reaches a terminal state
    r = ad_client.get_multivariate_batch_detection_result(result_id)
    print("Detection is {}".format(r.summary.status))
    time.sleep(5)
anomaly_results = r

print(“Result ID:\t”, anomaly_results.result_id)
print(“Result status:\t”, anomaly_results.summary.status)
print(“Result length:\t”, len(anomaly_results.results))
  
# See detailed inference result
for r in anomaly_results.results:
    print(
        “timestamp: {}, is_anomaly: {:<5}, anomaly score: {:.4f}, severity: {:.4f}, contributor count: {:<4d}”.format(
            r.timestamp,
            r.value.is_anomaly,
            r.value.score,
            r.value.severity,
            len(r.value.interpretation) if r.value.is_anomaly else 0,
        )
    )
    if r.value.interpretation:
        for contributor in r.value.interpretation:
            print(
                “\tcontributor variable: {:<10}, contributor score: {:.4f}”.format(
                    contributor.variable, contributor.contribution_score
                )
            )

Appendix A.3.2. Sample Output

10 available models before training.
Training new model...(it may take a few minutes)
Training model id is 3a695878-a88f-11ed-a16c-b290e72010e0
{‘modelId’: ‘3a695878-a88f-11ed-a16c-b290e72010e0’, ‘createdTime’: ‘2023-02-09T15:34:23Z’, ‘lastUpdatedTime’: ‘2023-02-09T15:34:23Z’, ‘modelInfo’: {‘dataSource’: ‘https://docstest001.blob.core.windows.net/test/sample_data_5_3000 (1).csv’, ‘dataSchema’: ‘OneTable’, ‘startTime’: ‘2021-01-02T00:00:00Z’, ‘endTime’: ‘2021-01-02T05:00:00Z’, ‘displayName’: ‘sample’, ‘slidingWindow’: 200, ‘alignPolicy’: {‘alignMode’: ‘Outer’, ‘fillNAMethod’: ‘Linear’, ‘paddingValue’: 0.0}, ‘status’: ‘CREATED’, ‘errors’: [], ‘diagnosticsInfo’: {‘modelState’: {‘epochIds’: [], ‘trainLosses’: [], ‘validationLosses’: [], ‘latenciesInSeconds’: []}, ‘variableStates’: []}}}
Model is CREATED
{‘modelId’: ‘3a695878-a88f-11ed-a16c-b290e72010e0’, ‘createdTime’: ‘2023-02-09T15:34:23Z’, ‘lastUpdatedTime’: ‘2023-02-09T15:34:55Z’, ‘modelInfo’: {‘dataSource’: ‘https://docstest001.blob.core.windows.net/test/sample_data_5_3000 (1).csv’, ‘dataSchema’: ‘OneTable’, ‘startTime’: ‘2021-01-02T00:00:00Z’, ‘endTime’: ‘2021-01-02T05:00:00Z’, ‘displayName’: ‘sample’, ‘slidingWindow’: 200, ‘alignPolicy’: {‘alignMode’: ‘Outer’, ‘fillNAMethod’: ‘Linear’, ‘paddingValue’: 0.0}, ‘status’: ‘READY’, ‘errors’: [], ‘diagnosticsInfo’: {‘modelState’: {‘epochIds’: [10, 20, 30, 40, 50, 60, 70, 80, 90, 100], ‘trainLosses’: [1.0493712276220322, 0.5454281121492386, 0.42524269968271255, 0.38019897043704987, 0.3472398854792118, 0.34301353991031647, 0.3219067454338074, 0.3108387663960457, 0.30357857793569565, 0.29986055195331573], ‘validationLosses’: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], ‘latenciesInSeconds’: [0.3412797451019287, 0.25798678398132324, 0.2556419372558594, 0.3165152072906494, 0.2748451232910156, 0.26111531257629395, 0.2571413516998291, 0.257282018661499, 0.2549862861633301, 0.25806593894958496]}, ‘variableStates’: [{‘variable’: ‘series_0’, ‘filledNARatio’: 0.0, ‘effectiveCount’: 301, ‘firstTimestamp’: ‘2021-01-02T00:00:00Z’, ‘lastTimestamp’: ‘2021-01-02T05:00:00Z’}, {‘variable’: ‘series_1’, ‘filledNARatio’: 0.0, ‘effectiveCount’: 301, ‘firstTimestamp’: ‘2021-01-02T00:00:00Z’, ‘lastTimestamp’: ‘2021-01-02T05:00:00Z’}, {‘variable’: ‘series_2’, ‘filledNARatio’: 0.0, ‘effectiveCount’: 301, ‘firstTimestamp’: ‘2021-01-02T00:00:00Z’, ‘lastTimestamp’: ‘2021-01-02T05:00:00Z’}, {‘variable’: ‘series_3’, ‘filledNARatio’: 0.0, ‘effectiveCount’: 301, ‘firstTimestamp’: ‘2021-01-02T00:00:00Z’, ‘lastTimestamp’: ‘2021-01-02T05:00:00Z’}, {‘variable’: ‘series_4’, ‘filledNARatio’: 0.0, ‘effectiveCount’: 301, ‘firstTimestamp’: ‘2021-01-02T00:00:00Z’, ‘lastTimestamp’: ‘2021-01-02T05:00:00Z’}]}}}
Model is READY
Done.
--------------------
10 available models after training.
Get detection result...(it may take a few seconds)
Detection is CREATED
Detection is READY
Result ID: 70a6cdf8-a88f-11ed-a461-928899e62c38
Result status: READY
Result length: 301
timestamp: 2021-01-02 00:00:00+00:00, is_anomaly: 0    , anomaly score: 0.1770, severity: 0.0000, contributor count: 0   
timestamp: 2021-01-02 00:01:00+00:00, is_anomaly: 0    , anomaly score: 0.3446, severity: 0.0000, contributor count: 0   
timestamp: 2021-01-02 00:02:00+00:00, is_anomaly: 0    , anomaly score: 0.2397, severity: 0.0000, contributor count: 0   
timestamp: 2021-01-02 00:03:00+00:00, is_anomaly: 0    , anomaly score: 0.1270, severity: 0.0000, contributor count: 0   
timestamp: 2021-01-02 00:04:00+00:00, is_anomaly: 0    , anomaly score: 0.3321, severity: 0.0000, contributor count: 0   
timestamp: 2021-01-02 00:05:00+00:00, is_anomaly: 0    , anomaly score: 0.4053, severity: 0.0000, contributor count: 0   
timestamp: 2021-01-02 00:06:00+00:00, is_anomaly: 0    , anomaly score: 0.4371, severity: 0.0000, contributor count: 0   
timestamp: 2021-01-02 00:07:00+00:00, is_anomaly: 1    , anomaly score: 0.6615, severity: 0.3850, contributor count: 5   
contributor variable: series_3 , contributor score: 0.2939
contributor variable: series_1 , contributor score: 0.2834
contributor variable: series_4 , contributor score: 0.2329
contributor variable: series_0 , contributor score: 0.1543
contributor variable: series_2 , contributor score: 0.0354

References

  1. Cremer, F.; Sheehan, B.; Fortmann, M.; Kia, A.N.; Mullins, M.; Murphy, F.; Materne, S. Cyber risk and cybersecurity: A systematic review of data availability. Geneva Pap. Risk Insur. Issues Pract. 2022, 47, 698–736. [Google Scholar] [CrossRef] [PubMed]
  2. Cybercrime Magazine. Cybercrime to Cost the World $10.5 Trillion Annually by 2025. 2020. Available online: https://cybersecurityventures.com/hackerpocalypse-cybercrime-report-2016/ (accessed on 15 October 2022).
  3. Statista Research Department. Consumer Loss through Cyber Crime Worldwide in 2017, by Victim Country. 2022. Available online: https://www.statista.com/statistics/799875/countries-with-the-largest-losses-through-cybercrime/ (accessed on 26 October 2022).
  4. Bada, M.; Nurse, J.R. Chapter 4: The social and psychological impact of cyberattacks. In Emerging Cyber Threats and Cognitive Vulnerabilities; Benson, V., Mcalaney, J., Eds.; Academic Press: Cambridge, MA, USA, 2020; pp. 73–92.
  5. BBC News. Optus: How a Massive Data Breach Has Exposed Australia. 2022. Available online: https://www.bbc.com/news/world-australia-63056838 (accessed on 16 October 2022).
  6. Australian Securities & Investments Commission. Guidance for Consumers Impacted by the Optus Data Breach. 2022. Available online: https://asic.gov.au/about-asic/news-centre/news-items/guidance-for-consumers-impacted-by-the-optus-data-breach/ (accessed on 19 October 2022).
  7. Merritt, K. Optus Confirms 2.1 Million Customers Affected by Cyberattack. Total Telecom. 2022. Available online: https://totaltele.com/optus-confirms-2-1-million-customers-affected-by-cyberattack/ (accessed on 23 October 2022).
  8. Kaye, B. Australia's No. 1 Health Insurer Says Hacker Stole Patient Details. Reuters. 2022. Available online: https://www.reuters.com/technology/after-telco-hack-australia-faces-wave-data-breaches-2022-10-20/ (accessed on 25 October 2022).
  9. Zibak, A.; Simpson, A. Cyber Threat Information Sharing: Perceived Benefits and Barriers. In ARES '19: Proceedings of the 14th International Conference on Availability, Reliability and Security, Canterbury, UK, 26–29 August 2019; Association for Computing Machinery: New York, NY, USA, 2019.
  10. Xu, S.; Qian, Y.; Hu, R.Q. Data-Driven Network Intelligence for Anomaly Detection. IEEE Netw. 2019, 33, 88–95.
  11. Keshk, M.; Sitnikova, E.; Moustafa, N.; Hu, J.; Khalil, I. An Integrated Framework for Privacy-Preserving Based Anomaly Detection for Cyber-Physical Systems. IEEE Trans. Sustain. Comput. 2021, 6, 66–79.
  12. Ten, C.-W.; Hong, J.; Liu, C.-C. Anomaly Detection for Cybersecurity of the Substations. IEEE Trans. Smart Grid 2011, 2, 865–873.
  13. Yang, J.; Zhou, C.; Yang, S.; Xu, H.; Hu, B. Anomaly Detection Based on Zone Partition for Security Protection of Industrial Cyber-Physical Systems. IEEE Trans. Ind. Electron. 2017, 65, 4257–4267.
  14. Shi, D.; Guo, Z.; Johansson, K.H.; Shi, L. Causality Countermeasures for Anomaly Detection in Cyber-Physical Systems. IEEE Trans. Autom. Control 2018, 63, 386–401.
  15. Khan, N.F.; Ikram, N.; Saleem, S.; Zafar, S. Cyber-security and risky behaviors in a developing country context: A Pakistani perspective. Secur. J. 2022, 1–33. Available online: https://link.springer.com/content/pdf/10.1057/s41284-022-00343-4.pdf (accessed on 23 March 2023).
  16. Sufi, F. A New Social Media-Driven Cyber Threat Intelligence. Electronics 2023, 12, 1242.
  17. Hernandez-Suarez, A.; Sanchez-Perez, G.; Toscano-Medina, K.; Martinez-Hernandez, V.; Perez-Meana, H.; Olivares-Mercado, J.; Sanchez, V. Social Sentiment Sensor in Twitter for Predicting Cyber-Attacks Using ℓ1 Regularization. Sensors 2018, 18, 1380.
  18. Sufi, F. Algorithms in Low-Code-No-Code for Research Applications: A Practical Review. Algorithms 2023, 16, 108.
  19. Pattnaik, N.; Li, S.; Nurse, J.R. Perspectives of non-expert users on cyber security and privacy: An analysis of online discussions on Twitter. Comput. Secur. 2023, 125, 103008.
  20. Geetha, R.; Karthika, S. Sensitive Keyword Extraction Based on Cyber Keywords and LDA in Twitter to Avoid Regrets. In Computational Intelligence in Data Science, ICCIDS 2020; IFIP Advances in Information and Communication Technology; Springer: Berlin/Heidelberg, Germany, 2020; Volume 578.
  21. Shah, R.; Aparajit, S.; Chopdekar, R.; Patil, R. Machine Learning based Approach for Detection of Cyberbullying Tweets. Int. J. Comput. Appl. 2020, 175, 51–56.
  22. Rawat, R.; Mahor, V.; Chirgaiya, S.; Shaw, R.N.; Ghosh, A. Analysis of Darknet Traffic for Criminal Activities Detection Using TF-IDF and Light Gradient Boosted Machine Learning Algorithm. Lect. Notes Electr. Eng. 2021, 756, 671–681.
  23. Lanier, H.D.; Diaz, M.I.; Saleh, S.N.; Lehmann, C.U.; Medford, R.J. Analyzing COVID-19 disinformation on Twitter using the hashtags #scamdemic and #plandemic: Retrospective study. PLoS ONE 2022, 17, e0268409.
  24. Li, Y.; Liu, Q. A comprehensive review study of cyber-attacks and cyber security; Emerging trends and recent developments. Energy Rep. 2021, 7, 8176–8186.
  25. Correia, V.J. An Explorative Study into the Importance of Defining and Classifying Cyber Terrorism in the United Kingdom. SN Comput. Sci. 2021, 3, 1–31.
  26. Agrafiotis, I.; Nurse, J.R.C.; Goldsmith, M.; Creese, S.; Upton, D. A taxonomy of cyber-harms: Defining the impacts of cyber-attacks and understanding how they propagate. J. Cybersecur. 2018, 4, 1–15.
  27. Humayun, M.; Niazi, M.; Jhanjhi, N.; Alshayeb, M.; Mahmood, S. Cyber Security Threats and Vulnerabilities: A Systematic Mapping Study. Arab. J. Sci. Eng. 2020, 45, 3171–3189.
  28. Bhaskar, R. Better Cybersecurity Awareness through Research. 2022. Available online: https://www.isaca.org/resources/isaca-journal/issues/2022/volume-3/better-cybersecurity-awareness-through-research (accessed on 1 April 2023).
  29. Alkhalil, Z.; Hewage, C.; Nawaf, L.; Khan, I. Phishing Attacks: A Recent Comprehensive Study and a New Anatomy. Front. Comput. Sci. 2021, 3, 563060.
  30. Hagen, R.A. Unraveling the Complexity of Cyber Security Threats: A Multidimensional Approach. 2023. Available online: https://www.linkedin.com/pulse/unraveling-complexity-cyber-security-threats-approach-hagen/ (accessed on 25 April 2023).
  31. Analysis of Tweets Related to Cyberbullying: Exploring Information Diffusion and Advice Available for Cyberbullying Victims. Int. J. Cyber Behav. Psychol. Learn. 2015, 5, 31–52.
  32. Microsoft Documentation. Choosing a Natural Language Processing Technology in Azure. 2020. Available online: https://docs.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/natural-language-processing (accessed on 23 March 2023).
  33. Sufi, F.; Khalil, I. Automated Disaster Monitoring from Social Media Posts Using AI-Based Location Intelligence and Sentiment Analysis. IEEE Trans. Comput. Soc. Syst. 2022, 1–11, in press.
  34. Sufi, F.K. Automatic identification and explanation of root causes on COVID-19 index anomalies. MethodsX 2023, 10, 101960.
  35. Ren, H.; Xu, B.; Wang, Y.; Yi, C.; Huang, C.; Kou, X.; Xing, T.; Yang, M.; Tong, J.; Zhang, Q. Time-Series Anomaly Detection Service at Microsoft. In KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; ACM: New York, NY, USA, 2019.
  36. Zhao, R.; Ouyang, W.; Li, H.; Wang, X. Saliency detection by multi-context deep learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015.
  37. Abas, M.N.; Jalil, S.Z.; Aris, S.A.M. Malware Attack Forecasting by Using Exponential Smoothing. Lect. Notes Electr. Eng. 2022, 842, 819–831.
  38. Cybersecurity & Infrastructure Security Agency. Protecting Against Cyber Threats to Managed Service Providers and their Customers. 2022. Available online: https://www.cisa.gov/news-events/cybersecurity-advisories/aa22-131a (accessed on 21 May 2023).
  39. Gurajala, S.; White, J.S.; Hudson, B.; Voter, B.R.; Matthews, J.N. Profile characteristics of fake Twitter accounts. Big Data Soc. 2016, 3, 1–13.
  40. Ajao, O.; Bhowmik, D.; Zargari, S. Fake News Identification on Twitter with Hybrid CNN and RNN Models. In Proceedings of the 9th International Conference on Social Media and Society, Copenhagen, Denmark, 18–20 July 2018.
  41. Twitter. About Twitter API. Available online: https://developer.twitter.com/en/docs/twitter-api/getting-started/about-twitter-api (accessed on 1 June 2023).
Figure 1. Schematic Diagram of Multi-Agent Cyber Intelligence System.
Figure 2. UML sequence diagram of multi-agent communications with user.
Figure 3. Flowchart of the activities of social media agent and cognitive service agent.
Figure 4. Flowchart of activities of web media agent.
Figure 5. Deployment structure of the 4 cyber intelligence agents.
Figure 6. Daily average of tweet sentiments (8 November 2022 recorded the highest level of negative sentiment).
Figure 7. Monthly average of alerts (i.e., negative sentiments) for Russia, China, Australia, Ukraine, India, and Iran.
Figure 8. Cyber anomalies detected by the anomaly detection process for Russia, China, Australia, Ukraine, India, and Iran.
Figure 9. Ranking of number of cyber-related tweets for the selected countries by month (for the 7-month period).
Figure 10. Ranking the most cyber-attacked countries (for the selected countries) using web cyber-attack statistics.
Figure 11. Identifying the threat types by web media agent using web-based cyber-attack statistics.
Figure 12. Web media agent app deployed on an iPad (9th generation) running iOS 15. (a) Cyber threat prediction for Australia. (b) Country-wise detailed cyber threat analysis.
Table 1. Features and Semantic Output of Cyber Intelligence Agents.

Aggregation Agent
Features: (1) Interact with User via Web, iOS, and Android Devices; (2) Collaborate with Social Media Agent; (3) Collaborate with Web Media Agent; (4) Generate Multi-Dimensional and Multi-Source Comprehensive Cyber Intelligence on Selected Countries.
Semantic output: (1) Country Name; (2) Threat Level; (3) Threat Spectrum; (4) Geopolitical/Socioeconomic; (5) Psychological and Societal; (6) Impacted Target; (7) National Concern; (8) Victimization.

Social Media Agent
Features: (1) Obtain Social Media Data; (2) Collaborate with Aggregation Agent; (3) Collaborate with Cognitive Service Agent; (4) Generate Term Frequency; (5) Generate Topic Modelling; (6) Deep Learning-Based Anomaly Detection.
Semantic output: (1) Country Name; (2) Word Frequency; (3) Topics with Word Frequencies; (4) Sentiments on Time-Series; (5) Alerts on Time-Series; (6) Anomalies on Time-Series.

Cognitive Service Agent
Features: (1) Collaborate with Social Media Agent; (2) Generate Translation; (3) Generate Sentiment Analysis.
Semantic output: (1) Original Language; (2) Translated Text; (3) Overall Sentiment; (4) Sentiment Confidence.

Web Media Agent
Features: (1) Obtain Cyber Threat Statistics on Malicious Mail, Ransomware, Exploits, Web Threats, Spam, Local Infection, Network Attacks, and On-Demand Scans from Web Data; (2) Collaborate with Aggregation Agent; (3) Generate Multi-Dimensional Threat Spectrum; (4) Deep Learning-Based Anomaly Detection; (5) Threat Prediction.
Semantic output: (1) Country Name; (2) Threat Type; (3) Country Rank; (4) Threat Percentage; (5) Anomalies on Time-Series; (6) Threat Prediction on Time-Series.
Table 2. Strategic questions answered by multi-dimensional cyber threat intelligence.

Strategic Questions Answered | Dimension of Cyber Threat | Reference
1. What type of threat? | Threat spectrum (e.g., malware, spyware) | [24,27,29,30]
2. Who is attacking? 3. Where is the attack coming from? 4. Why is the attack happening? 5. What is the motivation for this attack? | Geopolitical and socioeconomic | [30]
6. Who is the target? 7. Who is the victim of the cyber-attack? | Victimization (human vs. system) | [24,25]
8. What are the major cyber-related concerns? | National priority and concerns | [26,28]
9. What is the impact? | Impacted target (infrastructure, supply chain, etc.) | [24]
10. What is the societal perception? 11. How do cyber-attacks affect society? 12. How much negativity is generated at a psychological level? | Psychological and societal | [26]
13. What is the severity level of the threat? 14. What is the intensity of the cyber threat? | Threat level (low, medium, high) | [24]
Table 3. Existing research on analysing cyber-related social media posts using NLP algorithms (X denotes Supported).

Reference | Sentiment Analysis | Translation | LDA | TF-IDF | Stemming | N-Gram | Forecasting | ML Algorithms
[17] | X | | | X | X | X | X | Regression; Naïve Bayes Classifier; Support Vector Machines; Maximum Entropy Classifier
[18] | X | X | | | | | |
[19] | X | | X | X | | X | | BERT-based; Logistic Regression; SVM; Random Forest; XGBoost
[20] | | | X | X | | X (bi-Gram) | |
[21] | | | | X | | | | Support Vector Classifier; Logistic Regression; Naïve Bayes; Random Forest Classifier; SGD Classifier
[22] | | | | X | | | | LightGBM (light gradient boosted machine)
[23] | X | | X | | | | |
Proposed | X | X | X | X | X | X | X | CNN (Deep Learning)
Table 4. Algorithms used within the proposed AI-based cyber threat intelligence.

Process Name | Algorithm Used | Algorithm Type | API Used | References
Sentiment Analysis | Microsoft Text Analytics | NLP | Yes | [17,19,33]
Translate to English | Microsoft Text Analytics | NLP | Yes | [33]
Anomaly Detection | CNN | Deep Learning | No | [32,33]
Topic Modelling | LDA | NLP | No | [19,20]
Term Frequency | TF-IDF | NLP | No | [17,19,20,21,22]
Term Frequency | Porter Stemming | NLP | No | [17]
Term Frequency | N-Gram | NLP | No | [17,19,20]
Forecast Threat | Exponential Smoothing | Time-Series Forecasting | No | [37]
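Table 4 names the algorithms but does not restate the implementation libraries. The following minimal sketch, assuming NLTK for Porter stemming and scikit-learn for TF-IDF and LDA, illustrates how the non-API text-processing steps can be chained; the sample tweets and parameter values (apart from the seven-topic setting reported in Table 9) are illustrative placeholders, not the paper's configuration.

```python
# Illustrative sketch only: NLTK + scikit-learn stand-ins for the Table 4 text steps.
# Sample tweets and the token pattern are assumptions for demonstration.
import re
from collections import Counter
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.decomposition import LatentDirichletAllocation

tweets = [
    "Massive cyber attack hits Australian health insurer",
    "Hackers stole patient data in the latest cyber attack",
    "Cyber security experts warn of ransomware targeting hospitals",
]

# (1) Porter stemming + term frequency (cf. Table 7)
stemmer = PorterStemmer()
tokens = [stemmer.stem(w) for t in tweets for w in re.findall(r"[a-z']+", t.lower())]
print(Counter(tokens).most_common(5))

# (2) TF-IDF weighting of unigrams and bigrams (TF-IDF and N-Gram rows in Table 4)
tfidf = TfidfVectorizer(ngram_range=(1, 2), stop_words="english")
X_tfidf = tfidf.fit_transform(tweets)

# (3) LDA topic modelling with 7 topics (cf. Table 9)
counts = CountVectorizer(stop_words="english").fit_transform(tweets)
lda = LatentDirichletAllocation(n_components=7, random_state=0).fit(counts)
print(lda.components_.shape)  # (7 topics, vocabulary size)
```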
Table 5. Distribution of 238,220 cyber threat data by attack type.

Attack Type | Exploit | Local Infection | Malicious Mail | Network Attack | On-Demand Scan | Ransomware | Spam | Web Threat
Number of Records | 29,017 | 32,592 | 30,165 | 30,522 | 32,584 | 23,299 | 27,450 | 32,591
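The web media agent's threat prediction (Tables 1 and 4) applies exponential smoothing [37] to time series of counts such as those summarized per attack type in Table 5. The sketch below shows one way such a forecast could be produced; the synthetic daily ransomware series, the date range, and the use of statsmodels' SimpleExpSmoothing are assumptions for illustration, not the paper's actual data or toolkit.

```python
# Illustrative sketch of the exponential-smoothing forecast step (Table 4, ref. [37]).
# The daily ransomware counts below are synthetic placeholders.
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import SimpleExpSmoothing

days = pd.date_range("2022-10-13", periods=30, freq="D")
ransomware_counts = pd.Series(
    np.random.default_rng(0).poisson(lam=120, size=len(days)), index=days
)

# Fit simple exponential smoothing and forecast the next 7 days of threat volume.
model = SimpleExpSmoothing(ransomware_counts, initialization_method="estimated").fit()
forecast = model.forecast(7)
print(forecast.round(1))
```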
Table 6. Processing of Tweets for AI-based Cyber-Threat Intelligence.

Time | No. of Tweets | No. of Users | No. of Locations | No. of Languages | Total Retweets | Avg. Confidence of Negative Sentiment | Avg. Confidence of Neutral Sentiment | Avg. Confidence of Positive Sentiment | No. of Translations
October 2022 | 3954 | 3556 | 1588 | 38 | 3,727,756 | 0.36 | 0.43 | 0.21 | 941
November 2022 | 6470 | 5875 | 2358 | 38 | 9,981,856 | 0.34 | 0.43 | 0.23 | 1283
December 2022 | 6512 | 5544 | 2225 | 42 | 7,565,946 | 0.35 | 0.42 | 0.23 | 1533
January 2023 | 6685 | 5785 | 2364 | 40 | 7,802,301 | 0.36 | 0.40 | 0.24 | 1419
February 2023 | 5976 | 5053 | 2114 | 43 | 4,276,479 | 0.37 | 0.42 | 0.21 | 1373
March 2023 | 6634 | 5749 | 2357 | 41 | 4,799,540 | 0.36 | 0.43 | 0.21 | 1469
April 2023 | 1155 | 1083 | 538 | 27 | 713,083 | 0.40 | 0.41 | 0.20 | 258
Total | 37,386 | 30,706 | 10,178 | 54 | 38,866,961 | 0.36 | 0.42 | 0.22 | 8199
Table 7. Result of Porter stemming for extracting 18 top-most frequent words of cyber tweets on China, Russia, Ukraine, India, and Australia.

Rank | China | Russia | Ukraine | India | Australia
1 | china | russian | ukrain | cyber | australian
2 | cyber | russia | cyber | india | cyber
3 | http | cyber | http | http | australia
4 | hack | hack | hack | indian | http
5 | russia | attack | russia | secur | hack
6 | attack | http | russian | secur | hack
7 | chines | trump | ukrainian | crime | polic
8 | hacker | us | attack | attack | data
9 | state | putin | militari | account | report
10 | countri | stori | make | awar | attack
11 | secur | timothydsnyd | secur | cybersecur | commun
12 | backdoor | ukrain | year | govern | cybersecur
13 | nation | heard | countri | polic | care
14 | compani | sourc | help | pleas | media
15 | access | afterward | defens | youtub | million
16 | admin | april | forc | china | zealand
17 | cybersecur | broke | invas | bank | accus
18 | databas | intim | report | compani | custom
Table 8. Performance of LDA algorithms for analysing topics on cyber-related tweets for China, Russia, Ukraine, India, and Australia.

Performance Vectors | China | Russia | Ukraine | India | Australia
LogLikelihood | −15,617.27 | −57,933.967 | −23,251.897 | −27,119.332 | −9514.318
Perplexity | 517.155 | 458.384 | 1016.203 | 759.998 | 322.952
Avg(tokens) | 316.571 | 1165.143 | 392 | 519.857 | 206.143
Avg(document_entropy) | 2.868 | 4.495 | 4.364 | 3.418 | 2.589
Avg(word-length) | 5.857 | 6.143 | 7.229 | 5.8 | 7.286
Avg(coherence) | −15.623 | −13.754 | −14.672 | −17.145 | −13.013
Avg(uniform_dist) | 2.101 | 2.677 | 2.009 | 2.078 | 2.077
Avg(corpus_dist) | 1.67 | 1.614 | 1.925 | 1.701 | 1.71
Avg(eff_num_words) | 103.849 | 98.33 | 179.378 | 169.716 | 87.975
Avg(token-doc-diff) | 0.005 | 0.001 | 0.007 | 0.003 | 0.008
Avg(rank_1_docs) | 0.835 | 0.772 | 0.174 | 0.836 | 0.886
Avg(allocation_count) | 0.876 | 0.85 | 0.16 | 0.864 | 0.901
Avg(exclusivity) | 0.504 | 0.597 | 0.461 | 0.438 | 0.493
AlphaSum | 0.091 | 0.118 | 8.434 | 0.1 | 0.058
Beta | 0.285 | 0.127 | 0.642 | 0.272 | 0.26
BetaSum | 378.828 | 386.22 | 1039.923 | 562.278 | 226.947
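The diagnostics in Table 8 are reported per fitted country-level LDA model. As an illustration of how the two headline figures (log-likelihood and perplexity) can be obtained for a 7-topic model, the sketch below uses scikit-learn's LatentDirichletAllocation; the toy corpus is a placeholder, and scikit-learn is an assumed stand-in rather than the toolkit that produced Table 8's full diagnostic set.

```python
# Illustrative sketch: computing LDA log-likelihood and perplexity (cf. Table 8).
# The toy corpus and hyper-parameters are placeholders only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "russian hackers launch cyber attack on ukraine power grid",
    "ukraine reports wave of phishing and malware incidents",
    "cyber security agency warns of ransomware targeting hospitals",
    "data breach exposes millions of customer records in australia",
]

X = CountVectorizer(stop_words="english").fit_transform(docs)
lda = LatentDirichletAllocation(n_components=7, random_state=0).fit(X)

print("approx. log-likelihood:", lda.score(X))  # higher (less negative) is better
print("perplexity:", lda.perplexity(X))         # lower is better
```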
Table 9. Result of LDA algorithms for extracting 7 topics for China, Russia, Ukraine, India, and Australia (word frequencies in parentheses).

Country | Topic 1 | Topic 2 | Topic 3 | Topic 4 | Topic 5 | Topic 6 | Topic 7
China | cyber (29) | China (20) | China (16) | Russia (6) | China (21) | China (15) | China (17)
 | China (22) | Cyber (9) | Hack (5) | China (6) | hack (12) | cyber (7) | Chinese (10)
 | attacks (14) | hack (6) | country (4) | North (4) | chains (4) | war (6) | sophisticated (8)
 | Russia (8) | TikTok (4) | national (3) | Cyber (4) | supply (4) | would (6) | databases (8)
 | States (7) | China's (4) | IMMEDIATELY (3) | reports (3) | etc (4) | Russia (5) | Tech (8)
Russia | Russian (72) | cyber (65) | Russian (65) | hack (70) | Russia (60) | Russia (97) | Russian (60)
 | cyber (65) | Russian (44) | hack (25) | Russia (36) | Invades (50) | hacked (19) | Putin (59)
 | attack (49) | Ukraine (24) | ShellenbergerMD (14) | Russian (30) | Cyber (42) | cyber (18) | using (58)
 | blame (27) | McGonigal (22) | hacking (14) | Russians (27) | attacks (33) | helped (16) | Trump (57)
 | threat (26) | FBI (19) | amp (13) | DNC (16) | DarthPutinKGB (26) | new (16) | story (57)
Ukraine | State (3) | TheStudyofWar (2) | says (3) | role (3) | country (5) | Ukraine (117) | Leaks (2)
 | absolutely (2) | FBI (2) | GicAriana (2) | OMC_Ukraine (2) | loser (3) | cyber (76) | cyberwarfare (2)
 | Threat (2) | air (2) | need (2) | Anonymous_Link (2) | brigade (2) | Russian (31) | cyberattacks (2)
 | report (2) | infrastructure (2) | don't (2) | Council (2) | hacker (2) | Ukrainian (28) | Red (2)
 | Cross (2) | one (2) | Security (2) | UkraineRussiaWar (2) | awareness (2) | hack (28) | never (2)
India | hack (9) | YouTube (11) | Cyber (55) | India (19) | India (29) | cyber (17) | Cyber (10)
 | account (9) | YouTubeIndia (7) | Indian (24) | cyber (13) | cyber (10) | India (14) | India (9)
 | India (8) | hack (5) | cyber (19) | Cyber (11) | company (8) | crime (10) | amp (8)
 | IndiaFreeFire (5) | Cyber (5) | India (18) | Indian (9) | BJP (6) | PMOIndia (7) | Leaks (3)
 | please (5) | YouTubeCreators (4) | Crime (13) | China (7) | Hack (5) | Cyber (7) | BSF (3)
Australia | Australians (10) | Australian (9) | Australia (7) | cyber (4) | cyber (12) | amp (7) | Police (16)
 | Australian (9) | hack (7) | way (4) | POTUS (3) | Australia (11) | Australia (7) | Australian (14)
 | scamming (6) | Medibank (6) | Cyber (3) | Australia (3) | data (8) | Cyber (5) | Cyber (12)
 | Boys (6) | million (5) | Australian (3) | INSTAGRAM (2) | Australian (7) | Leaks (3) | Australia (10)
 | Yahoo (6) | health (5) | fundamental (2) | AustralianOpen (2) | attack (5) | https://t.co (3) | love (7)
Table 10. Evaluation of the agents with "fast" dimension.

Number of Agents | Configuration of Agents | Average Response Time (Seconds)
One | A single agent processing both tweets and web-based cyber-attack statistics | 9.032
Two | One agent processing tweets and another agent processing web-based cyber-attack statistics | 8.908
Three | One agent performing aggregation, another one processing tweets, and the last agent processing web-based cyber-attack statistics | 7.781
Four | One agent performing aggregation, two agents processing tweets, and the last agent processing web-based cyber-attack statistics (Proposed) | 6.451
Five | One agent performing aggregation, two agents processing tweets, and the other two agents processing web-based cyber-attack statistics | 7.812
Table 11. Evaluation of the agents with "comprehensive" dimension.

Country Name | Threat Level | Threat Spectrum | Geopolitical | Psychological | Impacted Target | National Concern | Victimization
China | Deep Red | Spam, Network Attack | US, Russia | Moderate | TikTok, Database | Espionage, National Security | Supply Chain Tech Firms
Russia | Red | Spam | US, Russia | High | Putin, KGB | FBI, Trump | Putin, KGB, Russian Government
Ukraine | Deep Amber | Local Infection | Russia | High | Ukrainian Security | Ukraine Russia War | Infrastructure
India | Amber | Spam | China | Low | YouTube, BJP | YouTube Hack, Account Hack | Individual Accounts
Iran | Yellow | On-Demand Scan | US | Moderate | | |
Australia | Green | Web Threat | China | Moderate | Health (Medibank), Electricity Network | Data Breach, Malware, Phishing, Ransomware | Australian Infrastructure
Table 12. Confusion matrix in identifying relevancy of the cyber-related tweets.

 | Actual Positive | Actual Negative
Predicted Positive | 23,178 (TP) | 2241 (FP)
Predicted Negative | 4149 (FN) | 7818 (TN)
Table 13. Performance evaluations in identifying relevant cyber-related tweets.

Evaluation Metric | Formula | Calculation
Precision | PPV = TP/(TP + FP) | 0.9118
Recall | TPR = TP/(TP + FN) | 0.8482
Sensitivity | TPR = TP/(TP + FN) | 0.8482
Specificity | SPC = TN/(FP + TN) | 0.7772
Negative Predictive Value | NPV = TN/(TN + FN) | 0.6533
False Positive Rate | FPR = FP/(FP + TN) | 0.2228
False Discovery Rate | FDR = FP/(FP + TP) | 0.0882
False Negative Rate | FNR = FN/(FN + TP) | 0.1518
Accuracy | ACC = (TP + TN)/(TP + FP + TN + FN) | 0.8291
F1-Score | F1 = 2TP/(2TP + FP + FN) | 0.8789
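The values in Table 13 follow directly from the confusion matrix in Table 12 (TP = 23,178, FP = 2241, FN = 4149, TN = 7818); the short check below reproduces them, rounded to four decimal places.

```python
# Reproducing the Table 13 metrics directly from the Table 12 confusion matrix.
TP, FP, FN, TN = 23_178, 2_241, 4_149, 7_818

metrics = {
    "Precision (PPV)": TP / (TP + FP),
    "Recall / Sensitivity (TPR)": TP / (TP + FN),
    "Specificity (SPC)": TN / (FP + TN),
    "Negative Predictive Value (NPV)": TN / (TN + FN),
    "False Positive Rate (FPR)": FP / (FP + TN),
    "False Discovery Rate (FDR)": FP / (FP + TP),
    "False Negative Rate (FNR)": FN / (FN + TP),
    "Accuracy (ACC)": (TP + TN) / (TP + FP + TN + FN),
    "F1-Score": 2 * TP / (2 * TP + FP + FN),
}

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
# Prints 0.9118 (Precision), 0.8482 (Recall), 0.8291 (Accuracy), 0.8789 (F1), etc., as in Table 13.
```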
Table 14. Performance comparison with existing research in social media-based cyber intelligence.

Algorithms Used | Precision | Recall | F1-Score | Reference
Naïve Bayes (Negative) | 0.77 | 0.80 | 0.79 | [17]
Naïve Bayes (Positive) | 0.76 | 0.76 | 0.76 | [17]
Naïve Bayes (Security-Oriented) | 0.94 | 0.91 | 0.93 | [17]
Support Vector Machine (Negative) | 0.80 | 0.80 | 0.80 | [17]
Support Vector Machine (Positive) | 0.78 | 0.80 | 0.79 | [17]
Support Vector Machine (Security-Oriented) | 0.95 | 0.94 | 0.95 | [17]
Maximum Entropy (Negative) | 0.81 | 0.80 | 0.80 | [17]
Maximum Entropy (Positive) | 0.78 | 0.80 | 0.79 | [17]
Maximum Entropy (Security-Oriented) | 0.96 | 0.94 | 0.95 | [17]
Random Forest (CySecPriv) | 0.94 | 0.61 | 0.74 | [19]
Random Forest (NonExpertUser) | 0.70 | 1.0 | 0.83 | [19]
LDA-VEM + TF-IDF (Personal) | - | - | 0.76 | [20]
LDA-VEM + TF-IDF (Professional) | - | - | 0.67 | [20]
LDA-VEM + TF-IDF (Health) | - | - | 0.75 | [20]
SVC (Cyber Bullying) | 0.73 | 0.96 | 0.83 | [21]
Logistic Regression (Cyber Bullying) | 0.91 | 0.96 | 0.93 | [21]
Multinomial Naïve Bayes (Cyber Bullying) | 0.86 | 0.94 | 0.90 | [21]
Random Forest Classifier (Cyber Bullying) | 0.98 | 0.73 | 0.84 | [21]
SGD Classifier (Cyber Bullying) | 0.90 | 0.95 | 0.93 | [21]
Light Gradient Boosted Machine (Darknet Traffic) | - | - | 0.84 | [22]
Proposed (Comprehensive Cyber) | 0.91 | 0.85 | 0.88 | -