A New AI-Based Semantic Cyber Intelligence Agent
Abstract
1. Introduction
2. Background and Literature Review
2.1. Contextual Information on Multi-Dimensional Cyber Intelligence
2.2. NLP-Based Cyber Intelligence from Social Media
3. Materials and Methods
3.1. Language Detection and Translation Process
Algorithm 1: Language Processing on Cyber-Related Social Media Messages
1: For each xi in N, Multilingual Social Media Messages
2:   If Language(xi) <> ‘English’
3:     yi = Translate(xi)
4:   Else
5:     yi = xi
6: For each yi in N, English Social Media Messages
7:   si = Sentiment(yi)
8:   If yi Contains ‘Country Name’
9:     {, yi,}= yi
10: For each cr in C, Countries
11:   {Yes/No,} = AnomalyDetection(CountofMessagesonTimeUnit(),)
12:   {{,}, …} = TermFrequency(Tokenize(yi))
13:   {{,}, …} = Stemming(Tokenize(yi))
14:   {{,}, …} = n_gram(Tokenize(yi))
15:   {{, {{}…}}, …} = Topic(Tokenize(yi))
16: Generate Interactive Visualization
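The control flow of Algorithm 1 can be illustrated with a short Python sketch. It covers only steps 1 to 9, 12, and 14 (translation routing, sentiment, country filtering, term frequency, and n-grams); anomaly detection, stemming, topic modelling, and visualization are omitted. The names `process_messages`, `tokenize`, and `n_grams`, the substring-based country match, and the callables `detect_language`, `translate`, and `sentiment` are assumptions made for illustration. In the study these services are cloud APIs, which are replaced here by hypothetical stand-ins.

```python
import re
from collections import Counter
from typing import Callable, Iterable


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphabetic runs only (a simplification)."""
    return re.findall(r"[a-z]+", text.lower())


def n_grams(tokens: list[str], n: int = 2) -> list[tuple[str, ...]]:
    """Return every window of n consecutive tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def process_messages(
    messages: Iterable[str],
    countries: Iterable[str],
    detect_language: Callable[[str], str],
    translate: Callable[[str], str],
    sentiment: Callable[[str], str],
) -> dict[str, dict]:
    """Follow steps 1-9, 12 and 14 of Algorithm 1, grouping results by country."""
    results = {
        c: {"messages": [], "sentiments": [], "terms": Counter(), "bigrams": Counter()}
        for c in countries
    }
    for x in messages:
        # Steps 1-5: translate only when the detected language is not English.
        y = x if detect_language(x) == "English" else translate(x)
        # Step 7: sentiment of the English text.
        s = sentiment(y)
        tokens = tokenize(y)
        # Steps 8-9: keep the message under each country it mentions
        # (assumed here to be a case-insensitive substring match).
        for country, bucket in results.items():
            if country.lower() in y.lower():
                bucket["messages"].append(y)
                bucket["sentiments"].append(s)
                bucket["terms"].update(tokens)             # step 12
                bucket["bigrams"].update(n_grams(tokens))  # step 14
    return results


# Hypothetical stand-ins for the cloud services, for demonstration only.
demo = process_messages(
    ["China cyber attack reported", "Ciberataque en Australia"],
    countries=["China", "Australia"],
    detect_language=lambda t: "Spanish" if "Ciberataque" in t else "English",
    translate=lambda t: "Cyber attack in Australia",
    sentiment=lambda t: "negative",
)
print(demo["China"]["terms"].most_common(3))
```

Passing the services as callables keeps the sketch independent of any particular provider, so a real language-detection, translation, or sentiment client could be substituted without changing the loop.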
3.2. Sentiment Analysis Process
3.3. Anomaly Detection Process
3.4. Term Frequency Generation Process
3.5. Topic Generation Process
3.6. Threat Prediction Process
4. Results
- Daily ransomware data from https://statistics.securelist.com/ransomware/day (accessed on 3 March 2023)
- Daily vulnerability data from https://statistics.securelist.com/vulnerability-scan/day (accessed on 3 March 2023)
- Daily web threat data from https://statistics.securelist.com/web-anti-virus/day (accessed on 3 March 2023)
- Daily spam data from https://statistics.securelist.com/kaspersky-anti-spam/day (accessed on 3 March 2023)
- Daily malicious mail data from https://statistics.securelist.com/mail-anti-virus/day (accessed on 3 March 2023)
- Daily network attack data from https://statistics.securelist.com/intrusion-detection-scan/day (accessed on 3 March 2023)
- Daily local infection data from https://statistics.securelist.com/on-access-scan/day (accessed on 3 March 2023)
- Daily on-demand-scan data from https://statistics.securelist.com/on-demand-scan/day (accessed on 3 March 2023)
5. Discussion and Concluding Remarks
- Firstly, the proposed approach assumed that all 37,386 collected tweets were cyber-related. However, the confusion matrix in Table 12 shows that not all of them could be classified as such. From that matrix, a range of evaluation metrics, including precision, recall (sensitivity), specificity, F1-score, and accuracy, were computed and are reported in Table 13. A comparison with existing research on social media-based cyber intelligence (Table 14) shows that some configurations in two existing studies, [17] and [21], outperformed the proposed approach. Nonetheless, the proposed approach outperforms the majority of the solutions documented in the literature: the F1-scores of the existing methods listed in Table 14 average 0.83, whereas the proposed solution achieved an F1-score of 0.88.
- Thirdly, this study relies on the real-time Twitter API, Microsoft Power Platform, and Microsoft Azure, all of which require recurring payment by credit card. For instance, basic Twitter API access, limited to reading 10K tweets per month, costs USD 100 per month [41]. Raising this limit to 1 million tweets per month could cost USD 5000 per month [41]. To minimize expenses, this research therefore examined only a limited number of tweets. Researchers who wish to work with real-time tweets need access to a credit card and sufficient research funds to sustain the ongoing subscription costs.
- Fourthly, this research relied extensively on “black box” cloud-based services and tools, such as Microsoft Cognitive Services, which makes it difficult to investigate algorithmic biases and potential enhancements.
- Lastly, this investigation employed industry-standard tools and cutting-edge cloud services, including Microsoft Power Platform and Microsoft Azure. Conducting this research therefore requires expertise and certifications in these technologies and standards.
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
Appendix A.1. Language Detection and Translation
Appendix A.1.1. Python Code Sample
Appendix A.1.2. Sample Output
Appendix A.2. Sentiment Analysis
Appendix A.2.1. Python Code Sample
Appendix A.2.2. Sample Output
Appendix A.3. Anomaly Detection
Appendix A.3.1. Python Code
Appendix A.3.2. Sample Output
References
- Cremer, F.; Sheehan, B.; Fortmann, M.; Kia, A.N.; Mullins, M.; Murphy, F.; Materne, S. Cyber risk and cybersecurity: A systematic review of data availability. Geneva Pap. Risk Insur. Issues Pract. 2022, 47, 698–736.
- Cybercrime Magazine. Cybercrime to Cost The World $10.5 Trillion Annually by 2025. 2020. Available online: https://cybersecurityventures.com/hackerpocalypse-cybercrime-report-2016/ (accessed on 15 October 2022).
- Statista Research Department. Consumer Loss through Cyber Crime Worldwide in 2017, by Victim Country. 2022. Available online: https://www.statista.com/statistics/799875/countries-with-the-largest-losses-through-cybercrime/ (accessed on 26 October 2022).
- Bada, M.; Nurse, J.R. Chapter 4—The social and psychological impact of cyberattacks. In Emerging Cyber Threats and Cognitive Vulnerabilities; Benson, V., Mcalaney, J., Eds.; Academic Press: Cambridge, MA, USA, 2020; pp. 73–92.
- BBC. “News: Optus: How a Massive Data Breach Has Exposed Australia”. 2022. Available online: https://www.bbc.com/news/world-australia-63056838 (accessed on 16 October 2022).
- Australian Securities & Investments Commissions. Guidance for Consumers Impacted by the Optus Data Breach. 2022. Available online: https://asic.gov.au/about-asic/news-centre/news-items/guidance-for-consumers-impacted-by-the-optus-data-breach/ (accessed on 19 October 2022).
- Merritt, K. OPTUS Confirms 2.1 Million Customers Affected by Cyberattack, Total Telecom. 2022. Available online: https://totaltele.com/optus-confirms-2-1-million-customers-affected-by-cyberattack/ (accessed on 23 October 2022).
- Kaye, B. Australia’s No. 1 Health Insurer Says Hacker Stole Patient Details, Reuters. 2022. Available online: https://www.reuters.com/technology/after-telco-hack-australia-faces-wave-data-breaches-2022-10-20/ (accessed on 25 October 2022).
- Zibak, A.; Simpson, A. Cyber Threat Information Sharing: Perceived Benefits and Barriers. In ARES’19, Proceedings of the 14th International Conference on Availability, Reliability and Security, Canterbury, UK, 26–29 August 2019; Association for Computing Machinery: New York, NY, USA, 2019.
- Xu, S.; Qian, Y.; Hu, R.Q. Data-Driven Network Intelligence for Anomaly Detection. IEEE Netw. 2019, 33, 88–95.
- Keshk, M.; Sitnikova, E.; Moustafa, N.; Hu, J.; Khalil, I. An Integrated Framework for Privacy-Preserving Based Anomaly Detection for Cyber-Physical Systems. IEEE Trans. Sustain. Comput. 2021, 6, 66–79.
- Ten, C.-W.; Hong, J.; Liu, C.-C. Anomaly Detection for Cybersecurity of the Substations. IEEE Trans. Smart Grid 2011, 2, 865–873.
- Yang, J.; Zhou, C.; Yang, S.; Xu, H.; Hu, B. Anomaly Detection Based on Zone Partition for Security Protection of Industrial Cyber-Physical Systems. IEEE Trans. Ind. Electron. 2017, 65, 4257–4267.
- Shi, D.; Guo, Z.; Johansson, K.H.; Shi, L. Causality Countermeasures for Anomaly Detection in Cyber-Physical Systems. IEEE Trans. Autom. Control. 2018, 63, 386–401.
- Khan, N.F.; Ikram, N.; Saleem, S.; Zafar, S. Cyber-security and risky behaviors in a developing country context: A Pakistani perspective. Secur. J. 2022, 1–33. Available online: https://link.springer.com/content/pdf/10.1057/s41284-022-00343-4.pdf (accessed on 23 March 2023).
- Sufi, F. A New Social Media-Driven Cyber Threat Intelligence. Electronics 2023, 12, 1242.
- Hernandez-Suarez, A.; Sanchez-Perez, G.; Toscano-Medina, K.; Martinez-Hernandez, V.; Perez-Meana, H.; Olivares-Mercado, J.; Sanchez, V. Social Sentiment Sensor in Twitter for Predicting Cyber-Attacks Using ℓ1 Regularization. Sensors 2018, 18, 1380.
- Sufi, F. Algorithms in Low-Code-No-Code for Research Applications: A Practical Review. Algorithms 2023, 16, 108.
- Pattnaik, N.; Li, S.; Nurse, J.R. Perspectives of non-expert users on cyber security and privacy: An analysis of online discussions on twitter. Comput. Secur. 2023, 125, 103008.
- Geetha, R.; Karthika, S. Sensitive Keyword Extraction Based on Cyber Keywords and LDA in Twitter to Avoid Regrets. In Computational Intelligence in Data Science. ICCIDS 2020. IFIP Advances in Information and Communication Technology; Springer: Berlin/Heidelberg, Germany, 2020; Volume 578.
- Shah, R.; Aparajit, S.; Chopdekar, R.; Patil, R. Machine Learning based Approach for Detection of Cyberbullying Tweets. Int. J. Comput. Appl. 2020, 175, 51–56.
- Rawat, R.; Mahor, V.; Chirgaiya, S.; Shaw, R.N.; Ghosh, A. Analysis of Darknet Traffic for Criminal Activities Detection Using TF-IDF and Light Gradient Boosted Machine Learning Algorithm. Lect. Notes Electr. Eng. Book Ser. LNEE 2021, 756, 671–681.
- Lanier, H.D.; Diaz, M.I.; Saleh, S.N.; Lehmann, C.U.; Medford, R.J. Analyzing COVID-19 disinformation on Twitter using the hashtags #scamdemic and #plandemic: Retrospective study. PLoS ONE 2022, 17, e0268409.
- Li, Y.; Liu, Q. A comprehensive review study of cyber-attacks and cyber security; Emerging trends and recent developments. Energy Rep. 2021, 7, 8176–8186.
- Correia, V.J. An Explorative Study into the Importance of Defining and Classifying Cyber Terrorism in the United Kingdom. SN Comput. Sci. 2021, 3, 1–31.
- Agrafiotis, I.; Nurse, J.R.C.; Goldsmith, M.; Creese, S.; Upton, D. A taxonomy of cyber-harms: Defining the impacts of cyber-attacks and understanding how they propagate. J. Cybersecur. 2018, 4, 1–15.
- Humayun, M.; Niazi, M.; Jhanjhi, N.; Alshayeb, M.; Mahmood, S. Cyber Security Threats and Vulnerabilities: A Systematic Mapping Study. Arab. J. Sci. Eng. 2020, 45, 3171–3189.
- Bhaskar, R. Better Cybersecurity Awareness through Research. 2022. Available online: https://www.isaca.org/resources/isaca-journal/issues/2022/volume-3/better-cybersecurity-awareness-through-research (accessed on 1 April 2023).
- Alkhalil, Z.; Hewage, C.; Nawaf, L.; Khan, I. Phishing Attacks: A Recent Comprehensive Study and a New Anatomy. Front. Comput. Sci. 2021, 3, 563060.
- Hagen, R.A. Unraveling the Complexity of Cyber Security Threats: A Multidimensional Approach. 2023. Available online: https://www.linkedin.com/pulse/unraveling-complexity-cyber-security-threats-approach-hagen/ (accessed on 25 April 2023).
- Analysis of Tweets Related to Cyberbullying: Exploring Information Diffusion and Advice Available for Cyberbullying Victims. Int. J. Cyber Behav. Psychol. Learn. 2015, 5, 31–52.
- Microsoft Documentation. Choosing a Natural Language Processing Technology in Azure. 2020. Available online: https://docs.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/natural-language-processing (accessed on 23 March 2023).
- Sufi, F.; Khalil, I. Automated Disaster Monitoring from Social Media Posts using AI based Location Intelligence and Sentiment Analysis. IEEE Trans. Comput. Soc. Syst. 2022, 1–11, in press.
- Sufi, F.K. Automatic identification and explanation of root causes on COVID-19 index anomalies. MethodsX 2023, 10, 101960.
- Ren, H.; Xu, B.; Wang, Y.; Yi, C.; Huang, C.; Kou, X.; Xing, T.; Yang, M.; Tong, J.; Zhang, Q. Time-Series Anomaly Detection Service at Microsoft. In KDD’19, Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; ACM: New York, NY, USA, 2019.
- Zhao, R.; Ouyang, W.; Li, H.; Wang, X. Saliency detection by multi-context deep learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015.
- Abas, M.N.; Jalil, S.Z.; Aris, S.A.M. Malware Attack Forecasting by Using Exponential Smoothing. Lect. Notes Electr. Eng. Book Ser. LNEE 2022, 842, 819–831.
- Cybersecurity & Infrastructure Security Agency. Protecting Against Cyber Threats to Managed Service Providers and their Customers. 2022. Available online: https://www.cisa.gov/news-events/cybersecurity-advisories/aa22-131a (accessed on 21 May 2023).
- Gurajala, S.; White, J.S.; Hudson, B.; Voter, B.R.; Matthews, J.N. Profile characteristics of fake Twitter accounts. Big Data Soc. 2016, 3, 1–13.
- Ajao, O.; Bhowmik, D.; Zargari, S. Fake News Identification on Twitter with Hybrid CNN and RNN Models. In Proceedings of the 9th International Conference on Social Media and Society, Copenhagen, Denmark, 18–20 July 2018.
- Twitter. About Twitter API. Available online: https://developer.twitter.com/en/docs/twitter-api/getting-started/about-twitter-api (accessed on 1 June 2023).
Name of Agent | Feature | Semantic Output |
---|---|---|
Aggregation Agent | (1) Interact with User via Web, iOS, and Android Devices (2) Collaborate with Social Media Agent (3) Collaborate with Web Media Agent (4) Generate Multi-Dimensional and Multi-Source Comprehensive Cyber Intelligence on Selected Countries | (1) Country Name: (2) Threat Level: (3) Threat Spectrum: (4) Geopolitical/ Socioeconomic: (5) Psychological and Societal: (6) Impacted Target: (7) National Concern: (8) Victimization: |
Social Media Agent | (1) Obtain Social Media Data (2) Collaborate with Aggregation Agent (3) Collaborate with Cognitive Service Agent (4) Generate Term Frequency (5) Generate Topic Modelling (6) Deep Learning-Based Anomaly Detection | (1) Country Name: (2) Word Frequency: (3) Topics with Word Frequencies: (4) Sentiments on Time-Series: (5) Alerts on Time-Series: (6) Anomalies on Time-Series: |
Cognitive Service Agent | (1) Collaborate with Social Media Agent (2) Generate Translation (3) Generate Sentiment Analysis | (1) Original Language: (2) Translated Text: (3) Overall Sentiment: (4) Sentiment Confidence: |
Web Media Agent | (1) Obtain Cyber Threat Statistics on Malicious Mail, Ransomware, Exploits, Web Threats, Spam, Local Infection, Network Attacks, On-Demand Scans from Web Data (2) Collaborate with Aggregation Agent (3) Generate Multi-Dimensional Threat Spectrum (4) Deep Learning-Based Anomaly Detection (5) Threat Prediction | (1) Country Name: (2) Threat Type: (3) Country Rank: (4) Threat Percentage: (5) Anomalies on Time-Series: (6) Threat Prediction on Time-Series: |
Strategic Questions Answered | Dimension of Cyber Threat | Reference |
---|---|---|
1. What type of threat? | Threat spectrum (e.g., malware, spyware) | [24,27,29,30] |
2. Who is attacking? 3. Where is the attack coming from? 4. Why is the attack happening? 5. What is the motivation for this attack? | Geopolitical and socioeconomic | [30] |
6. Who is the target? 7. Who is the victim of the cyber-attack? | Victimization (human vs. system) | [24,25] |
8. What are the major cyber-related concerns? | National priority and concerns | [26,28] |
9. What is the impact? | Impacted target (infrastructure, supply chain, etc.) | [24] |
10. What is the societal perception? 11. How do cyber-attacks affect society? 12. How much negativity is generated at a psychological level? | Psychological and societal | [26] |
13. What is the severity level of the threat? 14. What is the intensity of the cyber threat? | Threat level (low, medium, high) | [24] |
Reference | Sentiment Analysis | Translation | LDA | TF-IDF | Stemming | N-Gram | Forecasting | ML Algorithms |
---|---|---|---|---|---|---|---|---|
[17] | X | X | X | X | X Regression | Naïve Bayes Classifier, Support Vector Machines, Maximum Entropy Classifier | ||
[18] | X | X | ||||||
[19] | X | X | X | X | BERT-based, Logistic Regression, SVM, Random Forest, XGBoost | |||
[20] | X | X | X (bi-Gram) | |||||
[21] | X | Support Vector Classifier, Logistic Regression, Naïve Bayes, Random Forest Classifier, SGD Classifier | ||||||
[22] | X | LightGBM (light gradient boosted machine) | ||||||
[23] | X | X | ||||||
Proposed | X | X | X | X | X | X | X | CNN (Deep Learning) |
Process Name | Algorithm Used | Algorithm Type | API Used | References |
---|---|---|---|---|
Sentiment Analysis | Microsoft Text Analytics | NLP | Yes | [17,19,33] |
Translate to English | Microsoft Text Analytics | NLP | Yes | [33] |
Anomaly Detection | CNN | Deep Learning | No | [32,33] |
Topic Modelling | LDA | NLP | No | [19,20] |
Term Frequency | TF-IDF | NLP | No | [17,19,20,21,22] |
Term Frequency | Porter Stemming | NLP | No | [17] |
Term Frequency | N-Gram | NLP | No | [17,19,20] |
Forecast Threat | Exponential Smoothing | Time-Series Forecasting | No | [37] |
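The table names exponential smoothing as the forecasting technique but does not state which variant is used. As a minimal sketch, assuming the simplest form (single exponential smoothing with s_0 = x_0 and s_t = alpha * x_t + (1 - alpha) * s_(t-1)), the idea can be shown in a few lines of Python. The function name `exponential_smoothing`, the smoothing factor `alpha = 0.5`, and the three-point series are illustrative assumptions and not values from the study.

```python
def exponential_smoothing(series: list[float], alpha: float) -> list[float]:
    """Single exponential smoothing; the last value serves as the one-step forecast."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        return []
    smoothed = [float(series[0])]  # initialise with the first observation
    for x in series[1:]:
        # Blend the new observation with the previous smoothed value.
        smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed


# Illustrative daily counts (made up for demonstration only).
smoothed = exponential_smoothing([10, 12, 14], alpha=0.5)
forecast_next = smoothed[-1]
print(smoothed, forecast_next)
```

A larger `alpha` weights recent observations more heavily, while a smaller one produces a smoother, slower-reacting forecast. Variants with trend or seasonality terms would need additional components that this sketch does not include.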
Attack Type | Exploit | Local Infection | Malicious Mail | Network Attack | On-Demand Scan | Ransomware | Spam | Web Threat |
---|---|---|---|---|---|---|---|---|
Number of Records | 29,017 | 32,592 | 30,165 | 30,522 | 32,584 | 23,299 | 27,450 | 32,591 |
Time | No. of Tweets | No. of Users | No. of Locations | No. of Languages | Total Retweets | Avg. Confidence of Negative Sentiment | Avg. Confidence of Neutral Sentiment | Avg. Confidence of Positive Sentiment | No. of Translations |
---|---|---|---|---|---|---|---|---|---|
October 2022 | 3954 | 3556 | 1588 | 38 | 3,727,756 | 0.36 | 0.43 | 0.21 | 941 |
November 2022 | 6470 | 5875 | 2358 | 38 | 9,981,856 | 0.34 | 0.43 | 0.23 | 1283 |
December 2022 | 6512 | 5544 | 2225 | 42 | 7,565,946 | 0.35 | 0.42 | 0.23 | 1533 |
January 2023 | 6685 | 5785 | 2364 | 40 | 7,802,301 | 0.36 | 0.40 | 0.24 | 1419 |
February 2023 | 5976 | 5053 | 2114 | 43 | 4,276,479 | 0.37 | 0.42 | 0.21 | 1373 |
March 2023 | 6634 | 5749 | 2357 | 41 | 4,799,540 | 0.36 | 0.43 | 0.21 | 1469 |
April 2023 | 1155 | 1083 | 538 | 27 | 713,083 | 0.40 | 0.41 | 0.20 | 258 |
Total | 37,386 | 30,706 | 10,178 | 54 | 38,866,961 | 0.36 | 0.42 | 0.22 | 8199 |
Rank | China | Russia | Ukraine | India | Australia |
---|---|---|---|---|---|
1 | china | russian | ukrain | cyber | australian |
2 | cyber | russia | cyber | india | cyber |
3 | http | cyber | http | http | australia |
4 | hack | hack | hack | indian | http |
5 | russia | attack | russia | hack | secur |
6 | attack | http | russian | secur | hack |
7 | chines | trump | ukrainian | crime | polic |
8 | hacker | us | attack | attack | data |
9 | state | putin | militari | account | report |
10 | countri | stori | make | awar | attack |
11 | secur | timothydsnyd | secur | cybersecur | commun |
12 | backdoor | ukrain | year | govern | cybersecur |
13 | nation | heard | countri | polic | care |
14 | compani | sourc | help | pleas | media |
15 | access | afterward | defens | youtub | million |
16 | admin | april | forc | china | zealand |
17 | cybersecur | broke | invas | bank | accus |
18 | databas | intim | report | compani | custom |
Performance Vectors | China | Russia | Ukraine | India | Australia |
---|---|---|---|---|---|
LogLikelihood | −15,617.27 | −57,933.967 | −23,251.897 | −27,119.332 | −9514.318 |
Perplexity | 517.155 | 458.384 | 1016.203 | 759.998 | 322.952 |
Avg(tokens) | 316.571 | 1165.143 | 392 | 519.857 | 206.143 |
Avg(document_entropy) | 2.868 | 4.495 | 4.364 | 3.418 | 2.589 |
Avg(word-length) | 5.857 | 6.143 | 7.229 | 5.8 | 7.286 |
Avg(coherence) | −15.623 | −13.754 | −14.672 | −17.145 | −13.013 |
Avg(uniform_dist) | 2.101 | 2.677 | 2.009 | 2.078 | 2.077 |
Avg(corpus_dist) | 1.67 | 1.614 | 1.925 | 1.701 | 1.71 |
Avg(eff_num_words) | 103.849 | 98.33 | 179.378 | 169.716 | 87.975 |
Avg(token-doc-diff) | 0.005 | 0.001 | 0.007 | 0.003 | 0.008 |
Avg(rank_1_docs) | 0.835 | 0.772 | 0.174 | 0.836 | 0.886 |
Avg(allocation_count) | 0.876 | 0.85 | 0.16 | 0.864 | 0.901 |
Avg(exclusivity) | 0.504 | 0.597 | 0.461 | 0.438 | 0.493 |
AlphaSum | 0.091 | 0.118 | 8.434 | 0.1 | 0.058 |
Beta | 0.285 | 0.127 | 0.642 | 0.272 | 0.26 |
BetaSum | 378.828 | 386.22 | 1039.923 | 562.278 | 226.947 |
Country | TOPIC 1 | Freq. | TOPIC 2 | Freq. | TOPIC 3 | Freq. | TOPIC 4 | Freq. | TOPIC 5 | Freq. | TOPIC 6 | Freq. | TOPIC 7 | Freq. |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
China | cyber | 29 | China | 20 | China | 16 | Russia | 6 | China | 21 | China | 15 | China | 17 |
China | China | 22 | Cyber | 9 | Hack | 5 | China | 6 | hack | 12 | cyber | 7 | Chinese | 10 |
China | attacks | 14 | hack | 6 | country | 4 | North | 4 | chains | 4 | war | 6 | sophisticated | 8 |
China | Russia | 8 | TikTok | 4 | national | 3 | Cyber | 4 | supply | 4 | would | 6 | databases | 8 |
China | States | 7 | China’s | 4 | IMMEDIATELY | 3 | reports | 3 | etc | 4 | Russia | 5 | Tech | 8 |
Russia | Russian | 72 | cyber | 65 | Russian | 65 | hack | 70 | Russia | 60 | Russia | 97 | Russian | 60 |
Russia | cyber | 65 | Russian | 44 | hack | 25 | Russia | 36 | Invades | 50 | hacked | 19 | Putin | 59 |
Russia | attack | 49 | Ukraine | 24 | ShellenbergerMD | 14 | Russian | 30 | Cyber | 42 | cyber | 18 | using | 58 |
Russia | blame | 27 | McGonigal | 22 | hacking | 14 | Russians | 27 | attacks | 33 | helped | 16 | Trump | 57 |
Russia | threat | 26 | FBI | 19 | amp | 13 | DNC | 16 | DarthPutinKGB | 26 | new | 16 | story | 57 |
Ukraine | State | 3 | TheStudyofWar | 2 | says | 3 | role | 3 | country | 5 | Ukraine | 117 | Leaks | 2 |
Ukraine | absolutely | 2 | FBI | 2 | GicAriana | 2 | OMC_Ukraine | 2 | loser | 3 | cyber | 76 | cyberwarfare | 2 |
Ukraine | Threat | 2 | air | 2 | need | 2 | Anonymous_Link | 2 | brigade | 2 | Russian | 31 | cyberattacks | 2 |
Ukraine | report | 2 | infrastructure | 2 | don’t | 2 | Council | 2 | hacker | 2 | Ukrainian | 28 | Red | 2 |
Ukraine | Cross | 2 | one | 2 | Security | 2 | UkraineRussiaWar | 2 | awareness | 2 | hack | 28 | never | 2 |
India | hack | 9 | YouTube | 11 | Cyber | 55 | India | 19 | India | 29 | cyber | 17 | Cyber | 10 |
India | account | 9 | YouTubeIndia | 7 | Indian | 24 | cyber | 13 | cyber | 10 | India | 14 | India | 9 |
India | India | 8 | hack | 5 | cyber | 19 | Cyber | 11 | company | 8 | crime | 10 | amp | 8 |
India | IndiaFreeFire | 5 | Cyber | 5 | India | 18 | Indian | 9 | BJP | 6 | PMOIndia | 7 | Leaks | 3 |
India | please | 5 | YouTubeCreators | 4 | Crime | 13 | China | 7 | Hack | 5 | Cyber | 7 | BSF | 3 |
Australia | Australians | 10 | Australian | 9 | Australia | 7 | cyber | 4 | cyber | 12 | amp | 7 | Police | 16 |
Australia | Australian | 9 | hack | 7 | way | 4 | POTUS | 3 | Australia | 11 | Australia | 7 | Australian | 14 |
Australia | scamming | 6 | Medibank | 6 | Cyber | 3 | Australia | 3 | data | 8 | Cyber | 5 | Cyber | 12 |
Australia | Boys | 6 | million | 5 | Australian | 3 | | 2 | Australian | 7 | Leaks | 3 | Australia | 10 |
Australia | Yahoo | 6 | health | 5 | fundamental | 2 | AustralianOpen | 2 | attack | 5 | https://t.co | 3 | love | 7 |
Number of Agents | Configuration of Agent | Average Response Time (Seconds) |
---|---|---|
One | A single agent processing both tweets and web-based cyber-attack statistics | 9.032 |
Two | One agent processing tweets and another agent processing web-based cyber-attack statistics | 8.908 |
Three | One agent performing aggregation, another one processing tweets, and the last agent processing web-based cyber-attack statistics | 7.781 |
Four | One agent performing aggregation, two agents processing tweets, and the last agent processing web-based cyber-attack statistics (Proposed) | 6.451 |
Five | One agent performing aggregation, two agents processing tweets, and the other two agents processing web-based cyber-attack statistics | 7.812 |
Country Name | Threat Level | Threat Spectrum | Geopolitical | Psychological | Impacted Target | National Concern | Victimization |
---|---|---|---|---|---|---|---|
China | Deep Red | Spam, Network Attack | US, Russia | Moderate | TikTok, Database | Espionage, National Security | Supply Chain Tech Firms |
Russia | Red | Spam | US, Russia | High | Putin, KGB | FBI, Trump | Putin, KGB, Russian Government |
Ukraine | Deep Amber | Local Infection | Russia | High | Ukrainian Security | Ukraine Russia War | Infrastructure |
India | Amber | Spam | China | Low | YouTube, BJP | YouTube Hack, Account Hack | Individual Accounts |
Iran | Yellow | On-Demand Scan | US | Moderate | |||
Australia | Green | Web Threat | China | Moderate | Health (Medibank) Electricity Network | Data Breach, Malware, Phishing, Ransomware | Australian, Infrastructure |
Prediction | Actual Positive | Actual Negative |
---|---|---|
Predicted Positive | 23,178 (TP) | 2241 (FP) |
Predicted Negative | 4149 (FN) | 7818 (TN) |
Evaluation Metric | Formula | Calculation |
---|---|---|
Precision | PPV = TP/(TP + FP) | 0.9118 |
Recall | TPR = TP/(TP + FN) | 0.8482 |
Sensitivity | TPR = TP/(TP + FN) | 0.8482 |
Specificity | SPC = TN/(FP + TN) | 0.7772 |
Negative Predictive Value | NPV = TN/(TN + FN) | 0.6533 |
False Positive Rate | FPR = FP/(FP + TN) | 0.2228 |
False Discovery Rate | FDR = FP/(FP + TP) | 0.0882 |
False Negative Rate | FNR = FN/(FN + TP) | 0.1518 |
Accuracy | ACC = (TP + TN)/(TP + FP + TN + FN) | 0.8291 |
F1-Score | F1 = 2TP/(2TP + FP + FN) | 0.8789 |
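As a consistency check, the values in the Calculation column can be recomputed from the confusion matrix counts (TP = 23,178, FP = 2241, FN = 4149, TN = 7818) using the formulas listed in the table. The following is a minimal Python sketch; the function name `confusion_metrics` and the dictionary keys are illustrative and not part of the original study.

```python
def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Compute the evaluation metrics of the table from raw confusion matrix counts."""
    return {
        "precision": tp / (tp + fp),                   # PPV
        "recall": tp / (tp + fn),                      # TPR, same as sensitivity
        "specificity": tn / (fp + tn),                 # SPC
        "npv": tn / (tn + fn),                         # negative predictive value
        "fpr": fp / (fp + tn),                         # false positive rate
        "fdr": fp / (fp + tp),                         # false discovery rate
        "fnr": fn / (fn + tp),                         # false negative rate
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Counts taken from the confusion matrix of the proposed approach.
metrics = confusion_metrics(tp=23178, fp=2241, fn=4149, tn=7818)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Rounded to four decimals, the results agree with the tabulated values, and the four counts sum to the 37,386 tweets analyzed.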
Algorithms Used | Precision | Recall | F1-Score | Reference |
---|---|---|---|---|
Naïve Bayes (Negative) | 0.77 | 0.80 | 0.79 | [17] |
Naïve Bayes (Positive) | 0.76 | 0.76 | 0.76 | [17] |
Naïve Bayes (Security-Oriented) | 0.94 | 0.91 | 0.93 | [17] |
Support Vector Machine (Negative) | 0.80 | 0.80 | 0.80 | [17] |
Support Vector Machine (Positive) | 0.78 | 0.80 | 0.79 | [17] |
Support Vector Machine (Security-Oriented) | 0.95 | 0.94 | 0.95 | [17] |
Maximum Entropy (Negative) | 0.81 | 0.80 | 0.80 | [17] |
Maximum Entropy (Positive) | 0.78 | 0.80 | 0.79 | [17] |
Maximum Entropy (Security-Oriented) | 0.96 | 0.94 | 0.95 | [17] |
Random Forest (CySecPriv) | 0.94 | 0.61 | 0.74 | [19] |
Random Forest (NonExpertUser) | 0.70 | 1.00 | 0.83 | [19] |
LDA—VEM + TF-IDF (Personal) | - | - | 0.76 | [20] |
LDA—VEM + TF-IDF (Professional) | - | - | 0.67 | [20] |
LDA—VEM + TF-IDF (Health) | - | - | 0.75 | [20] |
SVC (Cyber Bullying) | 0.73 | 0.96 | 0.83 | [21] |
Logistic Regression (Cyber Bullying) | 0.91 | 0.96 | 0.93 | [21] |
Multinomial Naïve Bayes (Cyber Bullying) | 0.86 | 0.94 | 0.90 | [21] |
Random Forest Classifier (Cyber Bullying) | 0.98 | 0.73 | 0.84 | [21] |
SGD Classifier (Cyber Bullying) | 0.90 | 0.95 | 0.93 | [21] |
Light Gradient Boosted Machine (Darknet Traffic) | - | - | 0.84 | [22] |
Proposed (Comprehensive Cyber) | 0.91 | 0.85 | 0.88 |
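The comparison in the table above can be summarized numerically. The short Python sketch below averages the F1-scores of the 20 existing configurations listed and identifies which references contain a configuration that exceeds the proposed F1-score of 0.88. The variable names are illustrative, and the values are copied from the table rows.

```python
from statistics import mean

# (F1-score, reference) for each existing configuration, in table order.
prior = [
    (0.79, "[17]"), (0.76, "[17]"), (0.93, "[17]"),
    (0.80, "[17]"), (0.79, "[17]"), (0.95, "[17]"),
    (0.80, "[17]"), (0.79, "[17]"), (0.95, "[17]"),
    (0.74, "[19]"), (0.83, "[19]"),
    (0.76, "[20]"), (0.67, "[20]"), (0.75, "[20]"),
    (0.83, "[21]"), (0.93, "[21]"), (0.90, "[21]"),
    (0.84, "[21]"), (0.93, "[21]"),
    (0.84, "[22]"),
]
proposed_f1 = 0.88

average_prior_f1 = mean(f1 for f1, _ in prior)
# References with at least one configuration scoring above the proposed approach.
refs_above = {ref for f1, ref in prior if f1 > proposed_f1}

print(f"average F1 of existing configurations: {average_prior_f1:.2f}")
print(f"references exceeding {proposed_f1}: {sorted(refs_above)}")
```

Note that this is an unweighted mean over configurations, so studies that report more configurations contribute more to the average.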
© 2023 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Sufi, F. A New AI-Based Semantic Cyber Intelligence Agent. Future Internet 2023, 15, 231. https://doi.org/10.3390/fi15070231