Electronics
  • Article
  • Open Access

5 July 2024

Automated Conversion of CVE Records into an Expert System, Dedicated to Information Security Risk Analysis, Knowledge-Base Rules

Faculty of Fundamental Sciences, Department of Information Systems, Vilnius Gediminas Technical University, Sauletekio al. 11, LT-10223 Vilnius, Lithuania
*
Author to whom correspondence should be addressed.

Abstract

Expert systems (ESs) can be seen as a promising method for automating the risk analysis process, especially for small- and medium-sized enterprises that lack internal security resources. The practical applicability of expert systems is limited by the large amount of manual work required to create an expert system knowledge base. External knowledge sources, such as attack trees, web pages, and ontologies, have already proven to be valuable for the automated creation of knowledge base rules, thus leading to more effective creation of specialized expert systems. This research proposes a new method for the automated conversion of CVE data from the National Vulnerability Database (using CVSS version 2 scores) into the knowledge base of an expert system; the method also flags CVE records that carry a higher risk because exploits for them already exist. This manuscript also describes the software implementing the method and a practical evaluation of the conversion results. The uniqueness of the proposed method lies in its incorporation of the records included in the Cybersecurity and Infrastructure Security Agency (CISA) Known Exploited Vulnerabilities Catalog.

1. Introduction

In the rapidly evolving cyber security threat and vulnerability landscape, there is a need for advanced tools and methodologies to effectively analyze and mitigate these risks just to keep up with the pace of such threats. This threat and vulnerability landscape, however, is represented by continuously updated and managed registers that include the name, nature, behavior, and other important features of such threats. Such registers provide datasets that can be used to achieve the required security-related goals.
The Common Vulnerabilities and Exposures (CVE) registry, managed by the MITRE Corporation and sponsored by the U.S. Department of Homeland Security (DHS) [1] and the Cybersecurity and Infrastructure Security Agency (CISA), together with the scores provided by the Common Vulnerability Scoring System (CVSS), provides a knowledge base for IT vulnerabilities with input from representatives of a broad range of sectors, from finance to academia. The National Vulnerability Database (NVD), which builds on CVE records, is the U.S. government’s repository of standards-based vulnerability management data [2].
This study proposes an automated method of knowledge base development for expert systems (ESs) dedicated to information security risk analysis. The method transforms CVE records into rules for the knowledge base of a specifically designed expert system. The result is a domain-specific knowledge base that enhances the precision and efficiency of information security risk analysis.
Moreover, the development of automated methods for knowledge base development in expert systems provides small- and medium-sized businesses with an analysis of potential vulnerabilities, also ensuring that their risk assessment processes are based on the most up-to-date and relevant information available [3]. This approach to risk analysis is essential in proactively identifying and mitigating potential security threats before they escalate into full-fledged cyber incidents, thereby safeguarding critical assets and data from malicious actors.

3. The CVE Data

CVE records are administered by MITRE, a US non-profit organization, which provides public access to the CVE database; the database can be downloaded freely or searched directly on the website. MITRE assigns each record a unique number and provides the basic information about the registered vulnerability: CVE number, status, description, information source, phase, score, and comments. In this way, a reported vulnerability can be linked to other security tools and services. However, this resource does not provide information such as potential risk, impact, or more detailed technical data that could be used to assess the risk or impact of a vulnerability and to perform an information security risk analysis.
The National Vulnerability Database (NVD) is another source from which CVE data can be freely downloaded in JSON format or retrieved via an API. This source provides CVE data with additional information, such as vulnerable software or hardware versions, Common Vulnerability Scoring System (CVSS) information, and other attributes that can be used to assess the security risks of the software in use, which makes it well suited to information security risk analysis. Because the CVE information provided by NVD is expanded and more detailed, it was selected as the source for the automated formation of the knowledge base of an expert system for information security risk analysis.

3.1. CVE Basic Data

The CVE data provided by the NVD are presented in JSON format. After analyzing these data, the main fields that are useful for information security risk analysis were selected for the automated conversion to the ES knowledge base (see Table 1).
Table 1. Selected basic CVE data.
Key CVE data were analyzed and selected to be used in the automated formation of the ES knowledge base. The data selected in Table 1 do not yet include any Common Vulnerability Scoring System (CVSS) data, because NVD uses several versions of the CVSS. To choose which version and which data to use in the formation of the knowledge base, the CVSS versions used by NVD and their data were studied.

3.2. The CVSS Data

NVD uses three versions of the CVSS scoring system for CVE records: 2.0, 3.0, and 3.1. A comparison of the main differences between the versions showed that CVSS versions 3.0 and 3.1 use more criteria to calculate the vulnerability score. The CVSS 3 scoring system is much more comprehensive and accurate than the CVSS version 2 scoring system, making CVSS version 3 more useful for information security risk analysis [30].
After analyzing all CVE records submitted to NVD until 16 June 2021 17:36, it was found that not all CVE records provided by NVD have CVSS scores. This study analyzed 164,921 records, of which 155,007 records had CVSS scores. In addition, CVSS version 3 scores were found to cover the fewest CVE records, and all records with CVSS version 3 scores also have CVSS version 2 scores.
Although CVSS version 3 uses more criteria to evaluate vulnerabilities and can therefore be more accurate in calculating vulnerability scores, CVSS version 2 covers almost twice as many records as CVSS version 3. For the ES to make consistent decisions in information security risk analysis, it must rely on a single, uniform set of data. Therefore, CVSS version 2 and the data it provides were chosen for forming the ES knowledge base.
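The coverage comparison described above can be illustrated with a short sketch. It is only an approximation under stated assumptions: it assumes the NVD JSON 1.1 feed layout, in which each record carries an `impact` object with optional `baseMetricV2` and `baseMetricV3` keys, and the function name `cvss_coverage` and the sample records are hypothetical.

```python
from collections import Counter


def cvss_coverage(cve_items):
    """Count how many records carry CVSS v2 and CVSS v3 base metrics.

    Assumes each item follows the NVD JSON 1.1 feed layout, where scores
    live under item["impact"]["baseMetricV2"] / ["baseMetricV3"].
    """
    counts = Counter()
    for item in cve_items:
        impact = item.get("impact", {})
        has_v2 = "baseMetricV2" in impact
        has_v3 = "baseMetricV3" in impact
        counts["total"] += 1
        counts["v2"] += has_v2
        counts["v3"] += has_v3
        # Records scored only with v3 would be lost when selecting v2.
        counts["v3_without_v2"] += has_v3 and not has_v2
    return counts


# Hypothetical, made-up records used only to exercise the function.
sample = [
    {"impact": {"baseMetricV2": {}, "baseMetricV3": {}}},
    {"impact": {"baseMetricV2": {}}},
    {"impact": {}},
]
coverage = cvss_coverage(sample)
```

A `v3_without_v2` count of zero on the real feed would correspond to the observation that every record with a version 3 score also has a version 2 score.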

3.3. CISA Known Exploited Vulnerabilities Catalog

CISA has found that vulnerabilities with a low CVSS score can cause just as much damage to information security, because a chain of vulnerabilities can be exploited during an attack. Conversely, a vulnerability may be rated with the highest CVSS score while being very difficult or practically impossible to exploit. For these reasons, CISA began developing a catalog of known exploited vulnerabilities on 3 November 2021.
CISA updates this catalog to include additional exploited vulnerabilities as they become known and when they meet the following conditions:
  • The vulnerability has a CVE ID assigned to it.
  • There is reliable evidence that the vulnerability has been actively exploited in the wild.
  • There is a clear remediation action for the vulnerability, such as a software update from the manufacturer.
Incorporating information from CISA’s Known Exploited Vulnerabilities Catalog into the knowledge base of expert systems for information security risk analysis is beneficial, because the vulnerabilities included in this catalog have a higher probability of being exploited and harming information security [31].
From the data provided by the CISA Known Exploited Vulnerabilities Catalog, the CVE ID and the date of the vulnerability were chosen to be used. The CVE ID is used to mark entries that are included in the catalog, and the date serves as additional information about how recent the inclusion is.
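As a rough illustration of how the catalog data could be used for marking, the sketch below builds a lookup from CVE ID to date. The column names `cveID` and `dateAdded` are assumptions about the catalog's CSV export, the helper names are hypothetical, and the sample rows are made up rather than real catalog entries.

```python
import csv
import io


def load_kev(csv_text):
    """Map CVE ID -> date from catalog CSV text.

    Assumes the CSV has 'cveID' and 'dateAdded' columns (an assumption
    about the export format, not confirmed by the article).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["cveID"]: row["dateAdded"] for row in reader}


def kev_flag(cve_id, kev_index):
    """Return (is_listed, date_or_None) for one CVE ID."""
    date_added = kev_index.get(cve_id)
    return (date_added is not None, date_added)


# Made-up rows with placeholder IDs, for illustration only.
sample_csv = (
    "cveID,vendorProject,product,dateAdded\n"
    "CVE-2099-0001,ExampleVendor,ExampleApp,2099-01-15\n"
)
kev = load_kev(sample_csv)
listed = kev_flag("CVE-2099-0001", kev)
not_listed = kev_flag("CVE-2099-9999", kev)
```

Keeping the catalog in a dictionary makes the membership check for each converted CVE record a constant-time lookup.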

4. The Proposed Method

A new method, converting the CVE data from NVD with CVSS version 2.0 to the knowledge base of the expert system and marking those CVE entries that are included in the CISA Known Exploited Vulnerabilities Catalog, is proposed. A diagram of the method is presented in Figure 1.
Figure 1. A method for converting CVE data into the ES knowledge base.
The method steps, as presented in Figure 1, are detailed as follows:
  • Metadata are imported from the NVD. The NVD publishes metadata (including SHA256 hash sums) for its data files; these metadata are downloaded and saved so that they can be compared with those of already downloaded data, which avoids downloading the same data files again.
  • Is the file already downloaded? Before the CVE data download starts, the metadata of the previously downloaded CVE data are checked against the newly downloaded metadata from the NVD. If the metadata match, the CVE data file is not downloaded again; if they do not match, the download of the CVE data is initiated. In this way, data download time is saved.
  • Data file is downloaded and saved. There are two ways to download vulnerability data from NVD: by downloading archives (in GZ or ZIP formats) containing CVE data in JSON format or by using the application programming interface (API). Both methods have advantages and disadvantages. It was decided to use the standard data download from NVD, which delivers archives containing files in JSON format. This way, all CVE data are downloaded faster, and local requests are not tracked.
  • Data are extracted and read. In this process, the downloaded archives are extracted, and the CVE data are obtained.
  • Does the CVE entry have a CVSS V2.0 score? This process picks only those CVE records that contain CVSS version 2.0 scores. Records without such a score are not included in the ES knowledge base being formed.
  • Does the CVE specify the software? Only those CVE records that contain software specifications are included.
  • Data from CISA are downloaded and imported. To mark, during the conversion process, those records that are included in the CISA Known Exploited Vulnerabilities Catalog, the catalog is downloaded from the CISA website in CSV format, and its data are loaded.
  • CVE data are converted into the Jess rule database. This process converts CVE data with CVSS version 2 score data into Jess rules. It also marks those CVE records that are included in the CISA Known Exploited Vulnerabilities Catalog and additionally extracts vulnerable software information from the CPE data contained in CVEs: manufacturer, name, and version.
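The first two steps of the list above (metadata import and the "already downloaded?" check) might be sketched as follows. This is a minimal illustration, not the prototype's code: it assumes the metadata file consists of `key:value` lines that include a `sha256` entry, and that the local cache is a mapping from file name to hash; the function names are hypothetical.

```python
def parse_meta(meta_text):
    """Parse 'key:value' lines of a feed metadata file into a dict."""
    meta = {}
    for line in meta_text.splitlines():
        # Split on the first colon only, so timestamps keep their colons.
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def needs_download(file_name, meta_text, cache):
    """Return True when the published SHA256 differs from the cached one.

    'cache' is assumed to map metadata file names to SHA256 hex strings.
    A missing published hash is treated as "download to be safe".
    """
    published = parse_meta(meta_text).get("sha256", "").upper()
    cached = cache.get(file_name, "").upper()
    return published == "" or published != cached


# Made-up metadata text for illustration only.
sample_meta = (
    "lastModifiedDate:2021-06-16T17:36:00-04:00\n"
    "size:100\n"
    "sha256:ABCDEF\n"
)
first_run = needs_download("feed.meta", sample_meta, {})
unchanged = needs_download("feed.meta", sample_meta, {"feed.meta": "abcdef"})
```

Comparing hashes case-insensitively avoids a spurious re-download if the cache and the published file use different letter case for the hex digits.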
During the process, only selected CVE data, which are relevant for information security risk analysis, are converted. An example of the selected CVE data conversion is presented in Figure 2, where the data used to build the knowledge base are highlighted in red.
Figure 2. Example of a CVE record.
The knowledge base of the Jess ES is made up of a list of facts, known as working memory. In this conversion process, the relevant CVE record data are restructured into the Jess ES fact structure; the resulting fact list thus forms the Jess ES knowledge base. Facts in Jess can be of three types:
  • Unordered facts: these are like rows in a relational database table, where table columns correspond to named data fields, which are called slots. When writing an unordered fact, slots can be specified in any order. Unordered facts are the most used type of fact and a good choice in most situations.
  • Ordered facts: these do not have the structure of named fields; they are just a short, flat list. Such facts are convenient for simple pieces of information that do not require structure.
  • Shadow facts: these are unordered facts that are associated with Java objects in the real world; they provide the ability to reason about events that occur outside the Jess ES.
Unordered facts are general purpose and widely applicable, ordered facts are useful for working with small pieces of information, and shadow facts allow the ES to respond to things happening outside the ES. For converting CVE data to Jess ES facts, the most appropriate type is the unordered fact. The output of the conversion process therefore looks like the example shown in Figure 3.
Figure 3. Example of a converted fact.
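To make the idea of a slot-based fact more concrete, the following sketch renders a Python dictionary as a single fact with named slots. The template name `cve` and the slot names `id`, `score`, and `in-kev` are hypothetical placeholders; the actual fields used by the method are those shown in Figures 3 and 4.

```python
def jess_string(value):
    """Quote a Python value as a double-quoted string literal."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'


def to_jess_fact(template, slots):
    """Render a dict of slot values as one fact with named slots.

    Booleans become the symbols TRUE/FALSE, numbers are written as-is,
    and everything else is emitted as a quoted string.
    """
    parts = []
    for name, value in slots.items():
        if isinstance(value, bool):
            rendered = "TRUE" if value else "FALSE"
        elif isinstance(value, (int, float)):
            rendered = str(value)
        else:
            rendered = jess_string(value)
        parts.append(f"({name} {rendered})")
    return f"({template} {' '.join(parts)})"


# Hypothetical slot names and a placeholder CVE ID.
fact = to_jess_fact(
    "cve", {"id": "CVE-2099-0001", "score": 7.5, "in-kev": True}
)
```

Because slots are named, the order of the dictionary entries does not affect how the fact is interpreted, which is the property that makes this fact type convenient for record-like CVE data.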
  • Jess expert system knowledge base is built. This process creates and saves the end result: a file containing the CVE data facts that make up the Jess ES knowledge base.
All facts in Jess ES are created using the deftemplate construct, which defines the fields that an input fact can have. For rule-based systems, a deftemplate is like a database schema that defines the way the system views the data it uses. Therefore, before entering converted CVE facts into Jess ES for the first time, it is necessary to define the fields used. The fields used by the converted CVE facts and their descriptions are given in Figure 4.
Figure 4. Description of converted CVE data Jess ES.
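A template definition of this kind could be generated alongside the facts. The sketch below is only an illustration: it emits a basic `(deftemplate name (slot ...) ...)` form, and the template and slot names are hypothetical rather than the fields listed in Figure 4.

```python
def deftemplate(name, slot_names, doc=""):
    """Build the text of a deftemplate with one plain slot per field."""
    lines = [f"(deftemplate {name}"]
    if doc:
        # Optional documentation string placed right after the name.
        lines.append(f'  "{doc}"')
    lines.extend(f"  (slot {slot})" for slot in slot_names)
    return "\n".join(lines) + ")"


# Hypothetical slot names; the real ones are given in Figure 4.
template_text = deftemplate("cve", ["id", "score", "in-kev"])
```

Generating the template from the same list of slot names that drives the fact conversion keeps the two in sync when a field is added or removed.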

4.1. Program Prototype of the Method Converting CVE Data into the ES Knowledge Base

Based on the proposed method, a program prototype implementing it was created in the Python programming language. Python was chosen due to its wide use, its compatibility with various operating systems, and the fact that it is free.
When the program prototype is launched, all actions of the method are performed automatically; no user intervention is required. The prototype first compares the metadata of locally existing CVE files with the metadata of the CVE files currently published by NVD. The verification is performed by downloading the metadata from the NVD and comparing them with the local metadata stored in the nvd_cache.json file created by the application. This file stores metadata file names and SHA256 hash sums of CVE data files, and these data are compared with the newly downloaded metadata to determine whether a CVE data file has already been downloaded. If the file has already been downloaded, it is not downloaded again; otherwise, it is downloaded. This saves data download time if the CVE data file has not been updated in NVD. The prototype displays the progress of the metadata verification and file download to the user. Downloaded CVE data files are ZIP archives, which are placed in the created “nvd” directory.
After the CVE data files are downloaded from NVD, the extraction function is initiated. During this activity, the CVE data in JSON format are extracted from the downloaded archives. The extracted JSON files are placed in the created “data” directory.
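The extraction step can be sketched with Python's standard `zipfile` module. The directory names "nvd" and "data" follow the text; the function name and the temporary demo archive are hypothetical and serve only to make the example runnable.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_json_feeds(archive_dir, output_dir):
    """Extract every .json member of each ZIP archive into output_dir."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in sorted(Path(archive_dir).glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            for member in zf.namelist():
                if member.endswith(".json"):
                    zf.extract(member, out)
                    extracted.append(out / member)
    return extracted


# Demo on a throwaway directory with a made-up, empty feed archive.
with tempfile.TemporaryDirectory() as tmp:
    nvd_dir = Path(tmp) / "nvd"
    nvd_dir.mkdir()
    with zipfile.ZipFile(nvd_dir / "sample.json.zip", "w") as zf:
        zf.writestr("sample.json", '{"CVE_Items": []}')
    paths = extract_json_feeds(nvd_dir, Path(tmp) / "data")
    extracted_names = [p.name for p in paths]
    all_exist = all(p.exists() for p in paths)
```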
After the file extraction function is completed, the function that reads the JSON files and converts the CVE data contained in them to the Jess ES knowledge base is initiated. During this phase, only those CVE records that have a CVSS V2 score and a specified vulnerable software version (CPE value) are selected when building the Jess ES knowledge base. That is, if the scanned CVE record does not have a CVSS V2 score, it is rejected, and if a CVE record has a CVSS V2 score but no CPE fields, the record is also rejected and not included in the ES knowledge base being formed. During data conversion, data from CISA’s Known Exploited Vulnerabilities Catalog are also downloaded and loaded, and each converted CVE record is checked against this catalog; if the record is included, it is marked accordingly. Also during the conversion, vulnerable software information is extracted from the CPE data contained in the CVE record: manufacturer, name, and version.
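The selection conditions and the CPE extraction described above could look roughly like the sketch below. It assumes the NVD JSON 1.1 feed layout (`impact.baseMetricV2.cvssV2`, `configurations.nodes[].cpe_match[].cpe23Uri`, `cve.CVE_data_meta.ID`); the function names, output keys, and sample records are hypothetical, and the CPE split is deliberately naive.

```python
def iter_cpe_uris(nodes):
    """Yield each cpe23Uri in a possibly nested configuration tree."""
    for node in nodes:
        for match in node.get("cpe_match", []):
            if "cpe23Uri" in match:
                yield match["cpe23Uri"]
        yield from iter_cpe_uris(node.get("children", []))


def split_cpe(cpe_uri):
    """Return (vendor, product, version) from a CPE 2.3 string.

    Naive split on ':'; it does not handle escaped colons in values.
    """
    parts = cpe_uri.split(":")
    return parts[3], parts[4], parts[5]


def select_records(cve_items):
    """Keep records that have a CVSS v2 score and at least one CPE."""
    accepted, rejected = [], 0
    for item in cve_items:
        impact = item.get("impact", {})
        cvss_v2 = impact.get("baseMetricV2", {}).get("cvssV2")
        nodes = item.get("configurations", {}).get("nodes", [])
        cpes = list(iter_cpe_uris(nodes))
        if cvss_v2 is None or not cpes:
            rejected += 1
            continue
        accepted.append({
            "id": item["cve"]["CVE_data_meta"]["ID"],
            "score": cvss_v2["baseScore"],
            "software": [split_cpe(uri) for uri in cpes],
        })
    return accepted, rejected


# Made-up records with placeholder IDs, for illustration only.
sample = [
    {"cve": {"CVE_data_meta": {"ID": "CVE-2099-0001"}},
     "impact": {"baseMetricV2": {"cvssV2": {"baseScore": 7.5}}},
     "configurations": {"nodes": [{"cpe_match": [
         {"cpe23Uri": "cpe:2.3:a:examplevendor:exampleapp:1.0:*:*:*:*:*:*:*"}
     ]}]}},
    {"cve": {"CVE_data_meta": {"ID": "CVE-2099-0002"}},
     "impact": {}, "configurations": {"nodes": []}},
]
accepted, rejected = select_records(sample)
```

Returning the rejected count together with the accepted records makes it straightforward to report the totals of read, converted, and rejected records at the end of a run.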
After the program prototype completes the conversion function, the total number of CVE records read, the total number of CVE records converted to the ES knowledge base, and the total number of CVE records rejected due to the specified conditions are displayed to the user. For the convenience of the user, a Jess ES data template is also provided, which the user can copy and use to describe the data before importing the automatically generated CVE data facts into the ES knowledge base.
The file “cve_jess_kb.dat” created by the program prototype is the automatically formed ES knowledge base, which contains information about known software vulnerabilities (CVEs) and their known exploitation in the wild (see Figure 5).
Figure 5. Prototype in action: automated conversion of CVE records into a knowledge base.

4.2. Prototype Performance Evaluation

Three tests were conducted with the developed program prototype to evaluate its accuracy and performance. To evaluate accuracy, the number of CVE records read by the prototype and the number of CVE records converted to the ES knowledge base were compared with the raw CVE data from NVD. The raw CVE data were analyzed by uploading them to Elastic Stack and counting the CVE records that satisfy the stated conditions. To evaluate performance, CVE data conversion times, including data downloads, were measured.
During the tests, the actual number of CVE entries in the NVD was compared with the number of CVE entries read by the prototype. Next, the expected number of converted CVE records under the set conditions was compared with the number of records actually converted by the prototype. The time taken to convert the CVE records to the Jess ES knowledge base was also measured, including the download of data from the CVE database provided by the NVD. The results of all tests performed on the program prototype are summarized in Table 2.
Table 2. Test results of the developed prototype.
The results show that the developed prototype works correctly: it successfully reads all CVE records and converts all CVE records that meet the set conditions. The prototype also works efficiently because it converts CVE records directly from NVD to the Jess ES knowledge base.
For further investigation, three tests of knowledge import quality were performed to evaluate the correctness of the data. During the tests, the automatically generated knowledge base was imported into Jess, and the number of converted facts was compared with the number of facts actually imported. The time taken to import the automatically generated knowledge into Jess was also measured. The results of the tests performed with the automatically formed ES knowledge base are summarized in Table 3.
Table 3. Attempts to import data into the ES.
The experiments have revealed that all the automatically generated knowledge is successfully imported and read in Jess ES—the converted data show 100% correctness.

5. Conclusions

Research into existing automated methods for building expert system knowledge bases in the field of information security risk management has revealed the potential to convert CVE data into the knowledge base of expert systems. By supplementing the knowledge base of the expert system with CVE data, it is possible to assess the information security risks posed by the software in use.
Other researchers have applied CVE data in several ways, focusing on mapping CVE vulnerability descriptions to certain security frameworks and on using natural language processing models to automate certain data transformations; our approach instead focuses on transforming CVE data into expert system rules.
An analysis of CVE sources has shown that NVD is a suitable data source for developing the knowledge base from CVE data. Analysis of the CVE data provided by NVD found that not all CVE records have CVSS scores. It was also found that CVSS version 2 has nearly twice as many records as CVSS version 3 and that records with CVSS version 3 scores also have CVSS version 2 scores. An additional source of CVE data for the knowledge base of expert systems was selected: the CISA Known Exploited Vulnerabilities Catalog, which provides additional information about the importance of a vulnerability and the probability of its exploitation. After analyzing the CVE sources and the data they provide, a new automated method is proposed, which converts CVE data from NVD with CVSS version 2 data into the knowledge base of the expert system and marks those CVE records that are included in the CISA Known Exploited Vulnerabilities Catalog.
A program prototype implementing the proposed method has been created in the Python programming language. The experiments showed that the prototype efficiently and successfully transforms 100% of the selected CVE data, and the formed database includes more than 175 thousand records about vulnerabilities.
Compared with other existing methods for forming the knowledge base of expert systems for information security risk analysis, the CVE data conversion method has the advantage of using continuously updated sources, which keeps the knowledge base of the expert system up to date without additional user effort. With the knowledge base formed automatically by this method, it is possible to assess the information security risks posed by the software in use.

Author Contributions

Conceptualization, D.V. and N.G.; methodology, D.V. and A.Č.; software, D.B.; validation, D.B. and D.V.; formal analysis, N.G.; investigation, D.B.; data curation, D.B.; writing—original draft preparation, J.J. and D.B.; writing—review and editing, N.G. and A.Č.; supervision, D.V. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data available in a publicly accessible repository. Available online: https://github.com/dvitkus/CVE2JESS/ (accessed on 27 May 2024).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Kühn, P.; Relke, D.N.; Reuter, C. Common vulnerability scoring system prediction based on open source intelligence information sources. Comput. Secur. 2023, 131, 103286.
  2. Dawson, M.; Bacius, R.; Gouveia, L.B.; Vassilakos, A. Understanding the challenge of cybersecurity in critical infrastructure sectors. Land Forces Acad. Rev. 2021, 26, 69–75.
  3. Hernandez, Z.; Hernandez, T.H.; Velasco-Bermeo, N.; Monroy, B. An expert system to detect risk levels in small and medium enterprises (SMEs). In Proceedings of the Fourteenth Mexican International Conference on Artificial Intelligence (MICAI), Cuernavaca, Mexico, 25–31 October 2015.
  4. Lee, Y.; Woo, S.; Song, Y.; Lee, J.; Lee, D.H. Practical vulnerability-information-sharing architecture for automotive security-risk analysis. IEEE Access 2020, 8, 120009–120018.
  5. Azzazi, A.; Shkoukani, M. A Knowledge-based Expert System for Supporting Security in Software Engineering Projects. Int. J. Adv. Comput. Sci. Appl. 2022, 13, 395–400.
  6. Atymtayeva, L.; Kozhakhmet, K.; Bortsova, G. Building a knowledge base for expert system in information security. Adv. Intel. Syst. Comput. 2014, 270, 57–76.
  7. Tripathi, K.P. A review on knowledge-based expert system: Concept and architecture. IJCA Spec. Issue Artif. Intell. Tech. -Nov. Approaches Pract. Appl. 2011, 4, 19–23.
  8. Colson, A.R.; Cooke, R.M. Expert elicitation: Using the classical model to validate experts’ judgments. Rev. Environ. Econ. Policy 2018, 12, 113–132.
  9. Tecuci, G.; Marcu, D.; Boicu, M.; Schum, D.A. Knowledge Engineering: Building Cognitive Assistants for Evidence-Based Reasoning; Cambridge University Press: Cambridge, UK, 2016.
  10. Ogu, E.C.; Adekunle, Y.A. Basic Concepts of Expert System Shells and an Efficient Model for Knowledge Acquisition. Int. J. Sci. Res. 2013, 2, 554–559.
  11. McGoo Software. ES-Builder Web Expert System Shell. Available online: http://www.mcgoo.com.au (accessed on 15 January 2024).
  12. Frederiksen, B. Applying Expert System Technology to Code Reuse with Pyke. In Proceedings of the PyCon, Birmingham, UK, 12–14 September 2008.
  13. Wen, Q. Drools Rules Engine Used in Management Accounting System Design Research. In Proceedings of the 4th International Conference on Management Science and Engineering Management (ICMSEM 2023), Nanchang, China, 2–4 June 2023.
  14. Riley, G. Adventures in Rule-Based Programming: A CLIPS Tutorial; Secret Society Software, LLC: AZ, USA, 2022.
  15. Yurin, A.Y.; Dorodnykh, N.O. Personal knowledge base designer: Software for expert systems prototyping. SoftwareX 2020, 11, 100411.
  16. Obrst, L.; Chase, P.; Markeloff, R. Developing an Ontology of the Cyber Security Domain. In Proceedings of the Seventh International Conference on Semantic Technologies for Intelligence, Defense, and Security, Fairfax, VA, USA, 23–26 October 2012; pp. 49–56.
  17. Sicilia, M.A.; Garcia-Barriocanal, E.; Bermejo-Higuera, J.; Sanchez-Alonso, S. What are information security ontologies useful for? Commun. Comput. Inf. Sci. 2015, 544, 51–61.
  18. Fenz, S.; Plieschnegger, S.; Hobel, H. Mapping information security standard ISO 27002 to an ontological structure. Inf. Comput. Secur. 2016, 24, 452–473.
  19. Ramanauskaite, S.; Olifer, D.; Goranin, N.; Čenys, A. Security ontology for adaptive mapping of security standards. Int. J. Comput. Commun. Control 2013, 8, 878.
  20. Vitkus, D.; Salter, J.; Goranin, N.; Čeponis, D. Method for attack tree data transformation and import into risk analysis expert systems. Appl. Sci. 2020, 10, 8423.
  21. ISO/IEC 27001:2005; Information Technology—Security Techniques—Information Security Management Systems—Requirements. International Organization for Standardization: Geneva, Switzerland, 2005.
  22. PCI DSS 3.2.1; Payment Card Industry Data Security Standard. PCI Security Standards Council: Wakefield, MA, USA, 2018.
  23. ISSA 5173; The Security Standard for SMEs. 2|SEC: London, UK, 2012.
  24. NISTIR 7621; Small Business Information Security. The National Institute of Standards and Technology: Gaithersburg, MD, USA, 2016.
  25. Kopena, J.; Regli, W.C. DAMLJessKB: A Tool for Reasoning with the Semantic Web. IEEE Intell. Syst. 2003, 18, 74–77.
  26. Meditskos, G.; Bassiliades, N. DLEJena: A practical forward-chaining OWL 2 RL reasoner combining Jena and Pellet. J. Web Semant. 2010, 8, 89–94.
  27. Vitkus, D.; Steckevičius, Ž.; Goranin, N.; Kalibatienė, D.; Čenys, A. Automated expert system knowledge base development method for information security risk analysis. Int. J. Comput. Commun. Control 2019, 14, 743–758.
  28. Grigorescu, O.; Nica, A.; Dascalu, M.; Rughinis, R. CVE2ATT&CK: BERT-Based Mapping of CVEs to MITRE ATT&CK Techniques. Algorithms 2022, 15, 314.
  29. Manjunatha, A.; Kota, K.; Babu, A.S. CVE Severity Prediction From Vulnerability Description - A Deep Learning Approach. Procedia Comput. Sci. 2024, 235, 3105–3117.
  30. Dodiya, B.; Singh, U.K.; Gupta, V. Trend analysis of the CVE classes across CVSS metrics. Int. J. Comput. Appl. 2021, 183, 23–30.
  31. Czarnowski, I. A framework for the clustering and categorization of CISA reports. Procedia Comput. Sci. 2022, 207, 4369–4377.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
