1. Introduction
Today, most commercial and individual applications are delivered as web-based applications because they are easy to access remotely and to update. Many applications originally developed for the desktop have also started to offer web-based versions. Turning a web application into a product involves both backend and frontend development, and these parts are usually built by different developers: the backend is typically written in a general-purpose programming language, while the frontend is usually built with a web framework. A web application has at least a three-layer architecture, and its development is the outcome of a managed project [1].
Vulnerability refers to situations in which a system or software is exposed to attack or lacks adequate security. Vulnerabilities are weaknesses that malicious actors can exploit to take over a system, access data, interrupt services or cause other damage [2]. Software-based vulnerabilities arise from errors or deficiencies in the code or design of the software that allow attackers to exploit the system [3]. Web-based software comprises applications that are accessed and used over the internet, and making many applications available over the web brings additional risks.
In web applications, risks and threats are usually uncovered during testing phases after the project is completed [4], and such software can be exposed to a variety of security risks. In web-based software systems in particular, common vulnerability points include SQL injection, XSS (cross-site scripting), CSRF (cross-site request forgery), sensitive data exposure, XML parsing vulnerabilities, insecure file uploads, insecure authentication and authorization, insecure connections and communication and insecure routing [5,6,7]. Penetration tests, which reveal the existing vulnerabilities of a web application, are a common way to test for vulnerabilities [8].
Common Vulnerabilities and Exposures (CVE) is a system for identifying, describing and cataloging cybersecurity vulnerabilities that have been publicly disclosed. CVE is a vulnerability list in which each entry has a unique identifier together with a description of the vulnerability and other useful information [9]. The severity of each vulnerability is expressed on a scale from 0 to 10 by the Common Vulnerability Scoring System (CVSS), and these scores are published for CVE entries in databases such as the National Vulnerability Database (NVD) [10]. CVSS calculates the score from a defined set of metrics and formulas, and vulnerabilities are then grouped into severity levels according to their scores. CVSS has been released in several versions, including CVSS v2.0 and CVSS v3.0, which use different metrics and severity groupings [11].
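As an illustration of how CVSS scores are grouped, the sketch below maps a CVSS v3.x base score to the qualitative severity ratings defined in the FIRST CVSS v3.x specification (None 0.0, Low 0.1–3.9, Medium 4.0–6.9, High 7.0–8.9, Critical 9.0–10.0). CVSS v2.0 uses only three bands (Low 0.0–3.9, Medium 4.0–6.9, High 7.0–10.0). The function name is hypothetical and is not part of VULREM.

```python
def cvss_v3_severity(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS scores lie in [0.0, 10.0], got {score}")
    if score == 0.0:
        return "None"
    if score < 4.0:    # 0.1-3.9
        return "Low"
    if score < 7.0:    # 4.0-6.9
        return "Medium"
    if score < 9.0:    # 7.0-8.9
        return "High"
    return "Critical"  # 9.0-10.0


# Example: rate a few hypothetical base scores.
for s in (0.0, 3.1, 5.3, 7.5, 9.8):
    print(f"{s:>4} -> {cvss_v3_severity(s)}")
```

Because published base scores have one decimal place, the strict "<" comparisons reproduce the band boundaries exactly.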
Vulnerability detection systems need to be continuously extended and improved so that new threats can be discovered quickly. As the scale, time, technical and computing requirements of applications grow, several criteria must be weighed when assessing the difficulty of detection, which underlines the need to identify vulnerabilities in complex systems quickly and clearly. It can also be difficult to strike a balance between true positives and false positives when optimizing coverage: increasing the sensitivity of scanning tools raises the number of findings, including false alarms, and overwhelms security teams with alerts [12]. Reducing sensitivity has the opposite effect and gives a false sense of security. Finally, attackers have started to exploit multiple vulnerabilities in a system in parallel or in a chain. Such attacks are much harder to detect, because a few low-risk vulnerabilities on their own may not trigger an alarm in any system. All of these issues point to the advantages of detecting vulnerabilities at the source-code stage. In line with this, we propose VULREM, a BERT-based vulnerability scanning tool that scans the backend source code of web applications. The contributions and innovations of the VULREM model are as follows:
- VULREM is a tool for the C#-based backends of web applications that performs static source-code analysis for eight different vulnerability types.
- An original, study-specific dataset was created to train and test the VULREM model.
- VULREM fine-tunes a pre-trained BERT model.
- In the performance evaluations, VULREM achieved high vulnerability detection performance, with an F1-score of 96% and a Matthews correlation coefficient (MCC) of 83%.
In the following sections of the study, we first review the literature on source-code-based vulnerability detection and evaluate its originality. Then, the development process of the VULREM model is explained step-by-step and the model is tested. The findings are compared and discussed with the existing literature and the performance is evaluated. In the last step, the results and future work of the VULREM model are briefly explained.
  2. Related Works
This section reviews approaches that detect vulnerabilities from source code. Most source-code vulnerability analyses rely on language-based learning algorithms.
Xiaomeng et al. used deep learning to analyze source code represented as a code property graph. The study used the Software Assurance Reference Dataset (SARD), a publicly available dataset of C/C++ injection examples. The F-measure, precision, false-positive rate, true-positive rate and false-negative rate of the proposed code property graph-based vulnerability analysis (CPGVA) method were reported [13]. Sahin and Abualigah used a real-world SA dataset built from three open-source PHP applications to detect vulnerabilities and malware in open-source software, and proposed a new deep-learning-based vulnerability detection model to identify features [14]. Jeon and Kim proposed AutoVAS, a deep-learning-based automatic vulnerability analysis system that represents source code as embedding vectors, using datasets from projects in the National Vulnerability Database (NVD) and the Software Assurance Reference Dataset (SARD) [15].
Hegedus et al. examined how well machine learning techniques predict potentially vulnerable functions in JavaScript programs. They applied eight machine learning algorithms to a new dataset built from vulnerability information in the public databases of the Node Security Project and the Snyk platform, together with code-fix patches on GitHub. Static source-code metrics served as predictors, and an exhaustive grid search was used to find the best-performing models; the KNN algorithm gave the best results [16]. Zhang presented a machine learning classifier for detecting SQL injection vulnerabilities in PHP code. Both classical and deep-learning-based algorithms were trained and evaluated on input validation and sanitization features extracted from source-code files. In ten-fold cross-validation, a convolutional neural network (CNN) model achieved the highest precision (95.4%), while a multilayer perceptron (MLP) model achieved the highest recall (63.7%) and the highest F-measure (0.746) [17].
Kim et al. aimed to detect software vulnerabilities with a BERT-based model. By enabling the model to capture the syntactic and semantic aspects of the code, they classified vulnerabilities more accurately: the model reached an F1-score of over 95% with high accuracy and produced fewer false positives and false negatives than traditional methods, indicating that BERT is effective for detecting software vulnerabilities [18]. Huang et al. proposed BBVD, an approach that detects vulnerabilities automatically with BERT-based models by treating high-level programming languages as natural languages. The approach targets settings in which traditional methods cannot scan large amounts of source code efficiently, and experiments on the SARD and Big-Vul datasets showed good performance [19]. Quan et al. introduced XGV-BERT, a deep-learning-based software vulnerability detection framework that combines a pre-trained CodeBERT model with a graph convolutional network (GCN). By exploiting large datasets through transfer learning, the model learned the contextual relationships between source-code attributes. XGV-BERT outperformed existing methods such as VulDeePecker and SySeVR, reaching F1-scores of 97.5% and 95.5% on the VulDeePecker and SySeVR datasets, respectively [20]. Zhu et al. aimed to detect the types of software vulnerabilities. They represented code with four slicing methods, based on API calls, arithmetic expressions, pointer usage and array usage, and classified the resulting slices with a BERT-based model. In their experiments, the BERT-based model outperformed methods such as VulDeePecker and SySeVR, with a higher accuracy and an F1-score of 93.3% [21].
C and C++ are the languages most widely studied in the literature. However, today's applications are largely web-based and most data-related vulnerabilities now occur in web-based applications, which motivated us to scan for web-based vulnerabilities. Because C and C++ are rarely used for web application development, we did not focus on these languages. The literature also shows that vulnerability detection is more common for open-source languages and applications, so we focused on languages that are shared on open-source code-sharing platforms such as GitHub and that have critical vulnerabilities. With this motivation, we first focused on C-Sharp, one of the most widely used languages in web applications. The vulnerabilities used in the literature have CVE numbers, but many of them were disclosed long ago and may since have been fixed or removed through version updates. Therefore, this study focuses on current vulnerabilities in particular. We chose C-Sharp because the applications developed at our university, which constitute the dataset of the study, are written in it and because it is widely used in the backends of web applications. Finally, we limited the study to the backend because relatively few critical vulnerabilities occur in frontend scripting languages and because data-related vulnerabilities can be corrected on the backend side.
  3. Vulnerability Detection Methods
A vulnerability, also called a weakness, is a condition in which the protection methods and techniques of a system or application are weak or incomplete. Such vulnerabilities can allow malicious actors to gain unauthorized access to the system or to use its services in unintended ways. Security problems can be caused by many different factors, including the following:
- Software bugs;
- Design flaws;
- Weak encryption methods;
- Poor configuration;
- Lack of updates and patches;
- User errors;
- Social engineering;
- Other factors, such as malware infiltration, that contribute to the emergence of vulnerabilities [22-].
Vulnerabilities, in turn, can be detected by different methods, such as vulnerability scans, penetration tests, source-code analyses, log analyses, social engineering tests and audits, all of which are broadly classed as vulnerability analyses. To detect security problems, a scanning process must provide the following three basic capabilities [23]:
- Discovery: Providing a snapshot or continuous view of the assets within the target system and application.
- Assessment: Detecting anomalies or known vulnerabilities registered in the CVE database in the target system and application.
- Prioritization: Ranking the findings for the assets of the target system and application, using CVE data, data science or threat intelligence, to decide which vulnerabilities to resolve first.
Almost all recently developed applications provide web-based services. Finding vulnerabilities in web applications is a difficult and wide-ranging process, because modern web applications are built on an extensive network of third-party frameworks, services and APIs. This expanding scope can introduce new vulnerabilities or exacerbate existing ones, making detection even harder. At the same time, building large web applications with many developers has popularized Agile and DevOps approaches [24]. As a result, version releases and enhancements have become more frequent, and unintended security vulnerabilities have also emerged through these approaches.
  3.1. Manual Application Security Testing (MAST)
MAST refers to vulnerability detection in applications and systems that is carried out by security experts rather than by automated scanning software. The expert conducting the test finds the vulnerabilities, assesses their importance and advises on how to fix them. In MAST, the knowledge and experience of the expert are decisive, especially for business/process logic errors, access control issues and other complex security threats, and in some cases the expert finds attack and vulnerability paths that automated scanning tools overlook [25]. MAST is used in penetration testing, threat modeling, source-code reviews and security-oriented design reviews. However, given the growing complexity of modern web applications and the pace of software development, MAST can be difficult to apply.
  3.2. Static Application Security Testing (SAST)
SAST enables vulnerabilities to be detected automatically at a very early stage of the software development lifecycle (SDLC), because it analyzes the code without executing it and therefore does not require a running application. This matters because vulnerabilities found during development have not yet had any critical consequences and can still be fixed [26]. SAST tools use analysis techniques such as data flow analysis, control flow analysis and flaw analysis to detect vulnerabilities. Early SAST methods focused on input validation and memory overflow vulnerabilities. More recent SAST tools rely on language- and platform-specific parsers, decoders and syntax tree generators, which limits the types of projects they can be applied to [27].
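To make the idea of analyzing code without executing it concrete, the following minimal sketch scans C# source text for one hypothetical rule: SqlCommand objects whose query text is built with string concatenation or C# string interpolation, a common SQL injection (CWE-89) pattern. It is a pattern match on raw text, not the data flow or control flow analysis that production SAST tools perform, and it is not VULREM's detection method.

```python
import re
from dataclasses import dataclass

# Hypothetical rule: a SqlCommand whose query text starts with an interpolated
# string ($"...") or a string literal followed by "+" (concatenation).
SQL_CONCAT = re.compile(r'new\s+SqlCommand\s*\(\s*(?:\$"|"[^"]*"\s*\+)')


@dataclass
class Finding:
    line_no: int
    rule: str
    snippet: str


def scan_csharp(source: str) -> list[Finding]:
    """Scan C# source text line by line; the code is never compiled or run."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        if SQL_CONCAT.search(line):
            findings.append(Finding(line_no, "CWE-89 (SQL injection)", line.strip()))
    return findings


SAMPLE = "\n".join([
    'var q1 = new SqlCommand("SELECT * FROM Users WHERE Id = " + id, conn);',
    'var q2 = new SqlCommand($"SELECT * FROM Users WHERE Id = {id}", conn);',
    'var q3 = new SqlCommand("SELECT * FROM Users WHERE Id = @id", conn);  // parameterized',
])

for f in scan_csharp(SAMPLE):
    print(f"line {f.line_no}: {f.rule}: {f.snippet}")
```

The parameterized query on the third line is not flagged, which is the behavior a rule like this aims for; a single-line regex, however, misses queries assembled across several statements, which is exactly where data flow analysis is needed.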
  3.3. Dynamic Application Security Testing (DAST)
Unlike SAST, DAST evaluates the behavior and responses of a running program under different usage scenarios to detect potential vulnerabilities, without requiring access to the source code. DAST therefore helps to identify vulnerabilities that arise from the application's runtime environment. Early DAST use cases mostly targeted web vulnerabilities such as SQL injection, cross-site scripting and weak authentication [28]. As modern software development has expanded to APIs and mobile applications, DAST has been extended to cover them as well. DAST can also reveal risks and threats that static analysis may overlook, such as problems involving firewalls and intrusion detection, configuration errors, interactions between application components and access control. Recently developed DAST tools use artificial intelligence models to reduce false positives and to flag anomalies that suggest vulnerabilities or attacks.
  3.4. Interactive Application Security Testing (IAST)
The large and complex attack surface of modern applications can expose the weaknesses of both SAST and DAST. Building on the strengths of both, IAST offers a hybrid approach that analyzes applications during both testing and runtime [29]. By combining static code analysis with dynamic runtime analysis, IAST aims to produce fewer false positives and false negatives than SAST or DAST alone. Current AI-supported IAST tools are integrated into integrated development environments (IDEs), build systems and continuous integration/continuous delivery (CI/CD) pipelines so that they can follow the software's processes and produce effective results [8].
  4. Proposed Methodology
The vulnerability detection methods introduced in Section 3, such as SAST, DAST and IAST, are traditional vulnerability detection strategies for web applications. They are often supplemented with manual and dynamic analyses, but given the complexity of modern web applications, these processes are time-consuming and error-prone. VULREM was developed to address these shortcomings and to improve security by detecting vulnerabilities at the source-code stage. Our model combines the early-detection advantage of static code analysis (SAST) with a deep-learning-based solution, with the aim of reducing the effort and error rates of manual processes. In particular, inspecting the source code directly allows the proactive detection of vulnerabilities that dynamic analyses may miss.
In the proposed model, a fine-tuned BERT model scans the source code of web applications and detects known, CVE-coded vulnerabilities. BERT is a language representation model that uses a bidirectional transformer network: the model is pre-trained on a corpus, and the pre-trained model is then fine-tuned for other tasks [30]. In our problem-specific design, an input sequence can be a single line of code or a block of code. The input representation of a code fragment is constructed by summing the token, segment and position embeddings of its tokens [31]. Because BERT is bidirectional, the representation used for vulnerability prediction draws on context both to the left and to the right of each token. The development architecture of the proposed model, VULREM, is shown in Figure 1. The following sections present the development and performance evaluation of the VULREM model.
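The input representation described above can be sketched in a few lines. The example assumes toy embedding tables filled with random values and a hidden size of 4 (both hypothetical); in BERT the tables are learned, and the summed vectors are additionally passed through layer normalization and dropout, which the sketch omits.

```python
import random

# Toy sizes for illustration only; BERT-base uses a hidden size of 768.
HIDDEN = 4
VOCAB = ["[PAD]", "[CLS]", "[SEP]", "string", "query", "=", "+"]
MAX_POSITIONS = 16
N_SEGMENTS = 2

rng = random.Random(0)  # fixed seed so the toy tables are reproducible


def embedding_table(rows: int, cols: int) -> list[list[float]]:
    """Random stand-in for a learned embedding table."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


token_emb = embedding_table(len(VOCAB), HIDDEN)
segment_emb = embedding_table(N_SEGMENTS, HIDDEN)
position_emb = embedding_table(MAX_POSITIONS, HIDDEN)


def input_representation(token_ids: list[int], segment_ids: list[int]) -> list[list[float]]:
    """Element-wise sum of token, segment and position embeddings for each token."""
    reps = []
    for pos, (tok, seg) in enumerate(zip(token_ids, segment_ids)):
        reps.append([t + s + p for t, s, p in
                     zip(token_emb[tok], segment_emb[seg], position_emb[pos])])
    return reps


# "[CLS] string query [SEP]" as a single segment (segment id 0).
ids = [VOCAB.index(t) for t in ["[CLS]", "string", "query", "[SEP]"]]
vectors = input_representation(ids, [0] * len(ids))
print(len(vectors), "vectors of size", len(vectors[0]))
```

Because the position embedding depends on the index, the same token at two different positions receives two different input vectors, which is how order information enters the otherwise order-agnostic attention layers.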
  4.1. Dataset
A study-specific dataset was created during the development of the VULREM model. C-Sharp is used to develop web applications with ASP.NET and ASP.NET Core, and web applications written in C-Sharp are rarely shared as open source. Because of these limitations, we labeled the source code of applications developed by our university's IT department according to the vulnerability descriptions in the Common Vulnerabilities and Exposures (CVE) and National Vulnerability Database (NVD) archives. When the source code contained a vulnerable function, we recorded the file and application in which it was located and labeled it as vulnerable; code blocks without any vulnerability were labeled as non-vulnerable. Table 1 shows the distribution of vulnerable and non-vulnerable data obtained from the selected web applications: the source code of six different C-Sharp-based web applications was labeled according to CVE vulnerabilities. In Table 1, "Samples" is the number of source-code lines examined in each application, "Vuln Code" is the number of lines with a vulnerability and "Vulnerable Rate" is the ratio of vulnerable lines to the total number of lines examined.
The resulting dataset contains 2,141,783 lines of source code, of which 17,128 lines contain CVE-numbered vulnerabilities. VULREM scans for eight different vulnerability types, and the number of samples in the dataset for each type is given in Table 2.
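The overall vulnerable rate follows directly from these totals. The short calculation below, which uses only the figures reported above, shows that about 0.80% of the lines are vulnerable, roughly one vulnerable line for every 124 non-vulnerable lines; this strong class imbalance is why the F1-score and MCC are reported alongside accuracy in Section 5. The helper name is illustrative.

```python
def vulnerable_rate(vuln_lines: int, total_lines: int) -> float:
    """Fraction of examined source-code lines labeled as vulnerable."""
    if total_lines <= 0:
        raise ValueError("total_lines must be positive")
    return vuln_lines / total_lines


TOTAL_LINES = 2_141_783   # lines of source code in the dataset (Section 4.1)
VULN_LINES = 17_128       # lines with CVE-numbered vulnerabilities

rate = vulnerable_rate(VULN_LINES, TOTAL_LINES)
ratio = (TOTAL_LINES - VULN_LINES) / VULN_LINES
print(f"Vulnerable rate: {rate:.2%}")
print(f"Non-vulnerable lines per vulnerable line: {ratio:.0f}")
```

The same calculation, applied per application, gives the "Vulnerable Rate" column of Table 1.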
  4.2. Token Embeddings Phase
The BERT model in the proposed approach uses WordPiece to split the input source-code blocks into tokens [34]. WordPiece divides each input into subword units called tokens. Because these subword units sit between whole words and single characters, they retain much of their meaning while keeping the vocabulary small, so the model can still represent out-of-vocabulary words, such as unusual identifiers, as sequences of known pieces (Figure 2).
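The subword splitting can be illustrated with the greedy longest-match-first procedure that BERT's WordPiece tokenizer applies to each word. This is a simplified sketch: the vocabulary is a hypothetical toy set, and the real tokenizer first lowercases the text (for uncased models) and splits it on whitespace and punctuation before this step. Continuation pieces carry the "##" prefix, as in BERT vocabularies.

```python
def wordpiece_tokenize(word: str, vocab: set[str], unk: str = "[UNK]",
                       max_chars: int = 100) -> list[str]:
    """Greedy longest-match-first WordPiece tokenization of a single word."""
    if len(word) > max_chars:
        return [unk]
    tokens, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # mark a continuation piece
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:  # no sub-word matched, so the whole word is unknown
            return [unk]
        tokens.append(piece)
        start = end
    return tokens


# Hypothetical toy vocabulary; a real BERT vocabulary has about 30,000 entries.
VOCAB = {"execute", "##sql", "##command", "get", "##user", "##by", "##id", "sql"}

print(wordpiece_tokenize("executesqlcommand", VOCAB))
print(wordpiece_tokenize("getuserbyid", VOCAB))
```

An identifier that never appeared during pre-training, such as "getuserbyid", is still represented by known pieces instead of collapsing to a single unknown token.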
In the first step of source-code vulnerability detection in the proposed approach, code slices were extracted from the source-code files for the tokenizer training phase. These code slices are the candidates for syntax- and semantics-based vulnerabilities. The open-source code-analysis tool Joern was used to parse the code slices and source-code files and to extract the control flow graph of each piece of code. In addition, user-defined variable and function names were mapped to symbolic names, while the keywords of the source-code language kept their original names.
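The symbolic renaming step can be sketched as follows. This is a simplified, regex-based illustration under stated assumptions: the keyword and API lists are small illustrative subsets, an identifier followed by "(" is treated as a function name, string literals are left untouched, and the VAR/FUN naming scheme is hypothetical. In the described pipeline, the Joern parse is the more reliable way to tell user-defined names from library APIs.

```python
import re

# Illustrative subsets only: C# keywords and library API names that keep their names.
CSHARP_KEYWORDS = {"var", "new", "string", "int", "return", "if", "else",
                   "public", "private", "static", "void", "class", "null"}
KNOWN_APIS = {"SqlCommand", "ExecuteReader", "Console", "WriteLine"}

# A double-quoted string literal (left unchanged) or an identifier.
TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|[A-Za-z_][A-Za-z0-9_]*')


def symbolize(code: str) -> str:
    """Map user-defined identifiers to VAR<n>/FUN<n>, keeping keywords and APIs."""
    mapping: dict[str, str] = {}
    counters = {"VAR": 0, "FUN": 0}

    def rename(match):
        text = match.group(0)
        if text.startswith('"') or text in CSHARP_KEYWORDS or text in KNOWN_APIS:
            return text
        if text not in mapping:
            # Heuristic: a name followed by "(" is treated as a function.
            is_call = code[match.end():].lstrip().startswith("(")
            kind = "FUN" if is_call else "VAR"
            counters[kind] += 1
            mapping[text] = f"{kind}{counters[kind]}"
        return mapping[text]

    return TOKEN.sub(rename, code)


snippet = ('string query = "SELECT * FROM Users WHERE Id = " + userId; '
           'var cmd = new SqlCommand(query, conn); LogQuery(query);')
print(symbolize(snippet))
```

Every occurrence of the same name maps to the same symbol, so the data flow from query into SqlCommand is preserved while project-specific naming is removed.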
  4.3. Transformer
The transformer architecture underlying the BERT model is a natural language processing (NLP) architecture built around an attention mechanism, which weighs the effect of different parts of the input source code on each token of the sequence [35]. The original transformer has an encoder–decoder structure whose operation is best understood through its attention mechanisms [36]; BERT uses only the encoder stack. The transformer architecture includes input and output embeddings, attention mechanisms, feed-forward networks and a softmax function for computing output probabilities. Tokenization and token representation take place in the input embedding, the first processing element of the encoder. There, positional encodings are added to the numeric token representations to obtain a vector matrix. This matrix is then processed by the attention mechanism to determine how strongly the tokens in the sequence are related to one another. In this way, the BERT model performs contextual language modeling tasks.
Attention mechanisms measure the association between input tokens by representing each token with a query, a key and a value vector: the query of the token currently being processed is compared with the keys of all tokens in the sequence, and the resulting weights are applied to their values, as given in Equation (1) [37,38].
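$$\mathrm{Attention}(Q, K, V) = \sigma_{\mathrm{softmax}}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V \qquad (1)$$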
Here d_k is the dimension of the key vectors. The Q matrix is multiplied by the transpose of the K matrix, and the product is divided by √d_k to obtain a new matrix of scores. Applying σ_softmax to these scores and multiplying the result by the matrix V gives, for each query, the association rates over the group of tokens [39]. The function σ_softmax is a probabilistic function that expresses the level of association of the input values: it produces a value between 0 and 1 for each element, whether the input elements are positive or negative, and these values are the probability scores of the association.
$$\sigma_{\mathrm{softmax}}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{n} e^{z_j}} \qquad (2)$$
Here z represents the input vector of σ_softmax, z_i represents the i-th element of z, n is the number of elements in z and e^{z_i} is the exponential function applied to each input. The denominator ∑_j e^{z_j} normalizes the outputs so that each lies between 0 and 1 and together they sum to 1.
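Equations (1) and (2) can be checked with a small pure-Python sketch of single-head scaled dot-product attention. It omits the learned query, key and value projections and the multiple attention heads used in BERT, and it subtracts the maximum score inside the softmax for numerical stability, which leaves the result of Equation (2) unchanged.

```python
import math


def softmax(z: list[float]) -> list[float]:
    """Equation (2); subtracting max(z) avoids overflow and does not change the result."""
    m = max(z)
    exps = [math.exp(x - m) for x in z]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: list[list[float]]) -> list[list[float]]:
    return [list(col) for col in zip(*m)]


def attention(Q, K, V):
    """Equation (1): softmax(Q K^T / sqrt(d_k)) V, with softmax applied per query row."""
    d_k = len(K[0])
    scores = matmul(Q, transpose(K))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, V), weights


# Two query tokens attending over three key/value tokens (toy numbers).
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
output, weights = attention(Q, K, V)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of weights is a probability distribution over the tokens, so each output row is a weighted average of the value vectors.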
The AutoModel and AutoTokenizer classes of the Hugging Face Transformers Python library were used to implement the transformer structure described above in the proposed model.
  4.4. Model Implementation
Because BERT can be adapted to several downstream tasks, the task type in VULREM had to be specified as classification. During training and testing, the first token of every input sequence was the special classification token [CLS]. The final hidden state of this token, C ∈ ℝ^H, was used as the aggregate sequence representation for classification, where H is the hidden size.
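The classification step can be sketched as a linear layer over C followed by a softmax, P = σ_softmax(C Wᵀ), where W ∈ ℝ^(K×H) holds one weight row per class. The sketch below assumes toy values for C and W and hypothetical class labels; in VULREM both come from the fine-tuned model. The Hugging Face classification head also applies dropout and a bias term, which are omitted here.

```python
import math

# Hypothetical label set; VULREM's actual classes are the CVE-coded types in Table 2.
LABELS = ["non-vulnerable", "vuln_type_a", "vuln_type_b"]


def classify(c: list[float], w: list[list[float]]) -> tuple[list[float], str]:
    """P = softmax(C W^T): c is the [CLS] hidden state (length H), w is K x H."""
    logits = [sum(ci * wi for ci, wi in zip(c, row)) for row in w]  # one logit per class
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return probs, LABELS[best]


# Toy values with H = 3 and K = 3.
C = [0.2, -0.1, 0.4]
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
probs, label = classify(C, W)
print([round(p, 3) for p in probs], label)
```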
When fine-tuning BERT in the VULREM model, a classification layer with weights W ∈ ℝ^(K×H) was added, where K is the number of CVE-coded vulnerability classes detected by the model. The output probabilities for the K classes were calculated as in Equation (3).
$$P = \sigma_{\mathrm{softmax}}\left(C W^{\top}\right) \qquad (3)$$
where P ∈ ℝ^K contains the probabilities of the classification labels. All pre-trained parameters of the uncased BERT model and the parameters of the classification layer were fine-tuned jointly to maximize the probability assigned to the correct vulnerability. The Adam optimizer, which was also used to train the original BERT model, was chosen to train the parameters [40]. Adam computes adaptive learning rates for each parameter [41] and can be viewed as a combination of RMSprop and stochastic gradient descent with momentum [42]. The Adam optimizer used in the BERT model of the proposed approach updated the parameter values according to Equations (4) and (5).
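$$r_t = \beta_1 r_{t-1} + (1 - \beta_1)\, g_t \qquad (4)$$
$$v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^{2} \qquad (5)$$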
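where g_t is the gradient of the loss with respect to the parameters W at step t, r_t and v_t are the estimates of the first moment (mean) and second moment (uncentered variance) of the gradient, respectively, and β_1, β_2 ∈ [0, 1) are their decay rates. Because r_t and v_t are initialized at zero, they are biased towards zero; this bias was corrected according to Equations (6) and (7).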
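$$\hat{r}_t = \frac{r_t}{1 - \beta_1^{t}} \qquad (6)$$
$$\hat{v}_t = \frac{v_t}{1 - \beta_2^{t}} \qquad (7)$$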
where r̂_t and v̂_t are the bias-corrected versions of r_t and v_t, respectively. These values were then used to update W, as shown in Equation (8).
$$W_t = W_{t-1} - \eta\,\frac{\hat{r}_t}{\sqrt{\hat{v}_t} + \epsilon} \qquad (8)$$
where η is the learning rate and ϵ is a small smoothing constant that prevents division by zero.
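Equations (4)–(8) correspond to the following scalar sketch of one Adam step. The default hyperparameters (η = 0.001, β_1 = 0.9, β_2 = 0.999, ϵ = 10⁻⁸) are those suggested in the original Adam paper and are assumptions here; the values actually used for VULREM are listed in Table 3. In practice, PyTorch's torch.optim.Adam applies the same update element-wise to every parameter tensor.

```python
import math


def adam_step(w, g, r, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update of a scalar parameter w given its gradient g at step t (t >= 1).

    r and v are the running first- and second-moment estimates from step t - 1.
    Returns the updated (w, r, v).
    """
    r = beta1 * r + (1 - beta1) * g                # Eq. (4): first moment
    v = beta2 * v + (1 - beta2) * g * g            # Eq. (5): second moment
    r_hat = r / (1 - beta1 ** t)                   # Eq. (6): bias correction
    v_hat = v / (1 - beta2 ** t)                   # Eq. (7): bias correction
    w = w - lr * r_hat / (math.sqrt(v_hat) + eps)  # Eq. (8): parameter update
    return w, r, v


# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3), starting from w = 0.
w, r, v = 0.0, 0.0, 0.0
for t in range(1, 11):
    g = 2 * (w - 3)
    w, r, v = adam_step(w, g, r, v, t, lr=0.1)
print(f"w after 10 steps: {w:.3f}")
```

With a constant gradient g, the bias corrections give r̂_t = g and v̂_t = g² exactly, so each step moves W by almost exactly η regardless of the gradient's magnitude; this scale invariance is what "adaptive learning rate per parameter" means in practice.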
  5. Results and Discussion
Computational experiments to test and explore the performance of VULREM, the proposed vulnerability scanning tool for web applications, were performed on a system with an NVIDIA RTX 4080 GPU, 64 GB of system RAM and a 16-core Intel processor. VULREM was developed in Python using the PyTorch and BERT libraries.
  5.1. Fine-Tuned BERT
A fine-tuned BERT approach was adopted for the VULREM model developed for vulnerability scanning in web applications. The model has 24 transformer layers, 16 attention heads and 512 hidden units. The hyperparameters used for fine-tuning VULREM are given in Table 3. A batch size of 32 was used for sequence lengths of 64, 128, 256 and 320; to avoid running out of GPU memory, the batch size was reduced to 16 and 8 for sequence lengths of 384 and 512, respectively.
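The sequence-length and batch-size settings described above can be written as a small configuration, together with a helper that prepares a tokenized code fragment for a given maximum length by adding [CLS] and [SEP], truncating, padding and building the attention mask. The helper is a hypothetical sketch; the token IDs 101, 102 and 0 used in the example are those of the standard uncased BERT vocabulary and are passed in as parameters rather than hard-coded.

```python
# Batch sizes used for each maximum sequence length, as reported in the text.
BATCH_SIZE_BY_SEQ_LEN = {64: 32, 128: 32, 256: 32, 320: 32, 384: 16, 512: 8}


def prepare_input(token_ids: list[int], max_len: int, cls_id: int, sep_id: int,
                  pad_id: int = 0) -> tuple[list[int], list[int]]:
    """Wrap ids in [CLS] ... [SEP], truncate to max_len and pad.

    Returns (input_ids, attention_mask); the mask is 1 for real tokens, 0 for padding.
    """
    body = token_ids[: max_len - 2]          # reserve two slots for [CLS] and [SEP]
    ids = [cls_id] + body + [sep_id]
    padding = max_len - len(ids)
    return ids + [pad_id] * padding, [1] * len(ids) + [0] * padding


# Example with the uncased BERT special-token ids ([CLS]=101, [SEP]=102, [PAD]=0).
seq_len = 320
ids, mask = prepare_input([2000, 2001, 2002], seq_len, cls_id=101, sep_id=102)
print(len(ids), sum(mask), "batch size:", BATCH_SIZE_BY_SEQ_LEN[seq_len])
```

Attention memory grows quadratically with sequence length, which is consistent with the smaller batch sizes needed at 384 and 512 tokens.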
  5.2. Bag-of-Words-Based Approaches
To evaluate the performance of the proposed fine-tuned BERT model, it was compared with three text classifiers: k-NN, naïve Bayes (NB) and support vector machine (SVM). These classifiers were trained on textual features extracted from the source code with the bag-of-words model and term frequency–inverse document frequency (TF-IDF) weighting.
The unigrams used to train these models were created by applying case conversion, tokenization, stop-word filtering and stemming, in that order. After preprocessing, the hyperparameters of each model were optimized with grid search (Table 4).
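A minimal sketch of the unigram preprocessing and TF-IDF weighting is shown below. It assumes a tiny illustrative stop-word list, omits stemming and uses the plain tf × log(N/df) weighting; libraries such as scikit-learn's TfidfVectorizer use a smoothed IDF by default, so their weights will differ. The resulting vectors would then be the inputs to the k-NN, NB and SVM classifiers.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list; a real pipeline would use a fuller list.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "is"}


def preprocess(text: str) -> list[str]:
    """Case conversion, tokenization and stop-word filtering (stemming omitted)."""
    tokens = re.findall(r"[a-z_][a-z0-9_]*", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tf_idf(documents: list[str]) -> list[dict[str, float]]:
    """One {unigram: weight} dict per document, weight = tf * log(N / df)."""
    counts = [Counter(preprocess(d)) for d in documents]
    n_docs = len(counts)
    df = Counter(term for doc in counts for term in doc)  # documents containing each term
    vectors = []
    for doc in counts:
        total = sum(doc.values())
        vectors.append({t: (c / total) * math.log(n_docs / df[t]) for t, c in doc.items()})
    return vectors


docs = ["SqlCommand cmd = new SqlCommand(query)", "Console.WriteLine(query)"]
for vec in tf_idf(docs):
    print({t: round(w, 3) for t, w in vec.items()})
```

A term that occurs in every document, such as "query" here, receives zero weight, while terms specific to one fragment are emphasized; this ignores word order entirely, which is the main limitation the BERT-based model addresses.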
  5.3. Evaluation Metrics
Performance metrics were used to evaluate the outputs of the model: the confusion matrix, accuracy, recall, precision, F1-score and Matthews correlation coefficient (MCC). The purpose of each metric is briefly explained below.
Accuracy: The proportion of correct predictions made by the proposed model over the whole dataset (Equation (9)).
Recall: The proportion of truly vulnerable samples that the model correctly predicted as vulnerable (Equation (10)).
Precision: The proportion of the model's positive (vulnerable) predictions that were correct (Equation (11)).
F1-score: The harmonic mean of precision and recall (Equation (12)). Because it combines both, it provides a more informative assessment than accuracy alone, especially when some vulnerability classes are under-represented in the dataset.
Matthews correlation coefficient (MCC): A value between −1 and 1 that evaluates the performance of the VULREM model when classifying multiple vulnerability types. Values closer to 1 indicate better classification, while 0 corresponds to performance no better than random guessing (Equation (13)).
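$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (9)$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (10)$$
$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (11)$$
$$F1 = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (12)$$
$$\mathrm{MCC} = \frac{c \times s - \sum_{k=1}^{K} p_k \times t_k}{\sqrt{\left(s^2 - \sum_{k=1}^{K} p_k^2\right)\left(s^2 - \sum_{k=1}^{K} t_k^2\right)}} \qquad (13)$$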
The abbreviations and symbols used in Equations (9)–(13) are defined as follows:
- True Positive (TP): A vulnerable code block correctly predicted to be vulnerable.
- True Negative (TN): A non-vulnerable code block correctly predicted to be non-vulnerable.
- False Positive (FP): A non-vulnerable code block incorrectly predicted to be vulnerable.
- False Negative (FN): A vulnerable code block incorrectly predicted to be non-vulnerable.
- k: The index of a class among the K classes (vulnerable types and non-vulnerable) in the model.
- c: The total number of correct predictions.
- s: The total number of samples in the dataset.
- p_k: The number of times class k was predicted.
- t_k: The number of samples whose true class is k.
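Equations (9)–(13) translate directly into code. The sketch below computes the binary metrics from TP, TN, FP and FN and the multiclass MCC from c, s, p_k and t_k; for two classes the MCC formula reduces to the familiar binary MCC, which the toy example confirms (7/15 in both forms). scikit-learn's matthews_corrcoef implements the same multiclass definition and can serve as a cross-check. The function names are illustrative.

```python
import math
from collections import Counter


def binary_metrics(y_true, y_pred, positive=1):
    """Equations (9)-(12) for one positive (vulnerable) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "recall": recall, "precision": precision, "f1": f1}


def multiclass_mcc(y_true, y_pred):
    """Equation (13) using c, s, p_k and t_k."""
    s = len(y_true)                                   # total samples
    c = sum(t == p for t, p in zip(y_true, y_pred))   # correct predictions
    t_k = Counter(y_true)                             # true occurrences of each class
    p_k = Counter(y_pred)                             # predictions of each class
    classes = set(t_k) | set(p_k)
    numerator = c * s - sum(p_k[k] * t_k[k] for k in classes)
    denominator = math.sqrt((s * s - sum(p_k[k] ** 2 for k in classes))
                            * (s * s - sum(t_k[k] ** 2 for k in classes)))
    return numerator / denominator if denominator else 0.0


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
print(binary_metrics(y_true, y_pred))
print(multiclass_mcc(y_true, y_pred))
```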
  5.4. VULREM Performance Assessment
Table 5 compares the vulnerability classification accuracies of the proposed model and the bag-of-words-based models. Every VULREM variant, at each sequence length, outperformed the bag-of-words models. VULREM was also given inputs of different sequence lengths depending on the number of source-code lines, and the highest accuracy was achieved with a sequence length of 320.
The dataset obtained from the six study-specific web applications contains eight different CVE-numbered vulnerability types, and the number of samples per type is not balanced. Consequently, during training and testing, the F1-score of each vulnerability type varied with its number of samples (Figure 3), whereas accuracy changed very little.
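The effect of class imbalance on the two metrics can be illustrated with a hypothetical vulnerability type that has 8 vulnerable samples out of 1,000 (close to the overall vulnerable rate of the dataset). A classifier that never flags anything still reaches 99.2% accuracy but an F1-score of 0, while one that finds half of the vulnerable samples with no false alarms reaches 99.6% accuracy and an F1-score of about 0.67: accuracy barely moves, whereas the F1-score changes sharply. The numbers are illustrative, not results from the study.

```python
def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1-score for binary labels (1 = vulnerable)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    # 2TP / (2TP + FP + FN) equals the harmonic mean of precision and recall.
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return accuracy, f1


# Hypothetical vulnerability type: 8 vulnerable samples out of 1,000.
y_true = [1] * 8 + [0] * 992
never_flags = [0] * 1000                         # predicts "non-vulnerable" everywhere
finds_half = [1] * 4 + [0] * 4 + [0] * 992       # 4 of 8 found, no false alarms

for name, y_pred in [("never_flags", never_flags), ("finds_half", finds_half)]:
    acc, f1 = accuracy_and_f1(y_true, y_pred)
    print(f"{name}: accuracy={acc:.3f}, F1={f1:.3f}")
```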
In the test phase of the VULREM source-code scanning tool, scanning time was evaluated along with accuracy. As shown in Table 6, detection time increased in proportion to the number of vulnerabilities detected in the source code. In the tests on the dataset from six web applications, at most 20 vulnerabilities were detected in a scan, and the total detection time was at most 12.4 s, which is acceptable for continuous vulnerability testing in software development processes.
  6. Conclusions and Future Work
The development and intensive use of web-based applications brings data security risks. Many different languages and frameworks are used to develop web applications, and there are many alternative ways to detect their vulnerabilities. Many security tests are performed in live or test environments after development is complete, and any risks they uncover take considerable time to correct. Web applications should therefore be developed according to the clean code philosophy, and vulnerabilities should be evaluated at the source-code stage. With this goal, the proposed VULREM approach detects, at the source-code stage, eight types of vulnerabilities common in ASP.NET and ASP.NET Core web applications written in C-Sharp. The VULREM tool is based on the BERT model and performs vulnerability scanning using a dataset created specifically for the study. Depending on the source code to be scanned, VULREM can use sequence lengths of 64, 128, 256, 320, 384 and 512. The model classifies CVE vulnerability types after the token, position and segment embedding stages of a BERT model fine-tuned on the study-specific dataset. VULREM was evaluated with different metrics in the training and testing phases and showed the highest performance among comparable language-based learning algorithms, with an average accuracy of 96%. Vulnerability detection time grew in proportion to the number of source-code lines and the number of vulnerabilities detected; in the tests, VULREM classified the vulnerabilities in the dataset within about 12 s at most, an acceptable time.
The vulnerability detection performance of the VULREM model offers significant advantages over traditional methods. For example, whereas SAST methods usually focus on specific vulnerabilities such as input validation and memory overflow, VULREM covers a wide spectrum of CVE-based vulnerabilities and can detect vulnerabilities that a manual analysis may miss. Likewise, whereas DAST methods detect vulnerabilities at runtime, VULREM addresses them at the source through static code analysis, reducing the risks the application may face later. In this context, the 99% F1-score of our model represents a significant improvement over traditional methods in terms of both accuracy and speed.
In future work, we plan to extend VULREM's classification coverage by continuing to collect source code from open-source and in-house web applications and to label its vulnerabilities. The original dataset will also be published on platforms such as GitHub and Kaggle to support further research.