Applied Sciences
  • Article
  • Open Access

4 January 2023

Searching Open-Source Vulnerability Function Based on Software Modularization

State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450001, China
* Author to whom correspondence should be addressed.

Abstract

Reusing vulnerable open-source components can lead to security problems. At present, open-source component detection for binary programs can only reveal whether vulnerable open-source components are reused; it cannot determine the specific location of the vulnerabilities. To address this problem, we propose BMVul, a method for detecting open-source vulnerability functions in binary programs based on software modularization. BMVul performs binary modularization with DBM, an overlapping clustering method based on directed graphs, and then uses feature comparison to carry out modular software component analysis. After creating a set of open-source component vulnerability functions through function signatures, BMVul detects vulnerability functions in the binary modules that reuse open-source components. The experimental results show that, compared with component detection based on Louvain modularization and with B2SFinder, BMVul improves precision by 3.16% and 59.57%, respectively. Moreover, the precision of unique binary module matching is improved by 39.43% compared with the Louvain method, and the F1 score is improved by 8.45% compared with B2SFinder. Module-level detection narrows the search space of vulnerability functions, thereby reducing the workload of open-source vulnerability detection, which is of great significance for software security analysis.

1. Introduction

Open-source code reuse can speed up program development, and programs increasingly rely on open-source components. Studies [1,2,3,4] indicate that commercial software also reuses a large number of open-source components, such as libraries in Internet of Things firmware and the Linux kernel. However, reusing vulnerable open-source components causes security issues. For example, Heartbleed [5] (CVE-2014-0160) was discovered in version 1.0.1 of OpenSSL [6] and affected millions of software products (e.g., LibreOffice [7], VMware [8]) and devices (e.g., routers, switches, firewalls). VULDEFF [9] proposes a vulnerability-detection method based on function fingerprints and code differences. VGRAPH [10] designs an accurate approximate matching algorithm that is capable of detecting modified vulnerable code clones and differentiating them from their patched counterparts. Similarly, VELVET [11] and Bgnn4vd [12] find vulnerable code reuse in source code.
To detect the reuse of open-source components, researchers proposed a series of methods based on binary-to-source matching. OSSPolice [1] identifies open-source components through strings and exported function names and detects whether the components have certain types of vulnerabilities, to determine whether components reused in the program are vulnerable versions. B2SFinder [2] matches string, integer type, and control flow characteristics with different weight algorithms to detect the reuse of open-source components in commercial software, and analyze potential vulnerabilities of components reused. B2SMatcher [3] uses program-level features for rough matching to determine single-version and multiversion reuse, then uses function-level features for exact matching. Some applications have been found to reuse vulnerable versions of open-source components. Nevertheless, the above methods are limited to only providing module-level information (e.g., reused vulnerable versions of open-source components), without giving the specific location of the vulnerable function in binary.
Usually, it is hard to detect vulnerability in binary executables. Consequently, few methods detect reused binary vulnerability functions. With the source code of the vulnerable open-source components, researchers [13,14] can extract source-level information in open-source components to assist vulnerability detection in binary executables. According to the features of vulnerable functions in open-source components, they search for the reused vulnerable binary functions. However, all binary functions need to be compared with source vulnerability functions to determine the binary vulnerability functions, which is time consuming.
To solve the above problems, we propose an open-source vulnerability function detection method based on software modularization, which we call BMVul (binary modularization-based vulnerability function detection). First, we extract the directed call graph of binary functions. Then, we cluster functions with OSLOM [15], an overlapping community detection technique based on statistical significance that integrates modularity and information theory algorithms. Based on the resulting modules, we carry out feature-based software component analysis. According to the results of the modular software component analysis, we identify the vulnerability types and the specific vulnerability functions in open-source components through function signatures, and then search for the corresponding binary vulnerability functions in the modules that reuse those components.
To evaluate the effectiveness of BMVul, we collect a dataset from ISRD [16] and ModX [17]. The experimental results show that BMVul significantly outperforms existing methods. The precision of BMVul is 3.16% and 59.57% higher than that of component detection based on Louvain [18] and of B2SFinder, respectively. Moreover, the precision of matching a unique binary module is improved by 39.43% compared to Louvain detection. The F1 score increases by 8.45% over B2SFinder. Module-level function matching greatly reduces the workload of open-source vulnerability function detection and finds vulnerability functions reused in the binary program.
Overall, this paper makes the following major contributions.
  • We propose a binary function clustering method for directed graphs, named binary modularization based on directed graph (DBM), which divides binary modules based on overlapping community detection ideas of OSLOM. Then we carry out modular software component analysis through matching features between binary modules and source code.
  • We accurately locate specific source vulnerability functions in open-source components through function signature technology, then match vulnerability functions in binary modules reusing components, which narrows the scope of open-source vulnerability detection.
  • We implement the open-source vulnerability detection prototype BMVul. The results show that BMVul is superior to current detection methods: it can detect open-source vulnerability functions in binary modules, which is of great significance to software security work.
The rest of this paper is organized as follows. Section 2 describes the research status in related fields. Section 3 introduces the proposed model BMVul. Section 4 evaluates the performance of BMVul in open-source component detection and vulnerability function detection against state-of-the-art work. Section 5 concludes the paper.

3. Methodology

BMVul is a method that detects open-source vulnerability functions based on software modularization. We divide BMVul into the following two stages (Figure 1):
Figure 1. Workflow of BMVul.
  • Software component analysis based on program modularization. In the software component analysis stage, we extract the directed function call information of the binary program and represent the information in the form of a graph. Then, we cluster functions based on the overlapping community detection algorithm OSLOM. Next, we extract string-type features and function complex branch sequences of binary modules and open-source components to match each module of a binary program with open-source components. It identifies the corresponding relationship between binary modules and open-source components and narrows the positioning range of components.
  • Open-source vulnerability detection. In the open-source vulnerability detection stage, we create an open-source component vulnerability function set through function signature, which uses function hash and code normalization techniques. Then we search vulnerability functions in the binary module that reuses open-source components.

3.1. Software Component Analysis Based on Program Modularization

3.1.1. Program Modularization Method Based on Directed Graph

Software modularization aims at functions clustering. It performs a cluster analysis on the set of functions so that functions defined in proximity to one another and functions that frequently call one another will belong to the same cluster.
Modularization refers to the process of dividing the program into relatively self-contained components or modules. Typically, each module encapsulates a fundamental set of related functionalities. However, this modular structure is lost during compilation because the compiler merges all functions into one binary file. Analyzing such a large number of functions at once is inconvenient for binary code analysis. Therefore, we propose a binary code modularization method to narrow the scope of the search task in software security analysis.
The current binary modularization work divides modules for undirected graphs without considering overlapping communities. Therefore, given these limitations, we propose an OSLOM-based DBM for directed graphs and overlapping modules. Unlike most binary analysis work, which only requires local analysis, binary program modularization is a global understanding of the program. Therefore, we extract the function call graph (FCG), which is defined as Equation (1),
$$G = (V, E, W) \tag{1}$$
$$E = \{(a, b) \mid a, b \in V\} \tag{2}$$
$$W = \{W_{ab} \mid a, b \in V\}, \tag{3}$$
where V is the set of all functions, E represents the set of calling edges between all functions, which is defined as Equation (2), pointing from the caller to the callee, and W represents the set of edge weights, which is defined as Equation (3). The more call times between functions, the greater the probability that they complete the same functionality, that is, they belong to the same module.
The directed edge call weight from a to b expresses call times from function a to function b, which is defined as Equation (4),
$$W_{ab} = \begin{cases} n_{ab}, & \text{if } (a, b) \in E_{ab} \\ 0, & \text{otherwise}, \end{cases} \tag{4}$$
where $W_{ab}$ is the edge weight from function a to function b, $n_{ab}$ is the number of calls from a to b, and $E_{ab}$ is the set of calling edges from a to b.
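As a minimal sketch of the definitions in Equations (1)–(4), the weighted directed call graph can be built by counting call sites. The function names `build_call_graph` and `edge_weight`, and the example call list, are hypothetical and serve only to illustrate the definitions; they are not part of the BMVul prototype.

```python
from collections import Counter

def build_call_graph(call_sites):
    """Build G = (V, E, W) from an iterable of (caller, callee) call sites."""
    weights = Counter(call_sites)                      # W_ab = n_ab for each observed edge
    vertices = {f for edge in weights for f in edge}   # V: every function seen
    edges = set(weights)                               # E: directed caller -> callee pairs
    return vertices, edges, weights

def edge_weight(weights, a, b):
    """W_ab: number of calls from a to b, or 0 when a never calls b."""
    return weights.get((a, b), 0)

# Hypothetical call sites: main calls parse twice, so W(main, parse) = 2.
calls = [("main", "parse"), ("main", "parse"), ("parse", "read_byte"), ("main", "log")]
V, E, W = build_call_graph(calls)
print(edge_weight(W, "main", "parse"), edge_weight(W, "parse", "main"))
```

Because the graph is directed, the weight from `parse` to `main` is zero even though the reverse edge exists.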
We take the directed function call graph as the input to module partitioning and use the OSLOM-based DBM method to divide the binary into modules. The OSLOM algorithm uses significance [31] as the measure for evaluating clusters (modules). Significance is defined as the probability of finding the cluster in a random null model, that is, in a class of graphs without community structure; this is the same null model used in modularity optimization [32]. It indicates how likely the community is to emerge in a randomized network.
The realization of the OSLOM-based DBM method is divided into three steps, as shown in Figure 2.
Figure 2. Workflow of DBM.
Step 1: We use a significance score to detect important modules until they converge. The initial node is a single function. We calculate the probability of adding adjacent nodes to the node, then delete unimportant nodes. We set the convergence threshold to 0.1, which can achieve the best performance.
Step 2: Based on the set of modules in Step 1, we detect the internal structure of modules or possible mergers between modules to find the minimum clustering result.
Step 3: We detect the hierarchical structure of modules. The above steps produce the basic function clustering result, in which each module becomes a new node. If there are edges between the functions of two modules, a new edge is formed between the corresponding nodes, and its weight is the sum of the weights of those edges. This yields a new supernetwork, and the procedure is repeated on it until no new modules are produced.
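The supernetwork construction in Step 3 can be sketched as follows. This is an illustrative assumption about the bookkeeping, not OSLOM's implementation: `collapse_modules` and the `membership` mapping are hypothetical names, and counting an edge once for every pair of modules its endpoints belong to is one possible way to handle overlapping membership.

```python
from collections import defaultdict

def collapse_modules(weights, membership):
    """Collapse a weighted directed call graph into a module-level supernetwork.

    weights:    {(caller, callee): call_count}
    membership: {function: set of module ids}; a set, since modules may overlap
    """
    super_weights = defaultdict(int)
    for (a, b), w in weights.items():
        for ma in membership[a]:
            for mb in membership[b]:
                if ma != mb:                     # edges inside one module disappear
                    super_weights[(ma, mb)] += w # sum weights between two modules
    return dict(super_weights)

# Hypothetical example: f and g form module 0, h and k form module 1.
weights = {("f", "g"): 2, ("g", "h"): 1, ("f", "h"): 3, ("h", "k"): 4}
membership = {"f": {0}, "g": {0}, "h": {1}, "k": {1}}
super_graph = collapse_modules(weights, membership)
print(super_graph)
```

Here the two cross-module edges (g to h and f to h) merge into a single edge from module 0 to module 1 whose weight is their sum.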
The implementation process of OSLOM can integrate various community detection technologies, such as the heuristic method Infomap [33] based on the random walk, the overlapping community detection method Copra [34] based on label propagation, and the Louvain algorithm based on the concept of modularity. The module output of one or more of the above algorithms can be used as input information for DBM to perform subsequent module partitions. The more algorithms the process integrates, the better the final module partitioning.

3.1.2. Software Component Identification

Binary software component identification matches binary code with source code to determine whether the binary reuses open-source components. At present, there are two kinds of comparison. One detects the similarity between binary code and source code directly. The other determines the compilation provenance of the binary program (e.g., optimization level, architecture, compiler), then compiles the source code into binary form, which turns the task into binary similarity comparison. However, the latter must cope with many combinations of compilation configurations, so it is difficult to detect the compilation provenance accurately. In addition, it is hard to implement because the success rate of automatic compilation is low. Therefore, we carry out a feature-based comparison between binary and source code, relying on features that exist in both binary and source code and are not easily affected by compilation optimization. We select string-type features (e.g., strings, exported function names) and complex branching sequences in functions (e.g., if/else, switch/case).
In the phase of similarity detection between a binary module and source code, for string-type features, features are equivalent when the binary module feature is the same as the source code. We match the if/else features by the length of the longest common subsequence, which is equivalent when the size of the longest common subsequence exceeds the threshold. For switch/case, we compare the switch/case in the module with the switch/case unordered list with the default branch in the source code. The thresholds involved in feature matching are determined empirically. When the matching score of features exceeds the corresponding threshold, it is determined that the module reuses the component.
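The if/else comparison described above relies on the length of the longest common subsequence (LCS). The sketch below shows the standard dynamic-programming computation. How a branch sequence is encoded (here, a list of integer constants) and the threshold value are assumptions made for illustration, since the thresholds are only described as empirically determined.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)          # extend a common subsequence
            else:
                curr.append(max(prev[j], curr[j - 1]))  # skip one element
        prev = curr
    return prev[-1]

def branches_match(binary_seq, source_seq, threshold):
    """Treat two branch sequences as equivalent when the LCS exceeds the threshold."""
    return lcs_length(binary_seq, source_seq) > threshold

# Hypothetical branch-condition constants from a binary function and a source function.
binary_seq = [1, 4, 8, 16, 32]
source_seq = [1, 2, 4, 16, 32, 64]
print(lcs_length(binary_seq, source_seq))
```

The LCS tolerates insertions and deletions on either side, which suits branch sequences that a compiler may partially reorder or drop.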

3.2. Open-Source Vulnerability Detection

3.2.1. Open-Source Component Vulnerability Function Detection

Source code preprocessing based on normalization. Most software developers reuse open-source components with code or structural changes, and open-source components are constantly updated to provide better functionality. However, internal and external open-source component changes can lead to syntactic diversity of vulnerable code. We can address the syntactic diversity problem of vulnerability code clones by using the signature database in Movery [35]. Consequently, this paper uses the vulnerability signature database to perform vulnerability function detection for open-source components.
The signature database is generated by the key techniques of function collision and core code line extraction, including vulnerability signature and patch signature. During the generation of the signature database, essential and dependent vulnerable code lines are extracted to generate extensible vulnerability signatures for addressing syntax diversity caused by internal open-source component modifications. Then, critical lines of code, dependent lines of code, and control flow lines of code are extracted from vulnerability and patch functions to address the syntax diversity caused by external open-source component changes. Finally, the vulnerability signature and patch signature are generated based on extracted contents.
Before vulnerability detection, we generate the signature for the target component through function hash and code normalization. Hash values and the path information of functions will be stored in the function hash file. Then we remove white spaces and comments and convert upper-case characters to lower-case to parse all function code lines. Unnormalized and normalized code line forms are stored in the function signature.
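The signature generation step can be sketched as below, assuming C-style comments and SHA-256 as the hash; the section does not name the hash function, so that choice, like the helper names and the dictionary layout, is hypothetical. The regular expressions do not handle comment markers that appear inside string literals.

```python
import hashlib
import re

def strip_comments(code):
    """Remove C-style block and line comments (naive: ignores string literals)."""
    code = re.sub(r"/\*.*?\*/", "", code, flags=re.DOTALL)
    return re.sub(r"//[^\n]*", "", code)

def normalize_line(line):
    """Drop all whitespace and convert to lower case."""
    return re.sub(r"\s+", "", line).lower()

def function_signature(path, code):
    """Return path, hash, and both code-line forms for one function."""
    # 'raw' keeps spacing and case but has comments removed.
    raw = [ln for ln in strip_comments(code).splitlines() if ln.strip()]
    normalized = [normalize_line(ln) for ln in raw]
    digest = hashlib.sha256("".join(normalized).encode("utf-8")).hexdigest()
    return {"path": path, "hash": digest, "raw": raw, "normalized": normalized}

# Two hypothetical variants that differ only in spacing, comments, and case.
sig_a = function_signature("a.c", "int Add(int a, int b) {\n    // sum\n    return a + b;  /* done */\n}\n")
sig_b = function_signature("b.c", "int add(int a,int b){\nRETURN a+b;\n}")
print(sig_a["hash"] == sig_b["hash"])
```

Both variants normalize to the same lines, so they receive the same hash, which is the property the normalization is meant to provide.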
Vulnerability function detection. We compare the target open-source component signature with the vulnerability signature database to detect the vulnerability code clone in the component to determine the vulnerability function. As is shown in Equation (5), when all codes of the target function are included in the vulnerability signature, we calculate the similarity of syntax between the target function and the function in the vulnerability signature through the Jaccard similarity coefficient. If it reaches the threshold (0.5), the target function code is the vulnerability code clone. We have
$$sim(f, f_v) = \frac{|f \cap f_v|}{|f \cup f_v|}. \tag{5}$$
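A minimal sketch of the Jaccard check in Equation (5), treating each function as a set of normalized code lines. The containment precondition mentioned in the text is not modeled here, and the function names and example lines are hypothetical.

```python
def jaccard(f, f_v):
    """Jaccard similarity |f ∩ f_v| / |f ∪ f_v| over sets of code lines."""
    f, f_v = set(f), set(f_v)
    union = f | f_v
    return len(f & f_v) / len(union) if union else 0.0

def is_vulnerable_clone(target_lines, vuln_lines, threshold=0.5):
    """Flag the target as a vulnerable clone when similarity reaches the threshold."""
    return jaccard(target_lines, vuln_lines) >= threshold

# Hypothetical normalized lines: 3 shared lines out of 5 distinct ones.
target = ["a", "b", "c", "d"]
vuln = ["b", "c", "d", "e"]
print(jaccard(target, vuln), is_vulnerable_clone(target, vuln))
```

With three shared lines and five distinct lines overall, the similarity is 0.6, which reaches the 0.5 threshold.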

3.2.2. Binary Vulnerability Function Detection

According to source vulnerability functions, we search the binary vulnerability functions in the binary module reusing open-source components by the binaryAI [36] engine. Under normal circumstances, the source code functions need to be compared with all binary functions of the program. Based on software component analysis based on binary program modularization, we only need to match the vulnerability function in the specific binary module, which greatly reduces the analysis range of the binary vulnerability function detection task.
The binaryAI engine works mainly by embedding immediate numbers, strings, symbols, pseudocode, and control flow graphs, and obtaining the matching function through similarity search. Because our goal is to detect vulnerability functions in the binary modules that reuse vulnerable open-source components, we do not use the public function set provided by the binaryAI engine for matching. Instead, before the function search, we use the engine to create the vulnerability function set of the open-source components reused by the binary program, including the function code, source file path, function features obtained by the feature extraction library, and other relevant information. We then obtain the binary vulnerability functions that match the source vulnerability function set through vector similarity comparison.
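The vector similarity comparison can be illustrated with a plain cosine-similarity search restricted to the functions of one module. This is a generic sketch and does not use or reproduce the binaryAI API; the embeddings, function names, and `best_match` helper are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def best_match(query_vec, module_funcs):
    """Return (name, score) of the module function closest to the query vector."""
    scored = ((name, cosine(query_vec, vec)) for name, vec in module_funcs.items())
    return max(scored, key=lambda pair: pair[1])

# Hypothetical embeddings: only the functions of the matched module are searched,
# not every function in the binary.
module_funcs = {"sub_401000": [1.0, 0.0, 0.0], "sub_401200": [0.6, 0.8, 0.0]}
source_vuln_vec = [0.5, 0.9, 0.0]
name, score = best_match(source_vuln_vec, module_funcs)
print(name)
```

The number of comparisons per source vulnerability function equals the size of the module rather than the size of the whole binary, which is where the reduction in search scope comes from.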

4. Results and Analysis

In this section, we evaluate the effectiveness of BMVul in open-source component detection. In addition, we carry out vulnerability function detection for the binary modules that reuse open-source components.

4.1. Datasets

We collect two datasets (dataset I and dataset II) for the evaluation of BMVul.
Dataset I. The ground-truth binary programs obtained according to source file analysis and the partial dataset from ISRD [16] and ModX [17] are shown in Table 3. We obtain the source code of components from GitHub.
Table 3. Dataset I used for evaluation.
Dataset II. We collect the top 10 frequently reused components involved in B2SFinder from Github to obtain the source code of each component in the past three years. The description of the top 10 components is shown in Table 4 (the order in the table does not represent the ranking of reuse frequency).
Table 4. Dataset II used for evaluation.

4.2. Compared Approaches

We compare BMVul with B2SFinder [2] and Louvain detection [18] (component detection based on the Louvain algorithm). Our prototype BMVul carries out program modularization with DBM, a directed and overlapping binary function clustering technique that integrates modularity and information theory algorithms. Then, we perform feature-based software component analysis for binary modules. According to the results of the modular software component analysis, we detect specific vulnerability functions in the binary modules that reuse components. Louvain detection is a Louvain-based component-identification method. It differs from BMVul in the modularization phase: it uses only the modularity algorithm to divide binary modules and considers neither the direction of the function call graph nor overlapping clustering. B2SFinder is a file-level, feature-based method for comparing binary and source code. It does not perform binary program modularization and only compares the entire binary program with open-source components.

4.3. Evaluation Metrics

In experiments, we select precision and F1 score as evaluation metrics. The definitions are shown as Equations (6) and (7):
$$P = \frac{TP}{TP + FP} \tag{6}$$
$$F1 = \frac{2PR}{P + R}. \tag{7}$$
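Equations (6) and (7) can be computed as in the short sketch below. The symbol R is taken to be recall, TP / (TP + FN), which this section does not define explicitly; that reading and the example counts are assumptions.

```python
def precision(tp, fp):
    """P = TP / (TP + FP)."""
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    """R = TP / (TP + FN); assumed standard definition, not stated in the text."""
    return tp / (tp + fn) if tp + fn else 0.0

def f1_score(p, r):
    """F1 = 2PR / (P + R), the harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0

# Hypothetical counts: 3 true positives, 1 false positive, 3 false negatives.
p = precision(3, 1)
r = recall(3, 3)
f1 = f1_score(p, r)
print(p, r, f1)
```

Because F1 is a harmonic mean, it stays closer to the lower of the two values, so a method with high precision but low recall still scores modestly.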

4.4. Effectiveness of Component Detection

We evaluate the effectiveness on Dataset I to compare BMVul with B2SFinder and Louvain detection. Table 5 reports the performance of BMVul in terms of efficacy in detecting open-source components.
Table 5. Comparison of BMVul with other methods in terms of effectiveness.
$P$ represents the precision of identifying components in binary modules, and $P_{1m}$ represents the precision of identifying components in a unique binary module. As can be seen from the table, the precision of B2SFinder is only 47%, and that of Louvain detection is 72.7%. BMVul reaches 75%, a relative improvement of 3.16% over Louvain detection and 59.57% over B2SFinder. B2SFinder reports more false positives, which results in a high false positive rate. Detection based on software modularization decreases the number of false positives and thus reduces the false positive rate. The $P_{1m}$ of BMVul is about 39.43% higher than that of Louvain detection, because BMVul divides software modules with DBM, which takes the direction of the function call graph into account, unlike the Louvain algorithm. The Louvain algorithm can only assign a function to a single module and considers neither edge direction nor overlapping modules, which reduces the precision of module-based component detection. Therefore, since some functions may belong to multiple modules at the same time, we apply overlapping detection for function clustering to improve $P_{1m}$ and $P$. The F1 score of BMVul reaches 56.5%, which outperforms B2SFinder by 8.45%, so BMVul is better than the current file-level component detection.
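The precision gains quoted above are consistent with relative, rather than absolute, improvements over each baseline. This reading is an inference from the three precision values given in this section, and the helper name is hypothetical.

```python
def relative_improvement(new, baseline):
    """Percentage by which `new` exceeds `baseline`, relative to `baseline`."""
    return (new - baseline) / baseline * 100.0

# Precision values reported in this section (in percent).
bmvul, louvain, b2sfinder = 75.0, 72.7, 47.0
gain_over_louvain = round(relative_improvement(bmvul, louvain), 2)
gain_over_b2sfinder = round(relative_improvement(bmvul, b2sfinder), 2)
print(gain_over_louvain, gain_over_b2sfinder)  # 3.16 59.57
```

In absolute terms the same gains are 2.3 and 28 percentage points, respectively.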

4.5. Evaluation of Matching Unique Module

We express the precision with which a reused component matches a unique module as $TP_{1m}$. The $TP_{1m}$ of BMVul and Louvain detection for each binary program is shown in Figure 3.
Figure 3. The comparison of $TP_{1m}$.
As can be seen from the figure, the $TP_{1m}$ of BMVul is greater than or equal to that of Louvain detection for every binary program, and it exceeds Louvain detection in five cases. The ratio between $TP_{1m}$ and $TP$ is expressed as $\mathrm{1MRatio}$, which is defined as Equation (8):
$$\mathrm{1MRatio} = \frac{TP_{1m}}{TP}. \tag{8}$$
The results of the $\mathrm{1MRatio}$ comparison are shown in Table 6. As can be seen from the table, the $\mathrm{1MRatio}$ of BMVul reaches 87.5%, a significant improvement of 31.18% over Louvain detection. BMVul therefore clusters functions well through overlapping detection on directed call graphs. Functions that cooperate to realize the same functionality are clustered into the same module, which improves the accuracy of matching between binary modules and reused components.
Table 6. Evaluation of $\mathrm{1MRatio}$.

4.6. Open-Source Vulnerability Function Detection and Analysis of Binary Modules

Open-source components reuse may introduce security vulnerabilities into the binary program. The vulnerability function detection results for the top 10 frequently reused components in dataset II are shown in Figure 4.
Figure 4. Frequently reused components vulnerabilities.
As seen from the figure, only Zlib, Libpng, and unrar do not contain vulnerabilities in the last three years (Libtiff: 2015–2017). The number of vulnerability functions is relatively high in FreeType and SQLite, reaching 19 and 11, respectively. In software development, once component code containing a large number of vulnerability functions is reused, the program faces potential security risks.
Therefore, it is necessary to detect vulnerabilities caused by the reuse of vulnerable components. We detect vulnerability functions of binary programs reusing vulnerable components on Dataset I. The vulnerability function results are shown in Table 7. The fourth and fifth columns are the source code and binary vulnerability functions detected by BMVul. A “✓” means the vulnerability function is reused in the binary.
Table 7. Open-source vulnerability function detection results.
Based on the software component analysis, we search for vulnerability functions in the binary modules that reuse components containing vulnerabilities. As the results in the table show, we find specific reused vulnerability functions in OpenVPN, Lzbench, and Redis-server, so detection is no longer limited to reporting potential vulnerabilities. In addition, compared with detection over the whole binary program file, vulnerability function detection within the module reduces the amount of function matching and avoids comparing binary functions that are unrelated to the reused components.

5. Conclusions

In this paper, we propose BMVul, a binary modularization-based open-source vulnerability function detection method. BMVul performs binary module-level open-source component identification and then detects vulnerability functions in the located binary module. The experimental results show that BMVul outperforms the state-of-the-art methods B2SFinder and Louvain detection in effectiveness and performs well in binary vulnerability function detection. The precision of BMVul exceeds that of B2SFinder by 59.57%. Moreover, compared with Louvain detection, the precision of matching a unique binary module increases by 39.43%. At present, most detection methods are limited to file granularity, which makes vulnerability function searching time-consuming. In contrast, BMVul finds reused components at module granularity, and with the achieved accuracy it can identify the binary modules that reuse components. In the vulnerability function analysis stage, instead of performing extensive global analysis, we only need to analyze specific binary modules. Binary modularization-based detection thus greatly reduces the search scope of open-source vulnerability functions in binary programs, which is of great significance to software security analysis. However, other graph properties may exist that could improve the modularization results. Moreover, features that are better suited to binary-source comparison could yield more accurate results. Although the current technique has not reached the ideal result, we expect it to improve with more in-depth research.

Author Contributions

Conceptualization, X.G.; data curation, R.C. and W.S.; methodology, X.G.; software, X.G. and S.L.; formal analysis, X.Y.; writing—original draft preparation, X.G.; writing—review and editing, X.G. and X.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Foundation Strengthening Key Project of the Science & Technology Commission (2019-JCJQ-ZD-113).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The calculated data presented in this work are available from the corresponding authors upon reasonable request.

Acknowledgments

The authors would like to thank the anonymous reviewers for their valuable comments on our paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Ruian, D.; Ashish, B.; Meng, X.; Taesoo, K.; Wenke, L. Identifying open-source license violation and 1-day 421 security risk at large scale. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, CCS’17, Dallas, TX, USA, 30 October–3 November 2017. [Google Scholar]
  2. Yuan, Z.; Xu, J.; Piao, A.; Xue, J.; Huo, W.; Feng, M.; Li, F.; Ban, G.; Xiao, Y.; Wang, S.; et al. B2SFinder: Detecting open-source software reuse in COTS software. In Proceedings of the 2019 34th IEEE/ACM International Conference on Automated Software Engineering, San Diego, CA, USA, 11–15 November 2019. [Google Scholar]
  3. Ban, G.; Xu, L.; Xiao, Y.; Li, X.; Yuan, Z.; Huo, W. B2SMatcher: Fine-Grained version identification of open-Source software in binary files. Cybersecurity 2021, 4, 21. [Google Scholar] [CrossRef]
  4. Hemel, A.; Kalleberg, K.T.; Vermaas, R.; Dolstra, E. BAT Finding software license violations through binary code clone detection. In Proceedings of the 33rd International Conference on Software Engineering, Honolulu, HI, USA, 21–22 May 2011. [Google Scholar]
  5. Heartbleed. Available online: https://en.wikipedia.org/wiki/Heartbleed (accessed on 24 November 2022).
  6. OpenSSL. Version 1.0.1, OpenSSL Technical Committee, Canada. Available online: https://www.openssl.org/ (accessed on 24 November 2022).
  7. Libreoffice. Version 4.2.0, The Document Foundation, Germany. Available online: https://www.libreoffice.org/ (accessed on 24 November 2022).
  8. VMware. Version 10.0, VMware, Palo Alto, America. Available online: https://www.vmware.com/ (accessed on 24 November 2022).
  9. Zhao, Q.; Huang, C.; Dai, L. VULDEFF: Vulnerability detection method based on function fingerprints and code differences. Knowl.-Based Syst. 2021, 260, 1101391. [Google Scholar] [CrossRef]
  10. Bowman, B.; Huang, H.H. VGRAPH: A Robust Vulnerable Code Clone Detection System Using Code Property Triplets. In Proceedings of the 2020 IEEE European Symposium on Security and Privacy, Genoa, Italy, 7–11 September 2020. [Google Scholar]
  11. Ding, Y.; Suneja, S.; Zheng, Y.; Laredo, J.; Morari, A.; Kaiser, G.; Ray, B. VELVET: A noVel Ensemble Learning approach to automatically locate VulnErable sTatements. In Proceedings of the 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering, Honolulu, HI, USA, 15–18 March 2022. [Google Scholar]
  12. Cao, S.; Sun, X.; Bo, L.; Wei, Y.; Li, B. Bgnn4vd: Constructing bidirectional graph neural-network for vulnerability detection. Information and Software Technology. Knowl.-Based Syst. 2021, 136, 106576. [Google Scholar]
  13. Zhang, H.; Qian, Z. Precise and accurate patch presence test for binaries. In Proceedings of the 27th USENIX Security Symposium, Baltimore, MD, USA, 15–17 August 2018. [Google Scholar]
  14. Duan, R.; Bijlani, A.; Ji, Y.; Alrawi, O.; Xiong, Y.; Ike, M.; Saltaformaggio, B.; Lee, W. Automating Patching of Vulnerable Open-Source Software Versions in Application Binaries. In Proceedings of the 28th USENIX Security Symposium, San Diego, CA, USA, 24–27 February 2019. [Google Scholar]
  15. Lancichinetti, A.; Radicchi, F.; Ramasco, J.J.; Fortunato, S. Finding statistically significant communities in networks. PLoS ONE 2011, 6, e18961. [Google Scholar] [CrossRef] [PubMed]
  16. Xu, X.; Zheng, Q.; Yan, Z.; Fan, M.; Jia, A.; Liu, T. Interpretation-enabled software reuse detection based on a multi-level birthmark model. In Proceedings of the 2021 43rd IEEE/ACM International Conference on Software Engineering, ICSE’21, Madrid, Spain, 22–30 May 2021. [Google Scholar]
  17. Yang, C.; Xu, Z.; Chen, H.; Liu, Y.; Gong, X.; Liu, B. ModX: Binary Level Partially Imported Third-Party Library Detection via Program Modularization and Semantic Matching. In Proceedings of the 2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE), Pittsburgh, PA, USA, 25–27 May 2022. [Google Scholar]
  18. Blondel, V.D.; Guillaume, J.L.; Lambiotte, R.; Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. 2008, 10, P10008. [Google Scholar] [CrossRef]
  19. Mohammadi, S.; Izadkhah, H. A new algorithm for software clustering considering the knowledge of dependency between artifacts in the source code. Inf. Softw. Technol. 2019, 105, 252–256. [Google Scholar] [CrossRef]
  20. Zamli, K.Z.; Din, F.; Ramli, N.; Ahmed, B.S. Software Module Clustering Based on the Fuzzy Adaptive Teaching Learning Based Optimization Algorithm. In Proceedings of the Intelligent and Interactive Computing, IIC’18, Turin, Italy, 10–14 September 2018. [Google Scholar]
  21. Sun, J.; Ling, B. Software Module Clustering Algorithm Using Probability Selection. Wuhan Univ. J. Nat. Sci. 2018, 23, 93–102. [Google Scholar] [CrossRef]
  22. Hatami, E.; Arasteh, B. An efficient and stable method to cluster software modules using ant colony optimization algorithm. J. Supercomput. 2019, 76, 6786–6808. [Google Scholar] [CrossRef]
  23. Varghese, R.B.G.; Raimond, K.; Lovesum, J. A novel approach for automatic remodularization of software systems using extended ant colony optimization algorithm. Inf. Softw. Technol. 2019, 114, 107–120. [Google Scholar] [CrossRef]
  24. Psarras, C.; Diamantopoulos, T.; Symeonidis, A. A Mechanism for Automatically Summarizing Software Functionality from Source Code. In Proceedings of the IEEE 19th International Conference on Software Quality, Reliability and Security, QRS’19, Sofia, Bulgaria, 22–26 July 2019. [Google Scholar]
  25. Saied, M.A.; Ouni, A.; Sahraoui, H.; Kula, R.G.; Inoue, K.; Lo, D. Improving reusability of software libraries through usage pattern mining. J. Syst. Softw. 2018, 145, 164–179. [Google Scholar] [CrossRef]
  26. Bhoraskar, R.; Han, S.; Jeon, J.; Azim, T.; Chen, S.; Jung, J.; Nath, S.; Wang, R.; Wetherall, D. Brahmastra: Driving Apps to Test the Security of Third-Party Components. In Proceedings of the 23rd USENIX Security Symposium, San Diego, CA, USA, 20–22 August 2014. [Google Scholar]
  27. Backes, M.; Bugiel, S.; Derr, E. Reliable third-party library detection in android and its security applications. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, CCS’16, Vienna, Austria, 24–28 October 2016. [Google Scholar]
  28. Alrabaee, S.; Shirani, P.; Wang, L. FOSSIL: A resilient and efficient system for identifying FOSS functions in Malware binaries. ACM Trans. Priv. Secur. 2018, 21, 1–34. [Google Scholar] [CrossRef]
  29. Karande, V.; Caballero, J.; Chandra, S.; Khan, L.; Lin, Z.; Hamlen, K. BCD: Decomposing binary code into components using graph-based clustering. In Proceedings of the 2018 ACM Asia Conference on Computer and Communications Security, ASIA CCS’18, Incheon, Republic of Korea, 4–8 June 2018. [Google Scholar]
  30. Newman, M.E.J. Fast algorithm for detecting community structure in networks. Phys. Rev. E 2004, 69, 066133. [Google Scholar] [CrossRef] [PubMed]
  31. Lancichinetti, A.; Radicchi, F.; Ramasco, J.J. Statistical significance of communities in networks. Phys. Rev. E 2010, 81, 046110. [Google Scholar] [CrossRef] [PubMed]
  32. Newman, M.E.J. Modularity and community structure in networks. Proc. Natl. Acad. Sci. USA 2006, 103, 8577–8582. [Google Scholar] [CrossRef] [PubMed]
  33. Rosvall, M.; Bergstrom, C.T. Maps of random walks on complex networks reveal community structure. Proc. Natl. Acad. Sci. USA 2008, 105, 1118–1123. [Google Scholar] [CrossRef] [PubMed]
  34. Gregory, S. Finding overlapping communities in networks by label propagation. New J. Phys. 2010, 12, 103018. [Google Scholar] [CrossRef]
  35. Woo, S.; Hong, H.; Choi, E.; Lee, H. MOVERY: A Precise Approach for Modified Vulnerable Code Clone Discovery from Modified Open-Source Software Components. In Proceedings of the 31st USENIX Security Symposium, Boston, MA, USA, 10–12 August 2022. [Google Scholar]
  36. BinaryAI: The Neural Search Engine for Binaries. Available online: https://binaryai.readthedocs.io/en/latest/ (accessed on 18 November 2022).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
