1. Introduction
An algorithmic complexity vulnerability (ACV) is a flaw in the design or implementation of an algorithm that allows attackers to craft malicious inputs that trigger abnormal resource or time consumption, resulting in a denial of service [1]. The most typical example of such a denial-of-service (DoS) attack is the zip bomb [2], such as 42.zip [3], which appears to be only 42 KB but can expand to several PB of data after decompression, placing enormous pressure on the decompression program and system resources; this is a spatial ACV. The Regular Expression Denial of Service (ReDoS) vulnerability [4,5,6,7,8,9] is another type of ACV, caused by improperly designed regular expressions used as filtering rules: by submitting specific inputs, attackers can make the regular expression matching operations of software or libraries consume a significant amount of time and resources. CVE-2023-36617 in Ruby [10] is a recent, typical example.
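The size asymmetry that a zip bomb exploits can be observed with nothing more than the JDK's java.util.zip.Deflater. The sketch below is only an illustration and not part of the method described in this paper: it compresses 10 MiB of zero bytes with DEFLATE and reports the compressed size. Real zip bombs such as 42.zip additionally nest archives inside archives to multiply the ratio, which this single-layer example does not attempt; the class name is hypothetical.

```java
import java.util.zip.Deflater;

/** Illustrates the asymmetry behind a zip bomb: redundant data shrinks to a tiny fraction. */
final class CompressionRatioDemo {
    /** Compresses {@code size} zero bytes with DEFLATE and returns the compressed length. */
    static int deflatedSizeOfZeros(int size) {
        byte[] input = new byte[size];                 // all zeros: maximally redundant
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();                             // no more input will follow
        byte[] buffer = new byte[64 * 1024];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);         // count output bytes; contents discarded
        }
        deflater.end();                                // release native resources
        return total;
    }

    public static void main(String[] args) {
        int original = 10 * 1024 * 1024;               // 10 MiB of zeros
        int compressed = deflatedSizeOfZeros(original);
        System.out.printf("%d bytes -> %d bytes (ratio %.0f:1)%n",
                original, compressed, (double) original / compressed);
    }
}
```

Decompression simply reverses this, so a consumer that inflates untrusted data without a size limit allocates roughly a thousand bytes for every byte it receives in this single-layer case.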
As shown above, unlike traditional memory corruption vulnerabilities, which can be found through blind fuzzing, algorithmic complexity vulnerabilities are closely tied to algorithm design and implementation, which makes traditional blind fuzzing considerably less efficient at finding them. At the same time, modern software development makes extensive use of open-source components from third-party repositories such as Maven and GitHub, including logging frameworks and blockchain smart contracts. As a result, developers may unintentionally introduce code with algorithmic complexity vulnerabilities written by others, further expanding the threat of ACVs. This also makes it difficult for existing analysis methods to analyze algorithmic complexity vulnerabilities hidden in open-source components.
Researchers have already proposed several solutions to these problems. Liu et al. proposed Acquirer [11], which combines static and dynamic methods: it generates control flow and data flow graphs of the program to identify potential algorithmic complexity issues, and then uses dynamic testing to verify the program's resource consumption, thereby discovering and verifying temporal ACVs. Awadhutkar et al. proposed DISCOVER [12], a static analysis tool that estimates the time complexity of a loop by analyzing the number of loop iterations, the operations in the loop body, and nested structures. Blair et al. proposed HotFuzz [13], which addresses the shortcomings of traditional fuzzing in discovering ACVs by introducing temporal and spatial guidance, in order to uncover more of the spatial and temporal behaviors associated with different inputs; this guided fuzzing discovers more ACVs. Li et al. proposed a dynamic path search approach that combines dynamic symbolic execution with constraint-based path exploration to explore program execution paths, and detects ACVs based on path coverage information and pathological inputs [14].
To address the challenges of ACV detection and fill the gaps in the related work described above, we propose a three-step ACV detection and verification workflow. First, we propose a static analysis method for Java bytecode that analyzes complex iterations and other conditional operations, APIs such as I/O, exception handling mechanisms, and so on, providing analysts with detailed information for high-level abstraction and for filtering suspicious call chains. Second, existing methods exhibit strong randomness, require extensive manual verification, and ignore the underlying causes of vulnerabilities, such as Java language features, which reduces analysis efficiency. To compensate for these shortcomings of existing ACV detection models, we propose a new Java ACV model related to the exception handling mechanism during deep recursion. Finally, we propose an ACV verification and payload generation method. Most existing work applies symbolic execution at the initial stage of filtering code suspected of containing ACVs. Applying symbolic execution to payload generation, however, has limitations, such as a lack of support for complex data types. We therefore propose a call-chain-based, manually assisted payload generation method that accelerates vulnerability verification and payload generation.
To demonstrate the ability of our method to detect ACVs in open-source repositories, we analyzed Maven, the most popular third-party component repository for Java. During the experiment, we identified a large number of call chains with algorithmic complexity risks and verified them manually. We constructed corresponding payloads and submitted vulnerability reports to the project maintainers; eight of these vulnerabilities were officially acknowledged and assigned CVE numbers. Other vulnerable call chains have been manually confirmed to affect the reliable operation of the corresponding algorithms but have not yet been reported to the maintainers. Our method also achieves significant improvements over state-of-the-art (SOTA) approaches.
In summary, the main contributions of our work are listed as follows:
A static analysis method for filtering program call chains. We provide optional static analysis methods, including sensitive call chain filtering based on the Soot framework and graph-based analysis, which support an abstract understanding of program algorithms and logic as well as the construction and application of vulnerability models.
A new model for detecting algorithmic complexity vulnerabilities. We propose an ACV model related to Java deep recursion and exception handling mechanisms, together with a corresponding call chain filtering and analysis method. This vulnerability pattern has been overlooked in existing research, yet it remains an important cause of spatial algorithmic complexity vulnerabilities.
Payload generation guided by vulnerable call chains. Symbolic execution and constraint solving still have shortcomings when applied to payload generation rather than path solving. We propose a payload generation method guided by vulnerable call chains that enables faster vulnerability verification and improves the efficiency of vulnerability exploitation.
New algorithmic complexity vulnerabilities. Based on the above work, we conducted detailed testing of third-party components in the Maven Repository, discovering many new 0-day vulnerabilities and obtaining eight CVE numbers.
The remaining parts of our paper are organized as follows:
Section 2 summarizes the work in related fields,
Section 3 introduces the background and threat model of this research,
Section 4 introduces the main innovations of this paper,
Section 5 describes and introduces the experimental results, and
Section 6 concludes the paper and discusses future work.
2. Related Work
Awadhutkar et al. [12,15] noted that fully automated detection of algorithmic complexity vulnerabilities is not feasible. They therefore proposed a tool named DISCOVER to assist in detecting these vulnerabilities. Their workflow consists of three main stages: first, they automatically characterize loops based on the Termination Dependence Graph (TDG) and the Loop Projected Control Graph (LPCG); second, they filter for suspicious loop patterns; and third, they provide users with a catalog of loops for interactive auditing.
Blair et al. [13] introduced HotFuzz, a guided micro-fuzzing method designed to discover algorithmic denial-of-service vulnerabilities. They used genetic algorithms to evolve arbitrary Java objects, aiming to trigger the worst-case performance of target methods. They defined Small Recursive Instantiation (SRI) to derive seed inputs, represented as Java objects, for micro-fuzzing. Additionally, they employed EyeVM to track and analyze the root causes of these vulnerabilities.
Wei et al. [16] introduced a pattern fuzzing method named Singularity, aimed at uncovering worst-case behavior in given applications. The authors first proposed a domain-specific-language-style computational model, the Recurrent Computation Graph (RCG), to express input patterns. They then used a genetic programming (GP) algorithm to manipulate RCGs as an optimization problem, generating more effective input patterns that trigger worst-case performance. They discovered availability vulnerabilities in real-world applications such as Google Guava and JGraphT.
Liu et al. [11] proposed a hybrid method for detecting algorithmic complexity vulnerabilities. This approach first employs static methods to analyze loops and recursive structures in the target Java source code, identifying potentially vulnerable loop constructs. It then uses context information to guide the instrumentation of critical paths and constructs test cases based on branch strategies. By using dynamic selective symbolic execution for path search, it discovers cases that exhibit abnormal resource consumption, thereby uncovering time-related algorithmic complexity vulnerabilities.
Noller et al. [17] proposed a hybrid method that combines fuzzing and symbolic execution to detect algorithmic complexity vulnerabilities in both time and space. The main contribution of this method is a fuzzing approach aimed at improving coverage and exploring paths with high resource consumption. To make fuzzing more efficient, symbolic execution is employed: on the one hand, it explores potential high-resource-consumption paths; on the other hand, it guides fuzzing to generate inputs that satisfy branch conditions, thereby improving the effectiveness of the approach.
Liu et al. [18] proposed a method for detecting and exploiting ReDoS vulnerabilities. Their tool, Revealer, also combines static and dynamic analyses. It first uses static analysis to locate potentially vulnerable structures in regular expressions, and then dynamically validates whether these structures can be triggered. Unlike fuzzing, their dynamic method generates attack inputs for the regular expressions in a relatively precise way. They achieved performance improvements and detected a large number of new vulnerabilities.
To sum up, current work on analyzing algorithmic complexity vulnerabilities falls into two major categories: static methods and dynamic methods. Static methods are exemplified by loop characteristic analysis, which models and analyzes loop features in algorithms to identify vulnerable loop patterns [12,15,19,20]. Dynamic methods are represented by fuzzing techniques, which use input mutation and resource consumption tracking to identify test cases that trigger abnormal resource usage [13,16,21,22,23]. Hybrid methods, which combine static and dynamic analysis, have been shown to improve the efficiency of algorithmic complexity vulnerability detection and achieve better results [11,17,18,24].
However, most of these efforts face the following issues: dynamic methods insufficiently abstract the causes of algorithmic complexity vulnerabilities, resulting in relatively poor detection efficiency and capability; static methods provide inadequate support for vulnerability detection and require substantial subsequent manual analysis; and hybrid methods struggle to achieve comprehensive detection because symbolic execution tools are still immature. Compared with these efforts, our work offers a generic workflow for detecting algorithmic complexity vulnerabilities, as well as a specific, efficient detection model for spatial algorithmic complexity vulnerabilities related to recursive structures and exception handling. Based on this model, we can achieve rapid vulnerability detection and validation.
4. Methodology
Based on the background above, we design our method as follows:
An analysis method for third-party supply chain components. We statically analyze supply chain components to lay the groundwork for subsequent vulnerability analysis. The analysis should cover various program features, including language-independent features (the intermediate representation, IR) and language-specific features (such as Java's exception handling mechanism).
A filtering method for vulnerable recursive structures. Building on the previous step, we filter function call chains using two key retrieval criteria: function entry points that are reachable by user data, and ACV-sensitive language mechanisms. The resulting call chains reveal the algorithmic call logic through which users may trigger ACVs by constructing malicious inputs.
An exploitation method for ACVs. Taking into account input sanitization (constraints on the type and fields of user input) and hard resource limits on input size, we verify the vulnerability and generate the corresponding payload, guided by an understanding of the algorithm logic and the vulnerable call chain.
4.1. Overall Architecture
Our overall architecture, shown in Figure 2, consists of three main parts: processing, analysis, and verification. The processing part handles third-party Java components (e.g., Maven Java Archive (JAR) files) and constructs the call chains. The analysis part uses our vulnerability model to detect vulnerable code, including filtering recursive structures and analyzing termination conditions and the exception handling mechanism. In the verification phase, the vulnerability is verified and exploited by generating payloads based on the vulnerable call chain and an understanding of language features.
4.2. Analysis Base Construction
An algorithmic complexity vulnerability arises from the design of algorithm logic. This class of vulnerability is not tied to any specific programming language, so it may appear in different languages such as Java and Python. Here we mainly discuss Java; for other programming languages, ACV detection can be achieved by replacing the JAR file with the corresponding source code or other code files and following the same subsequent process and method. A language-independent intermediate representation is an abstract representation of program code that does not depend on any particular programming language. By converting source code into such an intermediate representation, we can eliminate characteristics strongly tied to the language, focus on language-independent algorithm logic and call structure, and help analysts analyze vulnerabilities more efficiently.
We propose two ways to support the analysis of ACV:
Code analysis based on the Soot framework [34,35].
Using the Soot framework as a foundation, we propose a new intermediate representation analysis framework that supports visualizing function call structures as graphs. Our framework analyzes the bytecode in JAR files, generates language-independent intermediate representations, and constructs lists of dependency and call relationships, which are then stored and represented in a graph database. Using this code graph, we can easily identify structural features associated with ACVs, such as direct and indirect recursion as shown in Figure 3, which aids in constructing and summarizing vulnerability models and further supports transferring these models to other programming languages [36].
4.3. Model-Based Vulnerable Call Chain Search
Here we introduce our new ACV detection model. Building on a thorough understanding of the vulnerability principle and the features of the programming language, we propose a new detection model for spatial algorithmic complexity vulnerabilities. Its design rests on three parts: (1) targeting language-independent ACV features, we propose an analysis method for class methods; (2) based on the characteristics of recursive structures, we propose a method for analyzing and judging unsafe termination conditions; and (3) targeting Java language features, we identify vulnerable recursion fragments that are not caught and handled by Java's exception and error handling mechanism.
4.3.1. Class Method Recursive Structure
To improve the simplicity, efficiency, and maintainability of code, Java organizes logic into class methods, which can be designed as recursive or non-recursive. A recursive design can solve many algorithmic problems with concise code by having a method call itself (as shown in Listing 2). However, because each recursive call must save its context information in a new stack frame, an excessive recursion depth leads to a large number of nested method calls and stack operations that can exhaust the thread stack, resulting in stack overflow ACVs.
Listing 2. Java class method recursion structure.
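Since the original listing is not reproduced here, the following hypothetical Java class illustrates the kind of class method recursion described above. maxDepth computes the maximum nesting depth of parentheses by calling parse once per '(' with no depth limit, so the call depth is chosen by whoever controls the input. The class and method names are illustrative only.

```java
/** Hypothetical class whose recursive method depth is controlled by the input. */
final class NestingDepth {
    private int pos;   // current read position in the input

    /** Returns the maximum nesting depth of parentheses in s. */
    static int maxDepth(String s) {
        return new NestingDepth().parse(s);
    }

    /** Parses a sequence of groups starting at pos and returns their maximum depth. */
    private int parse(String s) {
        int best = 0;
        while (pos < s.length() && s.charAt(pos) == '(') {
            pos++;                          // consume '('
            int inner = 1 + parse(s);       // recursive call with no depth limit
            if (pos < s.length() && s.charAt(pos) == ')') {
                pos++;                      // consume ')'
            }
            best = Math.max(best, inner);
        }
        return best;
    }
}
```

For example, maxDepth("(())()") returns 2, while an input such as "(".repeat(1_000_000) makes parse exhaust the thread stack and throw StackOverflowError; the exact depth at which this happens depends on the JVM's thread stack size (-Xss).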
Most existing ACV detection methods, and indeed most vulnerability detection methods in general, only support basic data types. Take symbolic-execution-based vulnerability detection as an example: most existing symbolic execution methods for Java only support basic data types such as int and char, and struggle to solve constraints over complex data types such as String or Object, so they cannot effectively detect vulnerabilities related to class methods. Therefore, at this stage we adopt static call chain feature analysis instead of symbolic execution, supplementing it with an analysis of recursion in class methods to support subsequent vulnerability verification and exploitation.
Our algorithm, outlined in the pseudocode of Algorithm 1, is implemented using the Soot framework. We start by acquiring all ActiveBodies from the JAR file. For each unit within an ActiveBody, we analyze its type of call (Line 5). If the unit is of the InvokeStmt type or of the AssignStmt type and contains a method call (Line 6), we analyze it further: we examine the method call, and if it is not within an exception handling block (Line 9, detailed in Section 4.3.3), we check the stack used to store method calls. If a recursive call is detected (Line 12) and there is no limit on recursion depth (Line 13, detailed in Section 4.3.2), we record the current call stack. If no recursive call is found, we push the called method onto the stack, provided it is neither a basic class method nor an abstract method, and recursively analyze it (Line 19).
Algorithm 1 Discovering all vulnerable recursive structures.
Input: JAR file of the project to be tested
Output: Vulnerable recursive structures
1: use the Soot framework to analyze the JAR file and obtain the analysis object
2: initialize a stack to store method calls
3: bodies ← retrieve all active bodies of the loaded methods
4: if unable to retrieve, exit
5: for each unit in the method's body do
6: if unit is an instance of (InvokeStmt or (AssignStmt containing a method call)) then
7: calledMethod ← extract the called method from the unit
8: end if
9: if the call to calledMethod is within an exception handling block then
10: skip to the next unit
11: end if
12: if stack contains calledMethod then
13: if calledMethod does not have a recursion depth limitation then
14: record the current stack for analysis
15: end if
16: else if calledMethod is not null then
17: if calledMethod is neither a basic class method nor an abstract method then
18: add calledMethod to the stack
19: recursively analyze calledMethod
20: remove calledMethod from the stack after analysis
21: end if
22: end if
23: end for
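To make the control flow of Algorithm 1 concrete, the following sketch replays it on a hand-built call graph instead of Soot bodies. Everything here is a simplifying assumption rather than the authors' implementation: the RecursionFinder and CallSite names, the insideTrap flag (standing in for the exception-handling check of Algorithm 3), the depthLimited set (standing in for the depth-limit check of Algorithm 2), and treating methods absent from the graph as basic-class or abstract methods that are not analyzed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Simplified, hypothetical replay of Algorithm 1 on a hand-built call graph (no Soot). */
final class RecursionFinder {
    /** One call site: the called method and whether it lies inside a try/catch. */
    static final class CallSite {
        final String target;
        final boolean insideTrap;

        CallSite(String target, boolean insideTrap) {
            this.target = target;
            this.insideTrap = insideTrap;
        }
    }

    private final Map<String, List<CallSite>> callGraph;     // methods with bodies -> call sites
    private final Set<String> depthLimited;                  // methods that limit their depth
    private final Deque<String> stack = new ArrayDeque<>();  // current call chain, top first
    private final List<List<String>> findings = new ArrayList<>();

    RecursionFinder(Map<String, List<CallSite>> callGraph, Set<String> depthLimited) {
        this.callGraph = callGraph;
        this.depthLimited = depthLimited;
    }

    /** Returns every recorded recursive chain, from the entry method to the repeated method. */
    List<List<String>> analyze(String entry) {
        stack.push(entry);
        visit(entry);
        stack.pop();
        return findings;
    }

    private void visit(String method) {
        for (CallSite call : callGraph.getOrDefault(method, List.of())) {
            if (call.insideTrap) {
                continue;                                    // Lines 9-11: error handled, skip
            }
            if (stack.contains(call.target)) {               // Line 12: recursion detected
                if (!depthLimited.contains(call.target)) {   // Line 13: no depth limit
                    List<String> chain = new ArrayList<>(stack);
                    Collections.reverse(chain);              // entry method first
                    chain.add(call.target);                  // close the cycle
                    findings.add(chain);                     // Line 14: record the stack
                }
            } else if (callGraph.containsKey(call.target)) { // Line 17: only methods with bodies
                stack.push(call.target);                     // Line 18
                visit(call.target);                          // Line 19
                stack.pop();                                 // Line 20
            }
        }
    }
}
```

Like Algorithm 1, the sketch keeps no global visited set, so a method reachable along several call paths is explored once per path; this keeps every recursive chain distinct at the cost of potentially exponential work on large graphs.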
Our method supports the analysis of both direct and indirect recursion, which have different characteristics on the call chain: direct recursion appears as an explicit self-call in the call graph, whereas indirect recursion achieves self-calling through a chain of intermediate calls. Figure 3a shows methodB() directly calling itself, while Figure 3b shows methodA() indirectly calling itself through methodB() and methodC().
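The distinction between the two patterns can be expressed as a small reachability check. The sketch below is a hypothetical helper, not part of the authors' framework, operating on a call graph given as a map from caller to callees. It reports direct recursion when a method has an edge to itself and indirect recursion when it can reach itself only through other methods; for the indirect case it assumes the chain methodA() → methodB() → methodC() → methodA().

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper that classifies recursion on a call graph (caller -> callees). */
final class RecursionKind {
    enum Kind { DIRECT, INDIRECT, NONE }

    static Kind classify(Map<String, List<String>> calls, String method) {
        List<String> callees = calls.getOrDefault(method, List.of());
        if (callees.contains(method)) {
            return Kind.DIRECT;                          // explicit self-call edge
        }
        // Depth-first search from the callees, looking for a path back to `method`.
        Deque<String> work = new ArrayDeque<>(callees);
        Set<String> seen = new HashSet<>(callees);
        while (!work.isEmpty()) {
            String current = work.pop();
            for (String next : calls.getOrDefault(current, List.of())) {
                if (next.equals(method)) {
                    return Kind.INDIRECT;                // reached itself via other methods
                }
                if (seen.add(next)) {
                    work.push(next);                     // explore each method only once
                }
            }
        }
        return Kind.NONE;
    }
}
```

A method that both calls itself and sits on a longer cycle is reported as DIRECT here, because the self-call edge is checked first.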
4.3.2. Unsafe Recursion Termination Conditions
The termination condition is the key part of a recursive algorithm: it determines when the recursion stops and returns its result. In recursive algorithms, the termination condition is usually a conditional statement; when the condition is met, the recursion stops descending and starts returning results or performing other operations. Termination conditions exist to prevent infinite recursion and to ensure that the recursion eventually ends.
In general, termination conditions have two important characteristics:
The termination condition must be attainable. That is, after a series of recursive calls, the termination condition must eventually be met.
The termination condition should not include a recursive call, otherwise the recursion will not end; it should be an explicit base case that involves no further recursion.
To detect whether a method limits its recursion depth, we match the two methods that software developers commonly use to record and check the recursion depth:
Define a class attribute in the class corresponding to the recursive method to record the recursion depth. Each time the recursive method proceeds to the next level, increment this class attribute to update the recursion depth. Before executing the next recursive method, compare this class attribute with a constant representing the maximum depth. The recursive method can proceed to the next level if the depth is less than this constant value.
Include a variable to record the recursion depth as a parameter in the recursive method. Each time the method recurses to the next level, increment this depth variable and pass it to the next level of recursion. Before executing the next recursive method, compare this variable with a constant representing the maximum depth. The recursive method can proceed to the next level if it is less than this constant value.
These two methods share two features: (1) a variable records the recursion depth and increases in step with each recursion level; and (2) this variable is compared with a constant, and the next level of recursion is only allowed if the variable is less than this constant. Hence, the judgment logic is as follows; the pseudocode is shown in Algorithm 2.
Identify the variable in the recursive method that has ‘+1’ increment (Line 7).
Determine whether this variable is a formal parameter of the recursive method or comes directly or indirectly from a class attribute (Line 7).
Check if this variable is compared with a constant value, and ensure that the call to the next level of recursion is within the judgment logic that checks for the variable being less than this constant value (Line 5).
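The two depth-recording idioms described above look roughly as follows in Java. Both hypothetical parsers below count nested parentheses: FieldTracked keeps the depth in a class attribute, ParamTracked passes it as a parameter, and both compare it against the constant MAX_DEPTH (an illustrative value) before descending. These are the shapes the judgment logic above is meant to recognize: a variable that receives a '+1' increment, comes from a field or a formal parameter, and guards the next recursive call with a less-than comparison against a constant. FieldTracked also decrements the field on return, a common refinement that the description does not require.

```java
/** Hypothetical parsers illustrating the two common recursion-depth limiting idioms. */
final class DepthLimitedParsers {
    static final int MAX_DEPTH = 1000;   // illustrative constant the depth is compared against

    /** Idiom 1: the depth is recorded in a class attribute (field). */
    static final class FieldTracked {
        private int depth;   // current recursion depth
        private int pos;     // current read position

        int parse(String s) {
            int best = 0;
            while (pos < s.length() && s.charAt(pos) == '(') {
                pos++;                                   // consume '('
                if (depth < MAX_DEPTH) {                 // descend only below the constant
                    depth++;                             // '+1' on the field
                    best = Math.max(best, 1 + parse(s));
                    depth--;                             // restore on return
                } else {
                    throw new IllegalArgumentException("nesting too deep");
                }
                if (pos < s.length() && s.charAt(pos) == ')') {
                    pos++;                               // consume ')'
                }
            }
            return best;
        }
    }

    /** Idiom 2: the depth is passed as a parameter. */
    static final class ParamTracked {
        private int pos;     // current read position

        int parse(String s, int depth) {
            int best = 0;
            while (pos < s.length() && s.charAt(pos) == '(') {
                pos++;                                   // consume '('
                if (depth < MAX_DEPTH) {                 // descend only below the constant
                    best = Math.max(best, 1 + parse(s, depth + 1));   // '+1' passed down
                } else {
                    throw new IllegalArgumentException("nesting too deep");
                }
                if (pos < s.length() && s.charAt(pos) == ')') {
                    pos++;                               // consume ')'
                }
            }
            return best;
        }
    }
}
```

With either idiom, a hostile input nested a million levels deep is rejected with an IllegalArgumentException after MAX_DEPTH levels instead of overflowing the stack.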
4.3.3. Uncaptured Error
Based on the Java exception handling mechanism mentioned above, if a method's nested calls do not catch errors such as “java.lang.Throwable”, “java.lang.Error”, or “java.lang.StackOverflowError”, this may indicate error types that the developer did not foresee or handle. This is particularly critical for “StackOverflowError”, as such uncaught errors can lead to algorithmic complexity vulnerabilities and cause runtime failures that the developers did not anticipate.
By using Soot or our proposed method to analyze the call chains of all public methods in Java third-party libraries, we can detect whether there are instances of recursion that do not catch the aforementioned types of errors. With our proposed method, we can clearly identify recursive paths in the program (i.e., cases where class methods reference themselves). If these recursive methods have no try-catch handler for errors like “java.lang.Throwable”, “java.lang.Error”, or “java.lang.StackOverflowError” at the statement level within the call chain context, this indicates that errors in the recursion are not foreseen and handled by the developer but are instead thrown by the JVM, which makes them worth analyzing and exploiting.
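The difference between an uncaptured and a captured recursion error can be seen in the following hypothetical library class. In unguarded, a StackOverflowError raised by the JVM during deep recursion propagates to the caller unchanged; in guarded, the recursive call sits inside a try block whose handler catches StackOverflowError, which is the situation the analysis above treats as captured. Whether catching StackOverflowError is good practice is a separate design question; the sketch only illustrates the two shapes the analysis distinguishes.

```java
/** Hypothetical library class contrasting uncaptured and captured recursion errors. */
final class ErrorCapture {
    /** Counts leading '[' characters recursively; the depth is controlled by the input. */
    private static int depth(String s, int i) {
        if (i < s.length() && s.charAt(i) == '[') {
            return 1 + depth(s, i + 1);          // unbounded recursive call
        }
        return 0;
    }

    /** Uncaptured: a StackOverflowError thrown by the JVM escapes to the caller. */
    static int unguarded(String s) {
        return depth(s, 0);
    }

    /** Captured: the recursive call is inside a try block that handles StackOverflowError. */
    static int guarded(String s) {
        try {
            return depth(s, 0);
        } catch (StackOverflowError e) {
            throw new IllegalArgumentException("input nested too deeply", e);
        }
    }
}
```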
Algorithm 2 Judging whether there is a constraint on recursion depth in a method body.
Input: body of the method to be analyzed
Output: boolean indicating if there is a recursion depth limit
1: limitFound ← false
2: for each unit in the method's body do
3: if unit is a type of conditional statement then
4: cast unit to a conditional statement
5: if the conditional statement is a recursion depth check statement then
6: extract the variable from the conditional statement
7: if the variable has an increment and is (a class attribute or a parameter reference) then
8: limitFound ← true
9: break
10: end if
11: end if
12: end if
13: end for
14: return limitFound
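A toy rendering of Algorithm 2 is sketched below. Instead of Jimple statements it takes a hypothetical, already-extracted summary of a method body: the conditional checks that compare a variable with a constant and guard the next recursive call, the set of variables that receive a '+1' increment, and the origin (field, parameter, or local) of each variable. All type names are illustrative; extracting this summary from real bytecode is the hard part and is not shown.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of Algorithm 2 over a pre-extracted, hypothetical summary of a method body. */
final class DepthLimitCheck {
    /** Where a variable's value comes from. */
    enum Origin { FIELD, PARAMETER, LOCAL }

    /** A conditional of the form "if (variable < constant) { ...recursive call... }". */
    static final class Guard {
        final String variable;
        final boolean comparesWithConstant;   // right-hand side is a constant such as MAX_DEPTH
        final boolean guardsRecursiveCall;    // the next-level call lies in the "less than" branch

        Guard(String variable, boolean comparesWithConstant, boolean guardsRecursiveCall) {
            this.variable = variable;
            this.comparesWithConstant = comparesWithConstant;
            this.guardsRecursiveCall = guardsRecursiveCall;
        }
    }

    static boolean hasDepthLimit(List<Guard> guards, Set<String> incremented,
                                 Map<String, Origin> origins) {
        boolean limitFound = false;
        for (Guard guard : guards) {                       // conditional statements only
            // Is this a recursion depth check?
            if (!guard.comparesWithConstant || !guard.guardsRecursiveCall) {
                continue;
            }
            String v = guard.variable;
            Origin origin = origins.getOrDefault(v, Origin.LOCAL);
            // '+1' increment and comes from a class attribute or a formal parameter
            if (incremented.contains(v) && (origin == Origin.FIELD || origin == Origin.PARAMETER)) {
                limitFound = true;
                break;
            }
        }
        return limitFound;
    }
}
```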
As illustrated in Algorithm 3, Line 1 iterates over each exception handling block (trap) in the body, and a flag “isInTrap” tracks whether the current iteration is inside the trap (Line 4). The algorithm then iterates over each unit in the body (Line 5) to check whether the unit is inside the exception and error handling block (Lines 6–14). This function is used by Algorithm 1 in Section 4.3.1.
Algorithm 3 Determining whether a recursive structure is captured by error handling mechanisms.
Input: body of the method and a target method object to be analyzed
Output: boolean indicating if a call to the target method is within any exception handling block (trap)
1: for each trap in the body's traps do
2: beginUnit ← get the begin unit of the trap
3: endUnit ← get the end unit of the trap
4: isInTrap ← false
5: for each unit in the body's units do
6: if unit = beginUnit then
7: isInTrap ← true
8: end if
9: if isInTrap and the unit contains a call to the target method then
10: return true
11: end if
12: if unit = endUnit then
13: isInTrap ← false
14: end if
15: end for
16: end for
17: return false
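The trap scan of Algorithm 3 can be modelled in the same toy style. In the hypothetical sketch below, units are the statements of a method body in order, a Trap is a range of unit indices, and "contains a call to the target" is approximated by a substring match. We treat the end index as exclusive, which is an assumption of this sketch, and therefore clear the flag before testing the current unit rather than after it as in the listing.

```java
import java.util.List;

/** Toy model of Algorithm 3: is a call to the target inside any exception handling range? */
final class TrapCheck {
    /** A protected range of unit indices: begin inclusive, end exclusive (sketch assumption). */
    static final class Trap {
        final int begin;
        final int end;

        Trap(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    static boolean isInTrap(List<String> units, List<Trap> traps, String targetCall) {
        for (Trap trap : traps) {                          // each exception handling block
            boolean inTrap = false;
            for (int i = 0; i < units.size(); i++) {       // each unit, in program order
                if (i == trap.begin) {
                    inTrap = true;                         // entering the protected range
                }
                if (i == trap.end) {
                    inTrap = false;                        // end is exclusive in this sketch
                }
                // Approximates "the unit contains a call to the target" by a substring match.
                if (inTrap && units.get(i).contains(targetCall)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```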
4.4. Recursion-Guided Payload Generation
Most current automated payload generation methods rely on symbolic execution and constraint solving [37,38]. However, these techniques are more mature for the C language, while symbolic execution for Java is still at a developmental and exploratory stage, and reliable, efficient test case generation techniques and tools for Java programs are urgently needed. At present, most Java symbolic execution tools [39] only support symbolic representation of basic, simple data types such as int and char, and are still unable to fully symbolize complex data types such as String and Object. This limitation restricts the use of symbolic execution for constraint solving along specified vulnerable paths.
To address our targeted vulnerability type, i.e., spatial ACVs, we propose a semi-automatic, procedural, and standardized payload generation method to support rapid vulnerability verification and exploitation. Our method mainly consists of the following steps:
Analyzing the recursive method's parameter processing logic. First, based on the automatically filtered call chain, identify and extract all statements within the recursive method that process its parameters, covering all operations from the start of the recursive method until the parameters are passed to the next level of recursive calls.
Constructing the initial actual parameter (argument). Based on the extracted parameter processing logic, generate an initial actual parameter (the initial value of the parameter) that should satisfy the conditions for at least one recursive call.
Automatic iterative processing of actual parameters. Apply the extracted parameter processing logic to the initial actual parameter repeatedly (e.g., 10,000 iterations), with each iteration simulating the change in the argument during one recursive call.
Generating the final payload. After numerous iterations, the final value of the actual parameter is used as the payload. This payload represents the final state of the argument after many simulated recursion levels and can be used for further verification or exploitation.
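The four steps above can be sketched as a small driver, assuming the parameter processing logic extracted in the first step has already been turned into a function that maps the argument expected at one recursion level to the argument its caller must receive. The "wrap in brackets" step used below is a hypothetical example for a parser that strips one pair of brackets per level, and the PayloadBuilder name is illustrative.

```java
import java.util.function.UnaryOperator;

/** Hypothetical driver for iterative payload construction. */
final class PayloadBuilder {
    /**
     * Applies one level of (inverted) parameter processing to the initial argument
     * {@code iterations} times and returns the resulting payload.
     */
    static String build(String initialArgument, UnaryOperator<String> perLevel, int iterations) {
        String argument = initialArgument;           // step 2: initial actual parameter
        for (int i = 0; i < iterations; i++) {
            argument = perLevel.apply(argument);     // step 3: simulate one more recursion level
        }
        return argument;                             // step 4: final value is the payload
    }

    public static void main(String[] args) {
        // Hypothetical per-level step: each recursion level consumes one "[...]" pair.
        UnaryOperator<String> wrap = a -> "[" + a + "]";
        String payload = build("x", wrap, 10_000);   // 10,000 iterations, as in the text
        System.out.println(payload.length());        // prints 20001
    }
}
```

Keeping the per-level step as a parameter separates the manually assisted part (deriving that step from the call chain) from the mechanical part (repeating it), which is where this semi-automatic approach saves effort.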
6. Conclusions
Algorithmic complexity vulnerabilities are prevalent across applications and programming languages and have a significant impact on a multitude of downstream software, especially given today's widespread use of open-source code repositories. Recent work in the field has mostly focused on fuzzing-based detection of algorithmic complexity vulnerabilities, which generates initial and mutated samples either randomly or based on a limited understanding of the program, and monitors the program's execution time or resource consumption to detect vulnerabilities. Some studies also provide interactive views to support auxiliary analysis. Compared with existing methods, our tool offers the following advantages: a precise vulnerability model that accurately identifies spatial algorithmic complexity vulnerabilities related to recursive structures; higher efficiency and accuracy, with almost no false positives for this specific vulnerability type; and support for rapid vulnerability verification and exploitation, with an analysis process that more closely reflects how the vulnerability arises.
Of course, our method has certain limitations, which also point to future development opportunities. First, we have not fully explored the potential of our vulnerable call chain discovery method: so far we have only proposed an accurate model for detecting algorithmic complexity vulnerabilities in recursive structures. In the future, we could develop more models to discover further exploitation patterns of algorithmic complexity vulnerabilities, and even other classes of vulnerabilities. Second, for payload generation we currently employ a semi-automated method, limited by the lack of Java dynamic symbolic execution tools, which requires significant manual intervention. In the future, we could explore fully automated payload generation to improve the efficiency of vulnerability verification and exploitation. Finally, we will continue our research on detecting the same type of vulnerability across different programming languages, using a unified intermediate representation to detect ACVs and other types of vulnerabilities related to program logic.