WTA: A Static Taint Analysis Framework for PHP Webshell

Webshells are malicious scripts that can remotely control a webserver to execute arbitrary commands, steal sensitive files, and further invade the internal network. Existing webshell detection methods, such as pattern matching, can be easily bypassed by attackers using file inclusion and user-defined functions. Furthermore, detecting unknown webshells has always been a problem in the field of webshell detection. In this paper, we propose a static webshell detection method based on taint analysis, which realizes accurate taint analysis based on the ZendVM. We first converted the PHP code into Opline sequences, analyzed the Opline sequences in order, and marked the externally imported taint sources. Then, the propagation of the taint variables was tracked, and the interprocedural analysis of the taint variables was performed. Finally, considering the calls to dangerous functions and the references to taint variables at the taint sink, we completed the webshell judgment. Based on this method, we constructed a taint analysis prototype system named WTA and evaluated it on a benchmark dataset by comparing its performance with popular webshell detection tools. The results showed that our method supports interprocedural analysis, can detect unknown webshells, and that WTA's performance surpasses well-known webshell detection tools such as D-Shield, SHELLPUB, WebshellKiller, CloudWalker, ClamAV, LoKi, and findbot.pl.


Introduction
With the rapid development of network technology, web applications [1] have become the dominant form by which Internet companies provide users with web services. At the same time, network attacks on web applications have become the main problem threatening Internet security. In February 2020, Microsoft released a report showing that Microsoft Defender Advanced Threat Protection [2] detects approximately 77,000 active webshells [3] per day, which means that webshells have become one of the most popular types of malware today. A webshell is a malicious network backdoor that can be written in multiple scripting languages [4], allowing attackers to gain system privileges or control the webserver by executing arbitrary commands [5]. Attackers can use webshells to carry out a series of malicious operations, such as accessing server databases and sensitive files, stealing and tampering with user data, and modifying the home page of a website. For website security, it is crucial to detect webshell files and delete them [6].
According to the scripting language, webshells can mainly be divided into three types, namely ASP, PHP, and JSP scripting Trojans [7]. Due to its simple syntax and high development efficiency, PHP has become the first choice for developing various types of web applications [8]. Therefore, this paper mainly studies the PHP webshell detection method.
At present, webshell detection methods can be divided into dynamic feature detection and static feature detection.
The dynamic feature detection method is based on the characteristics of the webshell execution process, such as the behaviors of webshell files, webshell communication traffic [9], and other characteristics [7]. This method only works when the webshell is executing dynamically. On the one hand, this method has a certain ability to detect new variants of scripts and is good at detecting webshell features generated by operations [10]. On the other hand, this method must detect the traffic during the operation and communication process and needs to maintain a large behavioral characteristic library, so it may consume most of the computing resources of the server.
The static feature detection method mainly analyzes the text content of webshells and network log information [11,12]. Regular expressions [13] were the earliest method used for webshell content detection. Their disadvantage is that they can only extract features from known webshells and need to be constantly updated [14]. D-Shield [15] is a currently popular static webshell detection tool. It uses signature database matching to detect webshells and divides webshells into six levels according to the degree of damage: Level 0 is not a webshell, and Level 6 is a known webshell. Because such tools rely on the signatures of known samples, static feature detection methods cannot detect unknown or new webshells. In addition, due to the constant evolution of code obfuscation and code encryption techniques, webshells can easily bypass methods based on regular expressions. Moreover, the static feature detection method cannot conduct interprocedural analysis, that is, it cannot inspect included files and user-defined dangerous functions, so detection based on feature codes, syntax analysis, and dangerous function [16] name matching can be easily bypassed.
In recent decades, the role of taint analysis [17] in program analysis has attracted extensive attention from researchers. Static taint propagation analysis, also called static taint analysis [18], analyzes the data dependencies between variables to test whether data can be propagated from the taint source to the taint sink without running or modifying the program. The object of static taint analysis is generally the source code or an intermediate representation of the program. RIPS [19] uses taint analysis to analyze PHP code statically, based on the AST derived from syntax analysis. Yu Li et al. [20] proposed a detection platform named Shellbreaker. They extracted eight new syntactic and semantic features from the source code and the AST. Two of the features are explicit data flow and implicit data flow, which are extracted by taint analysis. Then, the eight features are fused into a vector. Finally, a statistical classifier is used to analyze the feature vector. However, this method has a limited detection effect on one-sentence webshells [21], because several types of features extracted by this method are aimed at self-adaptive webshells.
The performance of the dynamic feature detection method is poor, and the construction of the environment is complex. Traditional static feature detection methods have difficulties in detecting unknown webshells and lack the capability to perform interprocedural analysis. In addition, there has been some research that has used taint analysis for webshell detection with limited effect. To address the above challenges, this paper proposes a webshell detection method based on static taint analysis.
The main contributions of this paper are as follows: (1) We applied the ZendVM instruction set to the field of taint analysis for the first time and defined the taint propagation rules and taint sink rules of the instruction set; (2) We proposed a novel static detection method based on taint analysis for PHP webshells. The method can carry out interprocedural analysis and detect more unknown webshells; (3) We implemented a taint analysis prototype system named WTA for webshell detection and evaluated the effectiveness of our method by comparing it with existing tools through a benchmark dataset consisting of ten webshell datasets and six CMSs.
The remainder of this paper is organized as follows. Section 2 describes the background information on PHP. Section 3 introduces the include-type webshell, the user-defined function-type webshell, and the unknown webshell, which bring challenges to webshell detection. The overview of the proposed approach is described in Section 4. Section 5 describes the details of the three key steps in our method. Section 6 evaluates our method. We summarize the related work in Section 7 and provide our conclusions in Section 8.

Background
PHP [22]. PHP is a popular scripting language that is particularly suited to web development. It runs in four modes: PHP-CLI, PHP-CGI, PHP-FPM, and PHP-MOD. PHP has three main characteristics. First, PHP is open-source, and the community is active, so the number of people using PHP is large. Second, the syntax of PHP is simple: process-oriented and object-oriented programming can be mixed, it is easy to use, and it has many built-in modules. Third, PHP is highly extensible. Throughout its development, PHP has kept pace with performance demands and popular frameworks and has offered a good extension interface for developers.
ZendVM [23]. The virtual machine of a programming language is a program that can run an intermediate language. The intermediate language is an abstract set of instructions compiled from the native language and is the input of the virtual machine during its execution. The virtual machine of the PHP language is called the Zend Virtual Machine (ZendVM). The ZendVM performs lexical analysis and syntactic analysis on the target PHP file to generate the AST, then compiles the AST into Opcodes, and finally executes the Opcodes and outputs the results. The workflow diagram of the ZendVM is shown in Figure 1.
Opline and Opcode [24]. The ZendVM's instruction is called the Opline, and each instruction corresponds to an Opcode. Oplines are generated when the PHP code is compiled, and the ZendVM executes the PHP code according to the different Oplines. An Opline consists of an operation instruction, operands, and a return value, which is similar to a machine instruction. The corresponding structure of the Opline is zend_op. The basic information of the zend_op structure is shown in Listing 1.
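As a rough orientation for the analysis steps that follow, the sketch below models an Opline as a plain Python record. The field names follow the zend_op structure of PHP 7 (opcode, op1, op2, result, extended_value, lineno); the Python class, the Operand type, and the sample values are illustrative assumptions and are not taken from WTA.

```python
from dataclasses import dataclass
from typing import Optional, Union

# Assumption for this sketch: a variable is an int slot number, a literal is a str.
Operand = Union[int, str]


@dataclass(frozen=True)
class Opline:
    """Simplified, illustrative model of a ZendVM instruction (zend_op)."""
    opcode: str                           # instruction name, e.g. "FETCH_R"
    op1: Optional[Operand] = None         # first operand
    op2: Optional[Operand] = None         # second operand
    result: Optional[int] = None          # variable slot receiving the result
    extended_value: Optional[str] = None  # extra information, e.g. "EVAL"
    lineno: int = 0                       # source line of the instruction


# Hypothetical Opline: read the superglobal _POST into variable slot 4.
fetch = Opline(opcode="FETCH_R", op1="_POST", result=4, lineno=8)
print(fetch.opcode, fetch.op1, fetch.result)
```

Keeping the record immutable makes it safe to share the same Opline sequence between several analysis passes.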
PHP extensions [25]. As mentioned above, one of the main reasons for the popularity of PHP is that a large number of extensions are available. Whatever the needs of web developers are, they are likely to find them addressed by the releases of PHP. The releases of PHP include many extensions that support a variety of databases, graphical file formats, compression, and XML technology. By writing PHP extensions, web developers can take part in the PHP compilation phase and redefine the PHP compilation functions for deeper operations.
Vulcan Logic Dumper (VLD) [26]. VLD is a PHP extension that outputs Oplines by means of a hook. By using the VLD, developers can view the Oplines of the target PHP code, allowing them to gain a deeper understanding of the code.

Motivation
The lack of the ability to perform interprocedural analysis and detect unknown webshells is the main challenge in the field of webshell detection. We present two examples to illustrate the significance of interprocedural analysis and the difficulty of unknown webshell detection.
Interprocedural analysis [27]. There are two types of webshells that require interprocedural analysis. The first is include-type webshells, and the second is user-defined function-type webshells. In an include-type webshell, the attacker puts the body of the webshell into a text file, image file, or file in any other format. For example, the attacker puts the body of the webshell into hello.txt, while the webshell file (attack.php) contains a single "include" statement that includes the file hello.txt. Therefore, if the webshell detection tool only scans the file attack.php, without an in-depth analysis of the contents of the file hello.txt included by attack.php, the webshell can bypass the detection.
Listing 2 shows two include-type webshell examples called include-webshell-1 and include-webshell-2. Listing 3 shows the file hello.txt included in an include-type webshell. In fact, include-webshell-1 and include-webshell-2 have the same function: they both include the webshell body hello.txt. However, detecting the two files with the popular tool D-Shield [15] yields different results. The detection results indicate that include-webshell-1 is a webshell of Level 3, and the reason is "suspicious include". D-Shield considers include-webshell-2 not to be a webshell. From this experiment, we can see that D-Shield simply uses include 'filename' as a matching pattern and considers the file to be a webshell once a match is found, while include-webshell-2 bypasses detection by replacing spaces with parentheses. This also indicates that the tool does not inspect the contents of the included file and cannot detect include-type webshells, which will be further explained in Experiment 1 of Section 6.
User-defined function-type webshells are those in which attackers bypass the scanning of known dangerous functions by creating user-defined functions and executing system commands inside them.
Listing 4 shows a user-defined function webshell. Two user-defined functions, dynamic and newassert, are included in the sample to obtain the taint source and to call the dangerous function assert(). The sample was tested by D-Shield [15], and the test results show that the sample is a webshell of Danger Level 1 (a webshell of Danger Level 1 can be considered a normal file). The reason is that the variable $c is used in this file. Therefore, D-Shield cannot actually detect user-defined function webshells, which will be further explained in Experiment 1 in Section 6.
Unknown webshell [28]. An unknown webshell is a webshell that has not yet been discovered. Since such webshells have not been captured, current webshell detection tools and antivirus software do not have the corresponding sample signatures and cannot detect them. Most of the latest methods are based on malicious pattern matching, such as the tool D-Shield, in which keywords are usually defined by domain experts. Therefore, the detection effect depends on the experts, and new webshells are difficult to detect. In addition, there are many research works on webshell detection methods based on machine learning and neural networks, such as cnn-webshell [9] and Yong et al.'s work [10], whose essence is to extract the features of known webshells for analysis. Therefore, if the features of unknown webshells differ greatly from those of known webshells, it will be difficult to detect the unknown webshells, which will be further explained in Experiment 2 in Section 6.

Overview
In order to solve the above limitations of the existing methods, we propose a webshell static detection method based on taint analysis. This method aims to improve the ability of the static detection of unknown webshells and provide the capability to perform interprocedural analysis.
The method proposed in this paper includes the following seven steps, as shown in Figure 2. Initialization (2) completes the preparation work before the taint analysis, which is to initialize the data structures of Taint import functions list (b), Taint variable list (c), and Dangerous functions list (e). Taint source rules (a), Taint import functions list (b), and Dangerous functions list (e) also require the user to set the initial values. For example, the user fills in Dangerous functions list (e) with functions from the PHP functions library [29] that can serve as webshell dangerous functions, such as exec, shell_exec, and system.
After Initialization (2), the taint analysis framework starts to analyze the Opline sequences in order, which are the outputs of (1). Taint source import (3) conducts the detection according to the preset Taint source rules (a) and Taint import functions (b). When the taint source is found to be imported, the variable used to store the taint source will be stored in Taint variable list (c), and a new linked list will be created with this variable as the header node. Then, the linked list will be saved in Taint propagation chain (d).
Taint propagation (4) analyzes the propagation path of the taint variable in the Opline sequences, adds the tainted variable to Taint variable list (c), and adds the new taint variables to the corresponding taint propagation chain according to the propagation path.
The interprocedural analysis of webshells is one of the contributions of this paper, which consists of two modules: Include file recursive detection (5) and User-defined functions detection (6).
Include file recursive detection (5) will start the taint analysis subprocess when it meets the "include" expression, perform recursive detection on the included files, and return the detection results to the main process through the message queue (IPC [30]).
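The recursion over included files can be sketched as follows. This is a minimal illustration under stated assumptions: plain recursion stands in for the subprocess and message queue described above, `scan_file` is a hypothetical per-file analyzer returning a verdict together with the list of included paths, and the cycle guard is an addition that the text does not mention. The file names attack.php and hello.txt are the ones used in Section 3.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# (is_webshell, included_paths) as reported by a hypothetical per-file analyzer.
ScanResult = Tuple[bool, List[str]]


def detect_with_includes(path: str,
                         scan_file: Callable[[str], ScanResult],
                         seen: Optional[Set[str]] = None) -> bool:
    """Return True if the file, or any file it includes transitively, is flagged."""
    seen = set() if seen is None else seen
    if path in seen:          # guard against include cycles (assumption)
        return False
    seen.add(path)
    flagged, includes = scan_file(path)
    if flagged:
        return True
    return any(detect_with_includes(p, scan_file, seen) for p in includes)


# Stand-in analyzer results: attack.php only includes hello.txt,
# and hello.txt carries the webshell body.
FAKE_RESULTS: Dict[str, ScanResult] = {
    "attack.php": (False, ["hello.txt"]),
    "hello.txt": (True, []),
}
print(detect_with_includes("attack.php", lambda p: FAKE_RESULTS[p]))  # True
```

A scanner that stops at attack.php sees only the include statement; following the include is what exposes the body in hello.txt.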
User-defined functions detection (6) actually has the highest priority for execution. Compiler hook (1) first obtains the Opline sequences, which come from the user-defined functions in the target PHP codes, then it obtains the Opline sequences, which come from the user-defined functions of the user-defined class, and finally, it obtains the Opline sequences produced by the other parts. User-defined functions detection (6) performs taint analysis for user-defined functions and regards the parameters of functions as the taint source. After taint propagation, the user-defined functions are defined as dangerous functions/taint import functions and added into Dangerous functions list (e)/Taint import functions list (b) once the parameters that include the taints are imported into the dangerous functions.
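The promotion of a user-defined function to a dangerous function can be illustrated with a small sketch. It works on a simplified statement list rather than on real Opline sequences, and the statement encoding, the function `summarize_function`, and the body given for newassert are hypothetical. The other half of the step, registering functions that return a taint source as taint import functions, is not modeled here.

```python
from typing import List, Set, Tuple

# Simplified statements: ("assign", dst, src) or ("call", callee, argument).
Stmt = Tuple[str, str, str]


def summarize_function(name: str, params: List[str], body: List[Stmt],
                       dangerous: Set[str]) -> bool:
    """Treat parameters as taint sources; if a tainted value reaches a
    dangerous function, register `name` itself as dangerous."""
    tainted = set(params)
    for kind, a, b in body:
        if kind == "assign":
            # T(dst) <- T(src): the taint attribute is copied, True or False.
            if b in tainted:
                tainted.add(a)
            else:
                tainted.discard(a)
        elif kind == "call" and a in dangerous and b in tainted:
            dangerous.add(name)
            return True
    return False


dangerous = {"assert", "system"}
# Hypothetical body of a wrapper such as newassert($c): it forwards $c to assert().
body = [("assign", "tmp", "c"), ("call", "assert", "tmp")]
print(summarize_function("newassert", ["c"], body, dangerous))  # True
print("newassert" in dangerous)                                 # True
```

Because the wrapper is now in the dangerous functions list, a later call that passes a taint variable to it is caught by the ordinary taint sink check.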
Taint sink detect (7) judges whether a dangerous function is called by the function call instruction according to Dangerous functions list (e).
When a dangerous function is called and the parameter of a dangerous function is a taint variable, it will be added into Taint propagation chain (d), which is marked as the webshell taint propagation chain. When the final result is output, the taint propagation chain is detected. If there is a webshell taint propagation chain, it will be presented in Result output (f).
There are several challenges that need to be solved to implement this architecture: (1) The data structures of the taint variables list, taint import functions list, taint propagation chain, dangerous functions list; (2) Taint propagation rules [31] of the PHP Opline; (3) Taint sink rules of the PHP Opline. These difficulties will be addressed in the next section.

Data Structures of Auxiliary Lists
The data structures of auxiliary lists refer to the taint variables list, taint import functions list, taint propagation chain, and dangerous functions list in the initialization module, while the taint source rules are hard coded in the corresponding functions, so they do not need initialization and data structures.
The taint variables list, taint import functions list, and dangerous functions list are built on the Zend_Hash [32] API of the ZendVM. During the initialization, the user-configured name arrays of the taint import functions and dangerous functions are added to Zend_Hash in order to improve the retrieval speed of these functions. For the taint variables list, the initialization module only allocates the memory space and does not insert any data. Variables in the Opline sequences are identified by sequential Arabic numerals, and when a variable is marked as a taint variable, the numeral representing that variable is inserted into the taint variables list.
The taint propagation chain is constructed by a common doubly linked list. In fact, the taint propagation chain is an array storing the doubly linked list. Whenever a taint source is imported, a new doubly linked list will be created, and the head node of the linked list is the variable just imported by the taint source. When each taint variable (thisVar) is propagated to the next taint variable (nextVar), it will determine whether there is thisVar in the propagation chains according to the propagation relationship and insert nextVar into the next node of thisVar (Situation I). If it is found that thisVar node is not the tail node of the taint propagation chain, but the middle node (which means that the taint propagation chain is divided into two or more paths), then it will copy a new taint propagation list with thisVar as the tail node and insert nextVar into the next node of thisVar (Situation II). This is shown in Figure 3.
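The two insertion situations can be sketched as follows. This is a simplified model: Python lists stand in for the doubly linked lists, the class and method names are hypothetical, and the suppression of duplicate forks is an assumption added for the sketch.

```python
from typing import List


class PropagationChains:
    """Array of taint propagation chains; each chain lists variable numbers."""

    def __init__(self) -> None:
        self.chains: List[List[int]] = []

    def import_source(self, var: int) -> None:
        # A newly imported taint source becomes the head node of a new chain.
        self.chains.append([var])

    def propagate(self, this_var: int, next_var: int) -> None:
        forks: List[List[int]] = []
        for chain in self.chains:
            if this_var not in chain:
                continue
            idx = chain.index(this_var)
            if idx == len(chain) - 1:
                # Situation I: this_var is the tail node, so append next_var.
                chain.append(next_var)
            else:
                # Situation II: this_var is a middle node, so copy the prefix
                # ending at this_var and append next_var to the copy.
                fork = chain[:idx + 1] + [next_var]
                if fork not in self.chains and fork not in forks:
                    forks.append(fork)
        self.chains.extend(forks)


chains = PropagationChains()
chains.import_source(4)   # variable 4 receives a taint source
chains.propagate(4, 6)    # Situation I: [4, 6]
chains.propagate(6, 7)    # Situation I: [4, 6, 7]
chains.propagate(4, 9)    # Situation II: the chain forks at variable 4
print(chains.chains)      # [[4, 6, 7], [4, 9]]
```

Each chain then records one complete path from a taint source, which is what the result output needs when a chain is marked as a webshell taint propagation chain.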

Taint Propagation Rules
The ZendVM has a unique instruction set of Oplines. Taint analysis based on Oplines needs a user-defined data flow logic. This section first introduces three definitions: Taint Attribute, Taint Map, and Predefined Taint. Next, it introduces the data flow logic of the ZendVM.
Definition 1 (Taint Attribute). Taint Attribute is an accessory attribute of a variable in the Opline sequences and is a Boolean value. When a variable's Taint Attribute is True, it is a taint variable, and when its Taint Attribute is False, it is a normal variable.
Definition 2 (Taint Map $T(\cdot)$). Let $v$ be a variable. $T(v)$ returns the value of $v$'s Taint Attribute. The semantics of $T(v)$ depends on its position relative to $\leftarrow$. When $T(v_1)$ is on the right of $\leftarrow$, it denotes reading $v_1$'s Taint Attribute. When $T(v_1)$ is on the left of $\leftarrow$, it denotes that $v_1$'s Taint Attribute is set to the Taint Attribute on the right. For example, $T(v_1) \leftarrow T(v_2)$ means that $v_2$'s Taint Attribute is passed to $v_1$.
Definition 3 (Predefined Taint). Predefined Taint TAINT is a variable that has been pre-identified as a taint due to the characteristics of the PHP language and the ZendVM. Predefined Taints in this method specifically refer to super global variables [33] and the parameters of user-defined functions.
Table 1 details the taint propagation logic. We studied the ZendVM instruction set in depth, analyzed the instructions most likely to propagate taint, and obtained this taint propagation logic. The table shows the taint import rules and taint propagation rules applied when our taint analysis system processes ZendVM instructions. The propagation rule of FETCH_R $v_A$, C is TAINT, because this Opline format only appears when super global variables are used. The propagation rule of RECV is also TAINT, because the ZendVM does not recompile PHP library functions, and RECV only occurs in function definitions. Therefore, the occurrence of RECV means that the function is a user-defined function, and treating the parameters of the user-defined function as TAINT supports the taint analysis of user-defined functions.
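A minimal sketch of how such rules might be applied to an Opline sequence is given below. Only the FETCH_R, RECV, and FETCH_DIM_R behavior is described in the text; the rules shown for ASSIGN and CONCAT, the tuple encoding of an Opline, and the function name are assumptions made for illustration and may differ from Table 1.

```python
from typing import Iterable, Optional, Set, Tuple, Union

Operand = Union[int, str, None]   # int: variable number, str: literal
Op = Tuple[str, Operand, Operand, Optional[int]]  # (opcode, op1, op2, result)

SUPERGLOBALS = {"_GET", "_POST", "_COOKIE", "_REQUEST", "_FILES", "_SERVER"}


def propagate_taint(oplines: Iterable[Op]) -> Set[int]:
    """Return the set of variable numbers whose Taint Attribute is True."""
    tainted: Set[int] = set()

    def set_taint(var, value: bool) -> None:
        # Implements T(var) <- value for a Boolean Taint Attribute.
        if value:
            tainted.add(var)
        else:
            tainted.discard(var)

    for opcode, op1, op2, result in oplines:
        if opcode == "FETCH_R" and op1 in SUPERGLOBALS:
            set_taint(result, True)                  # T(result) <- TAINT
        elif opcode == "RECV":
            set_taint(result, True)                  # function parameter: TAINT
        elif opcode == "FETCH_DIM_R":
            set_taint(result, op1 in tainted)        # T(result) <- T(op1)
        elif opcode == "ASSIGN":                     # assumed rule
            set_taint(op1, op2 in tainted)           # T(op1) <- T(op2)
        elif opcode == "CONCAT":                     # assumed rule
            set_taint(result, op1 in tainted or op2 in tainted)
    return tainted


trace = [
    ("FETCH_R", "_POST", None, 4),     # variable 4 <- $_POST
    ("FETCH_DIM_R", 4, "hello", 6),    # variable 6 <- variable 4['hello']
    ("ASSIGN", 7, 6, None),            # variable 7 = variable 6
    ("ASSIGN", 8, "safe", None),       # variable 8 = literal, stays clean
]
print(sorted(propagate_taint(trace)))  # [4, 6, 7]
```

Copying the attribute on ASSIGN, rather than only ever adding taint, lets a variable become clean again when it is overwritten with an untainted value.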

Taint Sink Rules
The taint sink needs to meet two conditions: first, the call of dangerous function is detected; second, the called dangerous function uses the taint variable as the parameter. Similarly, it also needs a set of unique taint sink rules to judge the taint sink. Table 2 provides a detailed list of taint sink rules. Oplines related to the taint sink mainly fall into three categories, namely function call initialization (INIT), passing parameters to the function (Param), and function call execution (CALL), which correspond to the three steps of function call execution in the ZendVM.
In the phase of INIT, the corresponding operand of the Opline is detected. When the function called is found in the dangerous functions list, the value of SinkFuncFlag is set to one, indicating that the dangerous function is called.
In the phase of Param, when the corresponding operand of the Opline is found to be a taint variable, it is determined that the taint variable is imported by the function call, and the value of TaintVarFlag is set to one, indicating that the taint variable is passed as a parameter.
Finally, in the phase of CALL, when SinkFuncFlag = 1 && TaintVarFlag = 1 is found, it is the taint sink, and the sample is determined to be a webshell.
It is worth noting that Opline Eval itself represents a dangerous function call, so it only needs to meet TaintVarFlag = 1 to qualify as a webshell. In addition, SinkFuncFlag and TaintVarFlag always appear in pairs, and both Flag values are reset to zero after the judgment is completed.
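The three phases and the two flags can be sketched as a small state machine. The opcode names used below (INIT_FCALL, SEND_VAR, DO_FCALL, INCLUDE_OR_EVAL, and their variants) are real ZendVM opcodes, but which of them Table 2 lists, the assumption that the function name sits in op2 of the INIT Opline, and the tuple encoding are illustrative choices. Nested calls are not handled.

```python
from typing import Iterable, Optional, Set, Tuple, Union

Operand = Union[int, str, None]
Op = Tuple[str, Operand, Operand, Optional[str]]  # (opcode, op1, op2, extended_value)

INIT_OPS = {"INIT_FCALL", "INIT_FCALL_BY_NAME"}
PARAM_OPS = {"SEND_VAL", "SEND_VAR", "SEND_VAL_EX", "SEND_VAR_EX"}
CALL_OPS = {"DO_FCALL", "DO_ICALL", "DO_UCALL", "DO_FCALL_BY_NAME"}


def is_webshell(oplines: Iterable[Op], tainted: Set[int],
                dangerous: Set[str]) -> bool:
    sink_func_flag = 0
    taint_var_flag = 0
    for opcode, op1, op2, ext in oplines:
        if opcode in INIT_OPS and op2 in dangerous:
            sink_func_flag = 1                  # INIT: a dangerous function is called
        elif opcode in PARAM_OPS and op1 in tainted:
            taint_var_flag = 1                  # Param: a taint variable is passed
        elif opcode in CALL_OPS:
            hit = sink_func_flag == 1 and taint_var_flag == 1
            sink_func_flag = taint_var_flag = 0  # both flags are reset together
            if hit:
                return True                     # CALL: taint sink reached
        elif opcode == "INCLUDE_OR_EVAL" and ext == "EVAL":
            if op1 in tainted:                  # eval needs only the taint condition
                return True
    return False


call = [("INIT_FCALL", None, "system", None),
        ("SEND_VAR", 6, None, None),
        ("DO_ICALL", None, None, None)]
print(is_webshell(call, tainted={6}, dangerous={"system"}))                    # True
print(is_webshell([("INCLUDE_OR_EVAL", 6, None, "EVAL")], {6}, {"system"}))  # True
```

Resetting both flags at the CALL phase keeps a tainted argument of a harmless call from being combined with a later dangerous call that receives only clean arguments.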

Example Illustration
Reviewing the include-type webshell example named webshell-1.php in Section 3 and analyzing it using the method proposed in this paper, the sample code is transformed into Opline sequences, as shown in Figure 4. In this example, hello.txt is the file included in webshell-1.php. The Opline sequences of hello.txt are obtained by include file recursive detection and are shown in Figure 5.
In Figure 5, it is observed that Line 8 is in accordance with the taint propagation rule of FETCH_R $v_A$, C, and C is the super global variable _POST, so Variable 4 is added to the taint variable list and a taint propagation chain is created at the same time, with Variable 4 as the head node of this chain. Then, the Opline in Line 9 conforms to the taint propagation rule of FETCH_DIM_R $v_A$, C, so the return value, Variable 6, is tainted, added to the taint variable list, and inserted into the taint propagation chain with Variable 4 as the previous node. On Line 10, the extended_value of the Opline is EVAL, and op1 is Variable 6, which is a taint variable. Therefore, it meets the taint sink rules, and the sample file is determined to be a webshell file.
The method proposed in this paper can be used to easily determine that the sample code is a webshell. We will further verify the advantages of our method in the next section.

Evaluation
To evaluate the webshell static detection method based on taint analysis proposed in this paper, this section designs a series of experiments on real programs and compares the method with related techniques. The experiments are described below.

Evaluation Setup
We designed the experiments to answer the following research questions: RQ1: Is the interprocedural analysis module based on taint analysis effective at the detection of user-defined function-type webshells and include-type webshells? RQ2: Does the webshell detection method based on taint analysis have a better effect against unknown webshells? RQ3: Does WTA have better performance than well-known webshell detection tools?
The first two experiments were used to evaluate the two improved techniques proposed in this paper, and the third experiment was used to evaluate the overall performance of the method proposed in this paper.
Experimental infrastructure. All experiments were run on a machine with an Intel Core i7-10875H processor, four 2.30 GHz logical cores, and 16 GB of RAM; the operating system was 64-bit Windows 10 20H2 or 64-bit Ubuntu Linux 18.04. The PHP version was 7.1.24.

Evaluation Benchmarks
There are many publicly available webshell datasets on the Internet that can be obtained through GitHub. Since these datasets are collected in an ad hoc manner, there are problems such as sample duplication, sample execution failure, and incorrect sample format. In addition, the research object of this paper is PHP webshells. After cleaning, we collected a total of 1776 PHP webshells from 10 open-source datasets on GitHub. The sources of the samples are shown in Table 3. These datasets were collected for different purposes. Some, such as tennc/webshell, aim to collect the most comprehensive set of webshells, so in addition to PHP, their samples also include ASP, Java, Python, and other languages. Some divide PHP webshells according to their families and only collect webshells with typical family characteristics, such as S0MD3v/Nano. Some, such as LandGrey/webshell-Detect-Bypass, collect webshells that can bypass current detection methods for the purpose of network attack or security research. Therefore, we preprocessed the samples collected from the above 10 projects: duplicate samples were excluded based on their SHA1 hashes. In addition, PHP-CLI was used to execute each PHP webshell, and webshells that could not be executed were excluded. Finally, we obtained 1776 executable PHP webshell samples.
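The SHA1-based deduplication step can be sketched as follows. The function name and the in-memory sample mapping are hypothetical; in practice the bytes would be read from the files of each dataset, and the PHP-CLI executability check is not covered here.

```python
import hashlib
from typing import Dict, List


def dedupe_by_sha1(samples: Dict[str, bytes]) -> List[str]:
    """Keep the first sample name seen for each distinct SHA1 digest."""
    seen = set()
    unique: List[str] = []
    for name, content in samples.items():
        digest = hashlib.sha1(content).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(name)
    return unique


# Harmless placeholder contents; a.php and b.php are byte-for-byte identical.
samples = {
    "repo1/a.php": b"<?php echo 1;",
    "repo2/b.php": b"<?php echo 1;",
    "repo2/c.php": b"<?php echo 2;",
}
print(dedupe_by_sha1(samples))  # ['repo1/a.php', 'repo2/c.php']
```

Hash-based deduplication only removes exact copies; samples that differ in a single byte, such as those produced by the random generators in Section 6, are kept as distinct.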

Effectiveness Test of the Interprocedural Analysis Module Based on Taint Analysis (RQ1)
To evaluate the effectiveness of the interprocedural analysis module, we implemented two versions of the webshell static detection tool: Webshell Taint Analysis (WTA) and No Interprocedural Analysis Module (WTA-NO-IAM). The former uses the method proposed in this paper, while the latter does not include the interprocedural analysis module. In this experiment, there were two performance evaluation indicators. First, validating sample code was set up to test the validity of two types, user-defined function-type webshell and include-type webshell. Second, WTA and WTA-NO-IAM were applied to detect our webshell dataset, and the effectiveness of the interprocedural analysis module was evaluated by comparison of the number of webshells detected by the two tools.
The validating sample code for the webshell of the user-defined function-type webshell (WebShell-1) and the include-type webshell (WebShell-2) is shown in Listing 5.
As shown in Listing 5(a), the sample webshell encapsulates the easy-to-detect keyword "_POST" in the user-defined function dynamic() and the easy-to-detect dangerous function "assert" in the user-defined function newassert(), trying to bypass the webshell detection tool. Finally, at Line 16, they are combined into a one-sentence webshell: assert($_POST['x']).
The include-type webshell is shown in Listings 5(b) and 5(c), where Listing 5(b) is the body of the webshell and Listing 5(c) is the content of the included file. The sample webshell uses "include" to wrap the dangerous function "eval" into the user-defined function "HelloWorld" and calls "HelloWorld" in the body of the webshell file. The webshell is finally realized at Line 3 of Listing 5(b): eval($_POST['hello']).
WTA, WTA-NO-IAM, and well-known webshell detection tools were used to detect the two webshells, and the test results are shown in Table 6.
Listing 5(a), Lines 8 to 13 (fragment):
    8.  $e = "a###sse###rt";
    9.  $f = chunk_split($e, 1, "#");
    10. $g = str_replace("#", "", $f);
    11. return $g;
    12. }
    13. $a = dynamic();
For the detection of the sample webshells, the experimental results show that both samples could be detected by WTA, indicating that WTA can detect user-defined function-type webshells and include-type webshells. Among the other tools, only D-Shield could detect WebShell-1 and report it as suspicious at Level 1 (D-Shield detects webshells on a scale of five, with Level 1 being the least dangerous). D-Shield judged it as a Level-1 webshell because it detected a variable function [40]. This shows that the regular matching detection method adopted by current detection tools is weak at interprocedural analysis, especially for include-type webshells.
For the controlled experiment of WTA and WTA-NO-IAM, the recall rate of WTA was 96.4%, while the recall rate of WTA-NO-IAM was only 74.6%. Therefore, the interprocedural analysis module of WTA plays a crucial role in the detection of webshell samples.
To sum up, the answer to RQ1 is obvious. The interprocedural analysis module based on taint analysis is effective at the detection of user-defined function-type webshells and include-type webshells.

The Validity Test of the Webshell Detection Method Based on Taint Analysis against Unknown Webshells (RQ2)
At present, the mainstream webshell detection tools mostly use the detection method based on regular matching. By capturing the webshells in the wild, the corresponding features are extracted and added into the feature library. Therefore, the webshells that have been spread on the network for a period of time are easier to detect. Moreover, this method of blacklist matching is easy to bypass. How to improve the detection ability against unknown webshells is a goal pursued by various webshell detection tools.
To evaluate the effectiveness of this detection method against unknown webshells, we selected 5 generation tools that can randomly generate anti-detection PHP webshells and used each tool to generate 10 webshells, for a total of 50 samples. The information on the 5 webshell generation tools is shown in Table 8. Listing 6 shows a code snippet of pureqh. There are many anchors for replacement in the code, such as {1}, {6}, and {7}. These anchors are replaced with random strings when pureqh runs. Therefore, each generated webshell has a different hash value. It is difficult for current static detection tools to extract features from such samples, resulting in detection failure.
The popular webshell detection tools mentioned in Table 8 were used to detect the 50 samples, and the detection results are shown in Table 9.
From the experimental results, it can be observed that WTA had an excellent detection effect for the randomly generated unknown webshells. All 50 samples were detected, and the recall rate reached 100%. Because WTA adopts taint analysis for detection, there is no need to extract the corresponding features, and the detection effect is better for unknown webshells. Among the well-known webshell detection tools, only D-Shield, WebshellKiller (recall mode), and CloudWalker could find the webshells. D-Shield detected 40 webshells, but could not detect those generated by pureqh. WebshellKiller (recall mode) could only detect the samples generated by weevely and b374k; other samples could not be detected. CloudWalker could only find 10 webshells.

[Listing 6, fragment of the pureqh snippet: $v = 0; ... $vbits += 8; return ${8}; {4}$ ...]

Table 9. The detection effect for unknown webshells.

It is worth mentioning that D-Shield's detection report stated that the 40 samples were "known webshells", indicating that D-Shield had only noticed four of the generation tools and added their features to its webshell feature library. The update times of these tools support this view: the detected webshells were all generated by tools updated before 2021, and the oldest tool, b374k, was last updated on 13 December 2016. D-Shield could not detect the webshells generated by the latest tool, pureqh, whereas WTA, being based on taint analysis, achieved a better detection effect for unknown webshells.
To sum up, the answer to RQ2 is evident. The webshell detection method based on taint analysis has a better effect against unknown webshells and can provide important help for detecting webshells.

WTA and Well-Known Webshell Detection Tools for Performance Comparison (RQ3)
The above two experiments evaluated the effectiveness of the two key techniques in this paper. The experiment in this section evaluated the overall performance of the system and whether the method presented in this paper can improve the performance of webshell detection.
In order to better evaluate the performance of webshell detection tools, the evaluation indicators of this experiment are defined as follows. We regarded webshells as positive samples and normal files as negative samples.

True Positive (TP). A webshell sample is correctly recognized as a webshell.
False Positive (FP). A normal file is misidentified as a webshell.
True Negative (TN). A normal file is correctly recognized as a normal file.
False Negative (FN). A webshell sample is misidentified as a normal file.

Accuracy. The proportion of correctly predicted samples to all samples. The formula is as follows:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{1}$$

Recall. The proportion of correctly predicted webshell samples to the real webshell samples; the higher the recall rate, the better the performance is for detecting potential webshells. The formula is as follows:

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{2}$$

Precision. The proportion of correctly predicted webshell samples to the predicted webshell samples; the higher the precision rate, the lower the false positive rate is. The formula is as follows:

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{3}$$

F-measure. The F-measure is a comprehensive consideration of recall and precision. Generally, a higher F1 indicates that the experimental method is more effective. The larger β is, the more importance is attached to the recall. The β values used in our experiment were 0.5, 1, and 1.5. The formula is as follows:

$$F_\beta = \frac{(1 + \beta^2) \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\beta^2 \cdot \mathrm{Precision} + \mathrm{Recall}} \tag{4}$$

In the experiment, the method presented in this paper was compared with well-known webshell detection tools, and a controlled experiment was conducted based on the experimental dataset in Section 6.2. The experimental results are shown in Table 10. The results showed that D-Shield had the best comprehensive performance among the well-known webshell detection tools, with a recall of 90.54%, a precision of 99.81%, and an F1 of 94.95%.
However, WTA had better comprehensive performance than D-Shield: its recall was 96.45%, 5.91 percentage points higher than D-Shield; its precision was 97.71%, slightly lower than D-Shield but still high; and its F1 was 97.08%, 2.13 percentage points higher than D-Shield. Overall, WTA achieved the best comprehensive performance among all the webshell detection tools compared.
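As a quick consistency check, the reported F1 values can be recomputed from the reported precision and recall. The sketch below assumes the standard definitions of accuracy, recall, precision, and the F-beta measure; the function names are chosen for this example, and only the percentages for D-Shield and WTA are taken from the text.

```python
def accuracy(tp, fp, tn, fn):
    """Share of all samples that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)

def recall(tp, fn):
    """Share of real webshells that were detected."""
    return tp / (tp + fn)

def precision(tp, fp):
    """Share of reported webshells that really are webshells."""
    return tp / (tp + fp)

def f_beta(p, r, beta=1.0):
    """Standard F-beta: beta > 1 weights recall more, beta < 1 weights precision more."""
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)

# Precision and recall as reported in the text (converted from percentages).
f1_dshield = f_beta(0.9981, 0.9054)
f1_wta = f_beta(0.9771, 0.9645)

# The other beta values used in the experiment, applied to WTA's reported figures.
f05_wta = f_beta(0.9771, 0.9645, beta=0.5)
f15_wta = f_beta(0.9771, 0.9645, beta=1.5)
```

Under these definitions, both F1 values agree with the reported 94.95% and 97.08% to two decimal places. Because WTA's precision is higher than its recall, its F0.5 is slightly above its F1 and its F1.5 slightly below.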
It is worth noting that the performance of the two modes of WebshellKiller was quite different. In precision mode, its precision was 99.77%. Recall mode sacrificed some precision and raised recall by 26.97 percentage points over precision mode, but its recall was still only 77.47%, an average result. SHELLPUB's detection speed was the fastest among all the tools: the detection of the webshell samples took less than 10 seconds; however, its recall was too low. CloudWalker applies a number of detection techniques, such as statistical feature detection, AST detection, regular matching, and machine learning. Therefore, its detection speed was the slowest, and its average detection time was three times that of the other tools.

Discussion
In this section, we discuss the limitations and future developments of static webshell detection methods based on taint analysis to improve the integrity of WTA.
Although our static webshell detection method based on taint analysis has a more complete interprocedural capability than traditional methods and can detect more unknown webshells, it cannot guarantee the detection of all new unknown webshells. Many factors affect WTA's detection of unknown webshells: for example, the Opline propagation rules in this paper are not fully comprehensive, and some webshells use new PHP features. In future work, we will further study the taint propagation rules of the Oplines that are not covered at present and expand the current taint propagation logic. In addition, we will update the Opline taint propagation logic to follow new versions of PHP and continue to study the principles and features of new webshells enabled by the features of those versions.
In addition, our method is extensible. Specifically, the detection objects can be expanded to web application vulnerabilities such as SQL injection and XSS. For example, we can build the taint import rules for SQL injection, improve the corresponding taint propagation logic, and compile a separate list of dangerous functions for SQL injection, thereby realizing the detection of SQL injection vulnerabilities. In this paper, the effectiveness of our method in webshell detection was evaluated in a preliminary fashion; in the future, we will extend it to web application vulnerability detection.

Related Works
In this section, some dynamic feature-based and static feature-based detection methods are introduced, respectively.

Dynamic Methods
Tian et al. [9] proposed a malicious webshell detection method based on a Convolutional Neural Network (CNN). This method first obtains the HTTP request and then uses word2vec to represent each word as a vector. In this case, each HTTP request can be transformed into a fixed-size matrix; finally, a model is trained to detect and classify a file based on CNN. In fact, this detection method is based on network traffic, which uses a convolutional neural network to monitor, model, and train the traffic at webshell runtime. This method has better classification performance than the method based on malicious keyword matching, but it also has some drawbacks: First, if attackers reduce the frequency of communication, such as disguising the operations as normal behaviors and executing the required command only once, it would easily bypass the detection based on network traffic. Second, running the webshell in real time results in bad performance and consumes many computing resources, which may lead to the destruction of key nodes in the system.
In addition, there are methods to detect abnormal behaviors of the webserver to detect web attacks. The main detection idea is to extract the characteristics of abnormal network behaviors, distinguish them from the normal network behavior, and construct the abnormal network activity label for web attack detection. However, systems based on anomaly detection often produce a large number of false positives, because it is difficult to construct the algorithm for labeling normal and abnormal behaviors, and it is easy to mark normal behaviors as suspicious operations or omit some real abnormal behaviors. Robertson et al. [41] proposed a network attack detection method. This method uses exception generalization technology to convert suspicious web requests into abnormal signatures and then uses these signatures to group similar abnormal samples. Kruegel et al. [42] proposed an intrusion detection system that uses a variety of different anomaly detection techniques to detect attacks against web servers and web applications. Almgren et al. [43] took into account the characteristics of different types of host-based attacks and developed a lightweight tool for online detection of webserver attacks that can run and track suspected hosts in real-time.

Static Methods
Tian et al. [9] and Tu et al. [3] used regular matching and keyword feature matching to detect webshells. This method can be effective at identifying some webshells, but webshells are usually written in high-level languages, which have abstract lexical and syntactic features. These features cannot be fully reflected in regular expressions, so it is difficult to extract abstract features with this method, and some webshells may be missed during detection.
Zhu et al. [44] considered the abstract lexical and syntactic features in high-level languages (especially the PHP language) and proposed a detection method based on multiview feature fusion. First, this method extracts the abstract features of the vocabulary and syntax that represent the internal meaning of the webshell. Secondly, the Fisher score is used to rank each feature according to its importance. Finally, a model is established based on the optimized Support Vector Machine (SVM), and it could detect webshells effectively.
The text feature recognition method is also the main method in webshell static detection, which often plays a role together with a neural network and deep learning. Tu et al. [45] proposed a webshell detection system based on a scoring mechanism, which determines whether suspicious files belong to a webshell by scoring. The factors of scoring are the function type, the number of dangerous functions, the signature status, the longest string length, and so on. Thresholds are then determined, and score accumulation is performed when some factors exceed the threshold. This method mainly has the following problems: first, because this method is mainly based on the feature library constructed by experts to determine dangerous functions and other factors, the new webshell cannot be detected; second, if the attacker encrypts or splits dangerous functions and sensitive parameters, it cannot be detected directly.
Each programmer's programming style results in different code syntax, and these syntactic variations are a difficulty for taint analysis. Kurniawan et al. [46] summarized the possible syntax variants based on the AST and reconstructed the PHP parser, which reduces the syntax objects to be visited during taint analysis. In contrast, our method performs taint analysis on the PHP Opline. Because Oplines are ZendVM instructions generated when the PHP code is compiled, differing source-level syntax variants are naturally resolved.
Le et al. [47,48] combined taint analysis and pattern matching to detect webshells. The code is divided into tokens during the lexical analysis phase, and taint analysis is then performed on the tokens, similar to RIPS [19]; the pattern matching can match a few one-sentence webshells.
The statistical characteristics method summarizes the characteristics of an entire webshell file according to the attribute values of certain aspects of the file. Due to the rapid development of web services, developers tend to use encryption and obfuscation techniques to avoid source code leakage, which makes the statistical characteristics of normal files similar to those of webshell files. Therefore, a webshell detection method based on statistical characteristics loses its original advantages. Pan et al. [14] proposed a webshell detection method based on executable data features in PHP code. This method combines the characteristics of executable data from the PHP code with the characteristics of the static text to detect webshells. Compared to the traditional static statistical method, this method can improve the recognition ability.
Webshell detection systems use different classification methods to determine whether a suspicious file is a webshell. For example, the webshell detection method proposed by Wang et al. [49] uses a multilayer neural network to detect and classify suspicious files. Cui et al. [50] used the combination of a random forest classifier and a GBDT classifier for classification. Fang et al. [21] used the fastText algorithm to train the Opcode sequence model and predicted the corresponding features of the samples; random forest was then used to realize the binary classification. Each of these methods has its advantages and disadvantages. Ai et al. [51] proposed a webshell detection method based on ensemble learning, which constructed a differentiated ensemble detection model, WS-LSMR, composed of Logistic Regression (LR), Support Vector Machine (SVM), Multilayer Perceptron (MLP), and Random Forest (RF). Given the four basic classifiers (LR, SVM, MLP, RF), this model adaptively assigns weights to them, so that classifiers with higher accuracy receive higher weights and contribute more to the final decision.
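The idea of weighting base classifiers by their accuracy can be sketched as follows. The exact weighting scheme of WS-LSMR is not given here, so the sketch assumes the simplest option, weights proportional to validation accuracy; the function names, the accuracies, the per-classifier probabilities, and the 0.5 threshold are all made-up values for illustration.

```python
def accuracy_weights(val_accuracy):
    """Normalize validation accuracies so that the weights sum to 1."""
    total = sum(val_accuracy.values())
    return {name: acc / total for name, acc in val_accuracy.items()}

def weighted_score(probabilities, weights):
    """Combine per-classifier webshell probabilities into one weighted score."""
    return sum(weights[name] * p for name, p in probabilities.items())

# Illustrative numbers only, not taken from any paper.
val_accuracy = {"LR": 0.90, "SVM": 0.92, "MLP": 0.94, "RF": 0.96}
weights = accuracy_weights(val_accuracy)

# Hypothetical webshell probabilities output by each classifier for one file.
probabilities = {"LR": 0.40, "SVM": 0.55, "MLP": 0.70, "RF": 0.80}
score = weighted_score(probabilities, weights)
is_webshell = score >= 0.5  # assumed decision threshold
```

With proportional weights, a more accurate classifier pulls the combined score further toward its own prediction, which is the behavior the ensemble description calls for.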

Conclusions
Webshells are an important threat to network security. Attackers using a webshell can invade websites, control servers, steal sensitive files, and further invade the internal network. How to improve the capability of interprocedural analysis and improve the detection ability for unknown webshells are the main challenges of webshell detection. This paper proposed a webshell static detection method based on taint analysis. For the first time, we constructed a set of user-defined taint propagation rules and a set of user-defined taint sink rules for the unique instruction set of the ZendVM. A PHP webshell detection method was formed by the combination of the two sets of rules and the detection of the Opline taint source. Based on this method, we implemented a static taint analysis prototype system named WTA for the detection of PHP webshells.
Experimental results showed that WTA supports interprocedural analysis and has the ability to detect unknown webshells. Compared with the current popular webshell detection tools, WTA can detect more webshells. Its recall rate reached 96.45%, which was 5.91 percentage points higher than that of the best-performing tool among the others. Its precision rate was 97.71%, and its F1 was 97.08%.