Article

WeDIGAR: A Light-Weighted Webshell Detection Framework for Satellite and UAV Networks

1
School of AI and Big-Data, Chongqing Industry Polytechnic University, Chongqing 401120, China
2
State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550000, China
3
Guizhou BaishanCloud Technology Co., Ltd., Guiyang 550000, China
*
Author to whom correspondence should be addressed.
Electronics 2025, 14(21), 4301; https://doi.org/10.3390/electronics14214301
Submission received: 22 September 2025 / Revised: 26 October 2025 / Accepted: 28 October 2025 / Published: 31 October 2025
(This article belongs to the Special Issue Advances in Satellite/UAV Communications)

Abstract

In satellite and Unmanned Aerial Vehicle (UAV) networks, detecting webshells presents unique challenges, particularly on ground station edge nodes. Large, resource-intensive detection models are infeasible because these nodes have limited computing power and little time for analysis. This paper introduces a novel approach to webshell detection tailored for these environments. Our method first extracts structural and semantic Information Granules (IGs) from the HTTP response bodies returned by remote systems. Next, we construct a causal graph to identify and remove irrelevant IGs that are not linked to the webshell label. Finally, a random forest classifier is applied to the remaining, relevant IGs. This lightweight component has been empirically validated in both laboratory experiments and a simulated industrial application scenario. The results show that our method achieves an accuracy exceeding 99% and a response time below 10 milliseconds per request, significantly outperforming legacy systems based on graph convolutional networks.

1. Introduction

1.1. Context and Challenges in Satellite/UAV Edge Networks

Most contemporary businesses and critical infrastructure, including military and communication systems, are increasingly reliant on next-generation networks, such as satellite networks and Unmanned Aerial Vehicle (UAV) networks. These networks provide essential services such as remote sensing, precision agriculture, and disaster response. Consequently, the significance of network security has grown substantially. The unique nature of these networks (characterized by remote, often autonomous, and resource-constrained nodes) makes them particularly vulnerable to cyber threats. Injection attacks take various forms, including Cross-Site Script Attack (XSS), Structured Query Language (SQL) injection, and webshells. Among these, webshells are particularly damaging, causing financial losses amounting to billions of dollars each year. As a result, mitigating webshell attacks is a crucial priority [1].
As a prime example of an injection attack, a webshell script is uploaded onto the server by exploiting vulnerabilities within a website. This enables the attacker to gain control over the server and carry out various operations, including file management and command execution. Essentially, the webshell script acts as a customized backdoor, offering the attacker the capability of executing arbitrary actions on the compromised website. The process of a webshell attack is visually depicted in Figure 1.

1.2. Limitations of Existing Webshell Detection Methods

Researchers have made significant contributions in the field of webshell attack detection [2]. Webshell detection models fall into three types: rule-based, machine-learning-based, and hybrid methods (combining fixed rules with machine learning). For example, computing the scores of malicious signatures and malicious feature samples is a rule-based method [3], while multilayer perceptrons, random forests, convolutional neural networks (CNNs), and support vector machines (SVMs) are all machine-learning-based methods [4,5,6]. Although most machine-learning-based methods are supervised, unsupervised methods can also work well [7]. In addition to these two categories, combining the features of operation code sequences with naïve Bayes [8] belongs to the hybrid type.
Webshell analysis methods can also be divided into two types according to the object of analysis. One is based on source file analysis, which extracts opcode sequence features from source files and analyzes them [9]; the other is based on HTTP traffic analysis, which analyzes traffic changes or HTTP requests [10,11].
In a scenario involving a satellite network operator or a UAV fleet management service, strict access restrictions are often enforced to protect the privacy and security of source code and sensitive data on cloud servers or edge nodes. This is especially true for systems operating in remote or hostile environments. Consequently, directly accessing source files on the satellite’s or UAV’s embedded systems, or sampling all HTTP traffic between nodes, is not feasible. The available bandwidth is often limited, and the need for real-time analysis is paramount. As a result, the only viable source of information for identifying webshell attacks will be the HTTP response body (see Figure 2), which is often relayed back to a ground station or a centralized management system.
The unique constraints imposed by relying solely on the HTML response body for webshell detection intensify the challenge of achieving high accuracy. Fortunately, in most cases the focus can be narrowed to processing the plain (unencrypted) HTML response body, without considering the encryption or decryption of HTML responses. By concentrating on analyzing and understanding the patterns of the plain response body, it is feasible to develop robust and efficient webshell detection solutions. The method presented in this paper specifically targets the scenario where CDN providers offer additional security services, as illustrated in Figure 3.
In cloud or edge computing environments, attackers often interact with an uploaded webshell entirely over HTTP, as in the following three phases:
1. Upload Phase: The attacker exploits a file upload vulnerability to place a malicious shell.php file in a web-accessible directory:

    POST /upload.php HTTP/1.1
    Host: victim.example.com
    Content-Type: multipart/form-data; boundary=----abcd
    Content-Length: 532

    ------abcd
    Content-Disposition: form-data; name="file"; filename="shell.php"
    Content-Type: application/x-php

    <?php if(isset($_REQUEST['cmd'])) { system($_REQUEST['cmd']); } ?>
    ------abcd--

2. Command Execution Phase: Once the webshell is in place, the attacker issues commands via crafted HTTP requests, for example:

    GET /uploads/shell.php?cmd=cat+/etc/passwd HTTP/1.1
    Host: victim.example.com

The HTML response body returned to the client contains the command output, often wrapped in simple HTML formatting such as <pre> tags, making it appear as part of a normal page.
3. Ongoing Interaction: By repeatedly sending HTTP requests with different cmd parameters, the attacker can perform file management, database queries, and other arbitrary actions.
This type of attack is especially relevant in Content Delivery Network (CDN) or edge-node scenarios because the detection system cannot scan source files directly. Instead, it must rely solely on analysis of the returned Hypertext Transfer Protocol (HTTP) response body to identify potential threats.
Within this particular context, the algorithm design is subject to specific restrictions and requirements, arising from factors such as limited computational resources, the need for real-time analysis, and compliance with industry standards and regulations. First, to ensure the privacy and security of customers, the data stored in the cloud cannot be scanned while detecting webshells; the only appropriate data available is the Hypertext Markup Language (HTML) body.
Second, we must ensure the accuracy of attack identification. A cloud service provider must process a huge number of requests every day (up to millions), and at that scale an error rate of 1% means that tens of thousands of requests are misclassified.
Finally, the detection model should be light-weighted and efficient. If identifying a webshell attack takes too long, the customer experience suffers, so the detection time should be as short as possible. In addition, the computing capability of edge nodes is not as strong as that of a computing workstation.

1.3. Our Approach and Main Contributions

To fulfill these requirements in satellite/UAV edge networks, we propose WeDIGAR (Webshell Detecting system based on Information Granules Analysis of HTML Response), a novel, light-weighted framework. We focus on optimizing every stage of the pipeline to minimize resource usage while maximizing detection accuracy. Our main contributions are summarized as follows:
  • Novel Information Granule (IG) Extraction and Causal Selection: We design a specialized feature engineering method utilizing only the HTTP response bodies and introduce the Non-combinatorial Optimization via Trace Exponential and Augmented lagRangian for Structure learning (NOTEARS) causal graph learning algorithm. This approach identifies the minimal set of causally relevant IGs, eliminating spurious correlations to maximize model robustness and efficiency for resource-constrained nodes.
  • Superior, Light-Weighted Performance: The resulting framework achieves high performance metrics essential for high-throughput environments, exhibiting high accuracy (⩾99%) and a high $F_1$ score (⩾99% for webshell recognition), while remaining fast enough (detection time ⩽10 ms per sample on edge nodes) to support the timely online identification of webshell attacks.
  • Dataset Contribution: To enhance research on webshell detection based on HTTP responses, we release the comprehensive experimental dataset, which consists of 250,321 training and 360,000 test samples, providing a valuable resource for the security community.

1.4. Paper Organization

The organization of this paper is as follows: Section 2 introduces the related works. Section 3 introduces the details of the causal learning algorithm used in the model. The detailed implementation of WeDIGAR is presented in Section 4, and the datasets, experiments, and β-testing are reported in Section 5. Section 6 provides a brief summary of this paper.

2. Related Works

Currently, there are multiple approaches available for detecting webshells, which can be divided into two categories based on the object of investigation. The first category identifies webshell attacks by analyzing web page source files (such as PHP, ASPX, etc.), while the second focuses on analyzing HTTP traffic. Figure 4 illustrates the techniques for webshell detection; the methods described below are mostly combinations of the approaches depicted in the figure.

2.1. Based on Source File

Among the various methods for detecting webshell attacks, analyzing source files is the most widely used approach. These methods can be further categorized into two main types: rule-based methods and learning-based methods.
Most rule-based methods formulate rules based on an understanding of webshell scripts, sometimes combined with statistical methods. Among them, an early webshell detection system was built by computing the scores of malicious signatures and malicious feature samples [3]. Deng et al. provided a detailed lexical analysis of webshell scripts that greatly facilitates subsequent detection [12]. A static webshell detection method based on taint analysis (Webshell Taint Analysis, WTA) traces the spread of taint variables, conducts inter-procedural analysis on them, and finally identifies webshells based on the invocation of dangerous functions and the references to taint variables [13].
Learning-based methods are slightly more popular than rule-based methods. Most learning-based methods need to extract and process features first and then make judgments through classification models. A simple method is to directly analyze the webshell interface, extract features, and then analyze the results via matrix decomposition [14]. Alternatively, one can extract features via pattern matching techniques and then apply a Convolutional Neural Network (CNN) for prediction [15].
However, this feature extraction method relies heavily on prior knowledge. A more common practice is to apply machine learning algorithms to learn or transform features first and then perform classification. In this way, the problems caused by manual feature selection can be avoided and the performance of the algorithm improved. For example, the words in a PHP source file can first be vectorized by Word2vec, and then a Gated Recurrent Unit (GRU) with an attention mechanism can be used to detect the webshell attack [16]. Recently, a similar approach has been proposed targeting a variety of other formats, such as Java Server Pages (JSP), Active Server Pages eXtended (ASPX), and Active Server Pages (ASP) [17]. It is also effective to apply Term Frequency–Inverse Document Frequency (TF-IDF) to calculate the word frequency matrix; after obtaining it, the model only needs a multilayer perceptron to find the webshells [6].
Ensemble learning is also commonly used to improve models for webshell detection. WS-LSMR integrates logistic regression, a Support Vector Machine (SVM), a multilayer perceptron, and a random forest [18]. This method has higher accuracy than a single model. In addition, whether a model needs to be ensembled depends on the application scenario. Through simulation experiments, Yong et al. found that, in the scenario of lightweight Internet of Things (IoT) devices (such as smartphones and headsets), using random forests or extremely randomized trees alone gives better results, while, for heavyweight IoT scenarios (computers, servers, etc.), the ensemble model performs better [19].
While surveying related works, we found decision trees and random forests to be the most commonly used models. Examples include a decision tree based on an expert system [20], the combination of FastText and a random forest [4], and a Random Forest–Gradient Boosting Decision Tree (RF-GBDT) that combines a random forest with a GBDT classifier [9]. Researchers continue to combine random forests with different algorithms to generate new models, and the excellent performance of these models fully illustrates the potential of the random forest algorithm.
There are also methods that combine rule-based matching methods with learning-based methods, for example, first matching the opcode sequence and then training a naive Bayes classifier for classification [8]. Compared with the rule-based detection method, the recognition accuracy and efficiency of this model are greatly improved.
To provide a clearer overview of the research landscape and to precisely position our contribution, we summarize the main categories of webshell detection methods in Table 1. The table compares existing approaches based on their core methodology, key features, and inherent limitations. As the comparison elucidates, the majority of methods either require access to server source files or full HTTP traffic, which is infeasible in our target scenario of resource-constrained, privacy-sensitive edge nodes. In contrast, our proposed WeDIGAR framework is uniquely designed to operate solely on the HTML response body, leveraging a novel pipeline of information granulation and causal learning to achieve a lightweight yet highly accurate detection suitable for immediate deployment in satellite and UAV ground stations.

2.2. Based on HTTP Traffic

Compared with the analysis method based on source files, the analysis method based on HTTP traffic is less used, mainly because the data is less informative for webshell detection while being more difficult to collect.
Tian et al. proposed a webshell attack detection method based on Word2vec representation and a Convolutional Neural Network (CNN), which was the first application of a CNN to webshell detection [5]. After that, many researchers began to introduce CNNs into webshell detection. For example, by using CNNs and Long Short-Term Memory (LSTM) to monitor request and response traffic, it is possible to quickly discover abnormal behaviors and detect the existence of webshell scripts [11]. In addition to CNNs, the classic SVM can also be applied in the field of traffic analysis. Yang et al. proposed an attack detection technology based on an SVM algorithm to locate webshell attacks in HTTP traffic [10].
In addition to the above methods that directly analyze HTTP traffic, there are also methods that try to restore web sessions and extract features from the sequence of HTTP requests. For example, a webshell detection framework was built using LSTM and Hidden Markov Models (HMMs) to extract features from the raw sequence data of web logs [21]. This method is more efficient than those based on source files but requires restored sessions as a data source. Thus, it cannot meet the requirement of real-time analysis.
These methods based on HTTP traffic analysis mainly depend on traffic changes, but the application scenario on edge computing nodes of a CDN provider only allows one to analyze the HTML response due to the computing capability and real-time discrimination constraint. Therefore, the aforementioned methods are not directly applicable.

3. Causal Learning with NOTEARS

Feature selection is a common methodology used to reduce data dimensionality and improve classification performance [22]. However, we believe that the causal relations between features are more powerful than correlations for webshell detection. Zheng et al. proposed Non-combinatorial Optimization via Trace Exponential and Augmented lagRangian for Structure learning (NOTEARS) in 2018 [23]. NOTEARS transforms the structure learning problem from combinatorial optimization into purely continuous optimization, and thus can be solved directly by standard numerical algorithms. The graph structure learned by this algorithm can not only serve as the graph structure for a graph convolutional network but can also be used for causal reasoning. Compared with the classical Peter–Clark (PC) algorithm [24], NOTEARS yields higher accuracy in downstream WeDIGAR classification.
NOTEARS is a non-combinatorial, optimization-based causal graph learning method particularly suitable for capturing causal relationships between features. Unlike traditional correlation-based methods, NOTEARS does not merely model correlations between features; it constructs a Directed Acyclic Graph (DAG) that explicitly represents their causal relationships. Specifically, NOTEARS solves the structure learning problem as an optimization and automatically discovers which features have a direct or indirect influence on the target variable (whether a webshell attack occurs).
In webshell detection tasks, the relationship between features and labels is not always simple or linear. For example, some features within HTML tags (such as content in the script tag) might directly affect the detection of a webshell, while other seemingly unrelated features (such as text in a div tag) might not have a direct relationship with the attack label. By using NOTEARS for causal graph learning, we can automatically identify which features are crucial in the decision-making process, thereby improving classification performance and interpretability.
Generally, the causal relationships between features can be encoded by a directed acyclic graph with weighted adjacency matrix $W$. NOTEARS follows the Structural Equation Model (SEM), which assumes that each feature $X_j$ in a sample $X$ can be constructed via $X_j = w_j^{\top} X + z_j$, where $w_j$ is the $j$th column vector of $W$ and $z_j$ is a noise term.
NOTEARS formulates the objective function as Equation (1) by comparing $X$ with the reconstruction $XW$, with a regularization term encouraging sparse edges [23]:

$F(W) = \ell(W; X) + \lambda \|W\|_1 = \frac{1}{2n} \|X - XW\|_F^2 + \lambda \|W\|_1$, (1)

where $\lambda$ is the regularization parameter.
To obtain the graph structure that best describes the feature relationships, one needs to solve the following constrained optimization problem [23]:

$\min_{W \in \mathbb{R}^{d \times d}} F(W)$ subject to $G(W) \in \mathbb{D}$, (2)

where $G(W) \in \mathbb{D}$ is the constraint that the graph encoded by $W$ must be a directed acyclic graph. NOTEARS converts this combinatorial acyclicity constraint into a continuous equality constraint, as shown in Equation (3), in which $h(\cdot)$ is a smooth function that vanishes exactly when $W$ encodes a DAG (specifically, $h(W) = \operatorname{tr}(e^{W \circ W}) - d$, where $\circ$ denotes the Hadamard product [23]). The minimum of $F(W)$ can then be found directly by standard numerical algorithms under the continuous equality constraint:

$\min_{W \in \mathbb{R}^{d \times d}} F(W)$ s.t. $G(W) \in \mathbb{D}$ $\iff$ $\min_{W \in \mathbb{R}^{d \times d}} F(W)$ s.t. $h(W) = 0$. (3)
For this equality-constrained program, NOTEARS proceeds in three steps:
(1) Transform the constrained problem into a series of unconstrained subproblems via the augmented Lagrangian method;
(2) Solve each unconstrained subproblem with the L-BFGS (Limited-memory Broyden–Fletcher–Goldfarb–Shanno) algorithm [25];
(3) Threshold the resulting weights at a reasonable value to obtain the final graph.
The overall time complexity of the L-BFGS update in NOTEARS is $O(m^2|S| + m^3 + m|S|T)$, where $m$ is the memory size, $|S|$ is the size of the active set, and $T$ is the number of inner iterations [23].
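For concreteness, the following is a minimal Python sketch of the linear NOTEARS procedure described above, not the implementation used in WeDIGAR. It assumes a standardized data matrix X and handles the ℓ1 term with a simple subgradient (the reference implementation of [23] instead splits W into positive and negative parts), following the three steps of augmented Lagrangian, L-BFGS, and thresholding.

    # Minimal sketch of linear NOTEARS; a simplification of [23].
    import numpy as np
    import scipy.linalg as sla
    import scipy.optimize as sopt

    def notears_linear(X, lam=0.01, max_iter=100, h_tol=1e-8, w_threshold=0.3):
        n, d = X.shape

        def h_func(W):
            # Acyclicity measure h(W) = tr(exp(W o W)) - d and its gradient.
            E = sla.expm(W * W)
            return np.trace(E) - d, E.T * W * 2

        def objective(w, rho, alpha):
            W = w.reshape(d, d)
            R = X - X @ W
            loss = 0.5 / n * (R ** 2).sum()        # least-squares term of Eq. (1)
            g_loss = -X.T @ R / n
            h, g_h = h_func(W)
            obj = loss + 0.5 * rho * h * h + alpha * h + lam * np.abs(W).sum()
            grad = g_loss + (rho * h + alpha) * g_h + lam * np.sign(W)
            return obj, grad.ravel()

        w, rho, alpha, h = np.zeros(d * d), 1.0, 0.0, np.inf
        for _ in range(max_iter):
            # Steps (1)-(2): solve the unconstrained subproblem with L-BFGS,
            # increasing rho until h shrinks sufficiently.
            while rho < 1e16:
                sol = sopt.minimize(objective, w, args=(rho, alpha),
                                    jac=True, method="L-BFGS-B")
                h_new, _ = h_func(sol.x.reshape(d, d))
                if h_new > 0.25 * h:
                    rho *= 10
                else:
                    break
            w, h = sol.x, h_new
            alpha += rho * h  # dual ascent on the Lagrange multiplier
            if h <= h_tol or rho >= 1e16:
                break
        W = w.reshape(d, d)
        W[np.abs(W) < w_threshold] = 0.0  # step (3): threshold small edges
        return W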

4. WeDIGAR Model

WeDIGAR consists of three main stages. The first is information granule construction, which transforms the original HTML response body $D_{raw}$ into $D_{integ}$ by integrating structural and semantic Information Granules (IGs) of the HTML response body. Second, the information granules of $D_{integ}$ are selected using a Directed Acyclic Graph (DAG) learned by NOTEARS; that is, the information granules that cannot reach the class label in the DAG are excluded. The information granule selection operation transforms $D_{integ}$ into $D_{final}$. Finally, a random forest is used for classification to obtain the webshell detection result. The architecture of WeDIGAR is shown in Figure 5.
Since the random forest [26] is a well-known classifier, we focus on the construction of information granules and information granule selection here.

4.1. Information Granule Construction

Prior to training the model, it is crucial to extract meaningful IGs from the HTML response. The IGs contained within the HTML response can be categorized into two distinct parts: structural information granules and semantic information granules.
Structural information granules pertain to the presence of tags within an HTML response. These tags present the overall structure of an HTML document, so they are referred to as structural information granules. Semantic information granules, on the other hand, encompass the details embedded within the tags, such as text, scripts, and content in other forms. By combining both structural and semantic IGs, we can comprehensively represent the information encapsulated within an HTML response. The details of how to construct structural and semantic IGs from an HTML response are given below.

4.1.1. Structural Information Granules

The HTML response body is a semi-structured document. Therefore, instead of treating the response body as a plain text file, it is more appropriate to segment it based on the tags and construct structural IGs.
During the structural IG construction phase, it is crucial to consider all possible tags and their associated attributes. Subsequently, these structural IGs should be refined and adjusted based on prior knowledge and the distribution of the dataset. The process is illustrated in the right part of Figure 5.
Therefore, the HTML response body is first segmented according to the tags using Regular Expression (RE) matching, and different structural IG construction methods are applied to different tags. For some tags, in addition to recording the attribute information attached to the tag, the information carried by the nested text or script is also recorded. After analysis, we construct the initial structural IGs $F_{init}$. In this way, dataset $D_{raw}$ is transformed into $D_{init}$.
Then, according to prior knowledge, the structural IGs that are completely irrelevant to webshell recognition (e.g., font size, font color, etc.) are removed. After removing these structural IGs from $F_{init}$, we obtain the new structural IGs $F_{temp}$. Accordingly, the subset of the dataset $D_{init}$ corresponding to the structural IGs $F_{temp}$ becomes $D_{temp}$.
Finally, the structural IGs are filtered based on frequency to remove non-discriminatory features. Specifically, the following is carried out:
  • Cardinality-One Filter (Too Common): Structural IGs with a cardinality of one (the same value on more than 99% of all samples in the dataset $D_{temp}$) are removed. These are non-discriminatory background features that do not contribute to classification.
  • Never-Appeared Filter (Too Rare): Structural IGs that appear in fewer than five samples in $D_{temp}$ are removed. These extremely sparse features introduce noise and increase model complexity without statistical significance.
In this way, $F_{struc}$ is derived from $F_{temp}$ and the dataset is transformed into $D_{struc}$. $F_{struc}$ fully describes the structural IGs contained in the HTML response.
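As an illustration, the following is a minimal Python sketch of this construction and filtering pipeline under simplifying assumptions: a single generic tag-counting regular expression stands in for the tag-specific construction rules (the actual system also records attributes and nested script text), and the IRRELEVANT set is a hypothetical placeholder for the prior-knowledge filter.

    # Minimal sketch of structural IG construction (Section 4.1.1).
    import re
    from collections import Counter

    TAG_RE = re.compile(r"<\s*([a-zA-Z][a-zA-Z0-9]*)\b")
    IRRELEVANT = {"font", "b", "i", "u"}  # hypothetical style-only tags

    def structural_igs(html):
        """Count tag occurrences in one HTML response body (one row of D_init)."""
        tags = (t.lower() for t in TAG_RE.findall(html))
        return Counter(t for t in tags if t not in IRRELEVANT)

    def filter_igs(rows, min_support=5, max_same_ratio=0.99):
        """Apply the never-appeared (<5 samples) and cardinality-one
        (same value on >99% of samples) filters to obtain F_struc."""
        n = len(rows)
        all_tags = set().union(*rows)
        kept = []
        for tag in all_tags:
            values = [row.get(tag, 0) for row in rows]
            support = sum(v > 0 for v in values)
            top_ratio = Counter(values).most_common(1)[0][1] / n
            if support >= min_support and top_ratio <= max_same_ratio:
                kept.append(tag)
        return sorted(kept)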

4.1.2. Semantic Information Granules

Alongside the structural IGs, the semantic content in the HTML response plays an important role in webshell detection. This paper utilizes the Term Frequency times Inverse Document Frequency (TF-IDF) index [27] to extract the semantic IGs from the HTML tags. The steps are as follows:
First, the information granule transformer $T$ using the TF-IDF algorithm is trained on all the samples in $D_{raw}$. The use of TF-IDF serves as the initial step to down-weight general, high-frequency words that appear across the entire dataset.
Next, to isolate keywords that are specifically indicative of webshell activity, we perform class-specific filtering based on the frequency distribution of words between the positive and negative sample sets. We use $T$ to count the keyword sets in the positive and negative samples to obtain the positive-sample keyword set $S_{pos}$ and the negative-sample keyword set $S_{neg}$. While TF-IDF effectively reduces overall noise, certain non-malicious words remain that are highly frequent specifically within the negative (normal traffic) samples ($S_{neg}$). These terms are not discriminatory and must be explicitly removed to prevent the model from learning spurious correlations based on benign patterns.
To achieve this targeted filtering, we define a function $\mathrm{top}(\cdot, \cdot)$ as shown in Equation (4), which sorts the elements of a collection by frequency of occurrence from high to low:

$\mathrm{top}(1, S) = \{ s_{(1)}, s_{(2)}, \ldots, s_{(i)}, \ldots, s_{(n)} \mid \text{sorted by TF-IDF value} \}$, $\mathrm{top}(1/\rho, S) = \{ s_{(1)}, s_{(2)}, \ldots, s_{(\lfloor n/\rho \rfloor)} \}$, (4)

in which $S$ is the word set appearing in the HTML response [27] and $s_{(i)}$ is the $i$th keyword in the word set ranked by its TF-IDF value.
Next, the top $1/3$ of the most frequent words in the negative samples is subtracted from $S_{pos}$ (this performs a targeted, class-specific pruning of high-frequency normal terms in $S_{neg}$ that survived the initial TF-IDF weighting). The difference set is used as the final webshell-sensitive word set $S_{sens}$. The $1/3$ heuristic was empirically chosen to balance pruning common, non-discriminatory background words against retaining potentially weak, but discriminatory, malicious keywords. This insensitive-word filtering operation can be formulated as

$S_{sens} = S_{pos} \setminus S_{neg}$. (5)
Finally, the $m$ items with the highest frequency in $S_{sens}$ are evenly divided into $k$ groups as the semantic IGs $F_{sem}$. This grouping serves as a dimensionality reduction technique, converting $m$ highly sparse features into $k$ denser, abstract features, which improves the computational efficiency and robustness of the downstream classifier. The value of $k$ was determined via cross-validation to balance feature representation against model efficiency (the specific value is detailed in Section 5.1). For each semantic information granule in $F_{sem}$, we record the number of occurrences of its in-group keywords in the sample. Similarly, we obtain $D_{sem}$ from $F_{sem}$. $F_{sem}$ describes the semantic IGs contained in the HTML response body. Combining $F_{sem}$ with $F_{struc}$, the integrated IG set $F_{integ}$ is obtained to fully describe the HTML response. Likewise, we obtain $D_{integ}$ from $D_{raw}$ by constructing the IGs $F_{integ}$ from the raw HTML responses.
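The following minimal Python sketch illustrates this pipeline under simplifying assumptions: scikit-learn's TfidfVectorizer stands in for the transformer $T$, whitespace tokenization replaces the actual tokenizer, the round-robin partition is one possible way to form equal groups, and the default values of m and k are only placeholders (see Section 5.1 for the values used).

    # Minimal sketch of semantic IG construction (Section 4.1.2).
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer

    def semantic_igs(pos_docs, neg_docs, m=500, k=50):
        vec = TfidfVectorizer()
        vec.fit(pos_docs + neg_docs)  # T is trained on all of D_raw
        vocab = np.array(vec.get_feature_names_out())

        def ranked(docs):
            # Words sorted by aggregate TF-IDF weight, high to low: top(1, S).
            scores = np.asarray(vec.transform(docs).sum(axis=0)).ravel()
            order = scores.argsort()[::-1]
            return [vocab[i] for i in order if scores[i] > 0]

        s_pos, s_neg = ranked(pos_docs), ranked(neg_docs)
        s_neg_top = set(s_neg[: len(s_neg) // 3])              # top(1/3, S_neg)
        s_sens = [w for w in s_pos if w not in s_neg_top][:m]  # Equation (5)
        # Partition the m sensitive words into k groups -> semantic IGs F_sem.
        return [s_sens[i::k] for i in range(k)]

    def semantic_features(doc, groups):
        """One row of D_sem: total occurrences of each group's keywords."""
        counts = {}
        for t in doc.lower().split():
            counts[t] = counts.get(t, 0) + 1
        return [sum(counts.get(w, 0) for w in grp) for grp in groups]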

4.2. Information Granule Selection

The IG set $F_{integ}$ fully represents the original data $D_{raw}$. However, some IGs in $F_{integ}$ may have no relation to the target attribute (i.e., the label). Therefore, it is necessary to determine which IGs are truly relevant to the label, so that the detection model is efficient and accurate.
We choose the NOTEARS algorithm to construct a causal graph $\mathcal{G}$ based on $D_{integ}$ and the corresponding labels, as shown in Equation (6). The function $\mathrm{cat}(\cdot, \cdot)$ concatenates $D_{integ}$ with its corresponding labels. If there is a directed edge from $a$ to $b$ in the causal graph $\mathcal{G}$, then information granule $b$ is conditionally dependent on information granule $a$:

$\mathcal{G}(V, E) = \mathrm{NOTEARS}(\mathrm{cat}(D_{integ}, label))$. (6)

Through the learned causal graph $\mathcal{G}$, one can delete the nodes that are not connected to the label $y$, which is formulated as

$F_{final} \leftarrow F_{integ} \setminus \{ f \mid f \in F_{integ} \wedge f \nrightarrow_{E} y \}$, (7)

in which $f \nrightarrow_{E} y$ means that there is no path from node $f$ to node $y$ given the directed edge set $E$. In this way, the final IG set $F_{final}$ and the corresponding dataset $D_{final}$ are constructed.
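A minimal sketch of this selection step, assuming the weighted adjacency matrix W returned by NOTEARS over the concatenated features plus label; networkx is a hypothetical choice here, as any directed reachability query would serve.

    # Minimal sketch of IG selection via Equation (7).
    import networkx as nx
    import numpy as np

    def select_igs(W, feature_names, label_index):
        G = nx.DiGraph()
        G.add_nodes_from(range(W.shape[0]))
        G.add_edges_from(zip(*np.nonzero(W)))  # edge i -> j iff W[i, j] != 0
        # Keep a feature only if some directed path reaches the label node.
        return [name for i, name in enumerate(feature_names)
                if i != label_index and nx.has_path(G, i, label_index)]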

4.3. Algorithm Description of WeDIGAR

In this section, we give an algorithmic description of the WeDIGAR method. The training phase of WeDIGAR is shown in Algorithm 1 and the testing phase in Algorithm 2.
Algorithm 1 Training phase of WeDIGAR
Require: Raw training dataset $D_{raw}$ with labels $y$
Ensure: Final IG set $F_{final}$; trained classifier $RF(x)$
Step 1: Information Granule Construction
1: $[F_{init}, D_{init}] \leftarrow \mathrm{REM}(D_{raw})$
2: $F_{struc} \leftarrow \mathrm{RemoveOneValue}(\mathrm{RemoveIrrelevant}(F_{init}))$
3: $S_{pos} \leftarrow \mathrm{top}(1, \{s \in D_{raw} \mid y = 1\})$
4: $S_{neg} \leftarrow \mathrm{top}(\frac{1}{3}, \{s \in D_{raw} \mid y = 0\})$
5: $S_{sens} \leftarrow \mathrm{top}(\frac{m}{M}, S_{pos} \setminus S_{neg})$
6: Partition $S_{sens}$ into $k$ equal subsets $\{S_{sens}^{(i)}\}_{i=1}^{k}$
7: $F_{sem} \leftarrow S_{sens}^{(1)} \circ \cdots \circ S_{sens}^{(k)}$
8: $F_{integ} \leftarrow F_{struc} \circ F_{sem}$
Step 2: Information Granule Selection
9: $\mathcal{G}(V, E) \leftarrow \mathrm{NOTEARS}(\mathrm{cat}(D_{integ}, y))$
10: $F_{final} \leftarrow \{f \in F_{integ} \mid f \rightarrow_{E} y\}$
Step 3: Classifier Training
11: $D_{final} \leftarrow E(D_{init}, F_{final})$
12: Train $RF(x)$ using $(D_{final}, y)$
13: return $F_{final}$, $RF(x)$
In Algorithm 1, the function $\mathrm{REM}(D_{raw})$ performs a Regular Expression Matching (REM) operation on dataset $D_{raw}$ to obtain the structural IGs $F_{init}$. The functions $\mathrm{RemoveIrrelevant}(\cdot)$ and $\mathrm{RemoveOneValue}(\cdot)$ are defined previously (in Section 4.1.1). The operator $\circ$ concatenates two row vectors.
In Algorithm 2, the function $E(\tilde{D}, F)$ reduces the dataset $\tilde{D}$ to include only the IGs of $F$; for the semantic IGs $F_{sem}$, it computes the total frequency of the words in each group of $F_{sem}$ from dataset $\tilde{D}_{raw}$ to form the new dataset $\tilde{D}_{sem}$. The operator ◯ merges two datasets by concatenating the rows with the same index.
Algorithm 2 Testing phase of WeDIGAR
Require: Test dataset $\tilde{D}_{raw}$; final IG set $F_{final}$; $F_{struc}$; $F_{sem}$
Ensure: Predicted labels $\hat{y}$
Step 1: Feature Extraction
1: $\tilde{D}_{struc} \leftarrow E(\tilde{D}_{raw}, F_{final} \cap F_{struc})$
2: $\tilde{D}_{sem} \leftarrow E(\tilde{D}_{raw}, F_{final} \cap F_{sem})$
Step 2: Feature Integration
3: $\tilde{D}_{final} \leftarrow \tilde{D}_{struc}$ ◯ $\tilde{D}_{sem}$
Step 3: Prediction
4: $\hat{y} \leftarrow RF(\tilde{D}_{final})$
5: return $\hat{y}$
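To make the two algorithms concrete, here is a hypothetical end-to-end composition in Python, reusing the sketches from the previous subsections; scikit-learn's RandomForestClassifier stands in for $RF(x)$ (n_estimators = 20 follows the configuration reported in Section 5.7). It is illustrative glue rather than the production pipeline; in particular, feature standardization before NOTEARS is omitted for brevity.

    # Hypothetical glue for Algorithms 1 and 2, reusing the sketches above.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def build_matrix(docs, struc_tags, groups):
        rows = [structural_igs(d) for d in docs]
        return np.array([[r.get(t, 0) for t in struc_tags]
                         + semantic_features(d, groups)
                         for r, d in zip(rows, docs)], dtype=float)

    def train_wedigar(pos_docs, neg_docs):
        docs = pos_docs + neg_docs
        y = np.array([1] * len(pos_docs) + [0] * len(neg_docs))
        groups = semantic_igs(pos_docs, neg_docs)                 # F_sem
        struc_tags = filter_igs([structural_igs(d) for d in docs])  # F_struc
        X = build_matrix(docs, struc_tags, groups)                # D_integ
        names = struc_tags + [f"sem_{i}" for i in range(len(groups))]
        W = notears_linear(np.column_stack([X, y]))               # cat(D_integ, y)
        kept = select_igs(W, names + ["label"], label_index=len(names))
        idx = [names.index(n) for n in kept]                      # F_final indices
        rf = RandomForestClassifier(n_estimators=20).fit(X[:, idx], y)
        return rf, struc_tags, groups, idx

    def predict_wedigar(rf, struc_tags, groups, idx, test_docs):
        X = build_matrix(test_docs, struc_tags, groups)
        return rf.predict(X[:, idx])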

4.4. Time and Space Complexity

4.4.1. Time Complexity

Training Phase. Computing the structural IGs $F_{struc}$ costs $O(L)$ per sample, hence $O(N \times \max(L))$ for the dataset, where $L$ is the text length of the HTML response body and $N$ is the size of the training set. Using TF-IDF to extract sensitive words requires traversing the entire training set, hence a time complexity of $O(N \times S)$, where $S$ is the number of candidate sensitive words. In addition, training the random forest costs $O(d \times T \times N \times \log N)$, where $d$ is the number of IGs and $T$ is the number of decision trees. Therefore, apart from NOTEARS (whose complexity is given at the end of Section 3), the time complexity of training WeDIGAR is $O(N \times (L + S + d \times T \times \log N))$.
Testing Phase. Extracting $F_{struc}$ at test time is the same as in training, but computing $F_{sem}$ is much faster because the frequency of each sensitive word is ready to use. Furthermore, the random forest's test-time complexity is $O(H \times T)$, where $H$ is the height of a decision tree, which is far more efficient than training. So the overall time complexity for WeDIGAR to test one sample is $O(L + H \times T)$, which is entirely suitable for online deployment.

4.4.2. Space Complexity

During the information granule construction phase, structural IGs are obtained using regular expression matching. As the process is based solely on rule matching, the space complexity of information granule construction for a single sample is $O(1)$. The space complexity of a random forest is generally $O(qT)$, where $q$ is the number of nodes in a subtree and $T$ is the number of subtrees. In summary, the space complexity of a single-sample prediction is $O(qT)$, which occupies little memory. This low space requirement facilitates the online deployment of WeDIGAR on edge computing nodes.

5. Experiments and Application

5.1. Experiment Setting

The experiment was conducted on a workstation equipped with an Intel i9-10900X CPU, 64 GB RAM, running Windows 11 64-bit OS, and utilizing the Python 3.6 programming environment. The dataset used in the experiment was collected by Guizhou BaishanCloud Technology Co., Ltd. (Guizhou, China) during the operation of their Web Application Firewall (WAF). It consists of 9114 webshell samples and 241,207 normal samples.
For the training set, we randomly selected 4000 webshell attack samples and 100,000 normal samples. The remaining samples were allocated to the test set, which was further divided equally into $dataset_1$ and $dataset_2$; the model was evaluated separately on both.
To ensure a fair and reproducible evaluation, the WeDIGAR framework utilized the following hyper-parameter settings, which were optimized using 5-fold cross-validation on the training set:
Random Forest: number of trees ($N_{tree}$) = 100, maximum depth ($D_{max}$) = none (allowing full growth).
NOTEARS: regularization parameter ($\lambda$) = 0.01 (optimized for sparsity).
Structural IG Filtering: minimum frequency threshold = 5 samples, maximum same-value threshold = 99% (per the cardinality-one filter in Section 4.1.1).
Semantic IG Grouping ($k$): the total number of high-frequency sensitive words ($m$) was aggregated into $k = 50$ groups. This value was empirically selected as it provided the best trade-off between dimensionality reduction and the model's $F_1$ score, significantly reducing the feature space from over $m$ features to 50, which is critical for light-weighted deployment on edge nodes.
Due to the stochastic nature of machine learning methods, each method was run 10 times, and the average performance and its deviation were calculated. Since many of the methods discussed in the related work cannot be directly applied to HTML response body analysis, most of the methods compared in this section were designed by our own team. A few methods were sourced from a pending update module in a cloud service provider's WAF.

5.2. Metrics

In binary classification problems, there are four base counts to consider, namely True Positives (TPs), True Negatives (TNs), False Positives (FPs), and False Negatives (FNs). Based on these four counts, the metrics of precision, recall, and accuracy are defined in Equation (8) [27]:

$Precision = \frac{TP}{TP + FP}$, $Recall = \frac{TP}{TP + FN}$, $Accuracy = \frac{TP + TN}{TP + FP + TN + FN}$. (8)
Precision describes the model's accuracy in identifying true positive samples, while recall represents the model's ability to identify as many positive samples as possible. Because precision and recall typically trade off against each other, and accuracy favors the majority class in imbalanced classification, the $F_1$ score (defined in Equation (9)) is a more balanced evaluation metric [27]:

$F_1 = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}$. (9)
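As a small illustration, the following Python helpers implement Equations (8) and (9); the example numbers are the beta-testing counts reported later in Section 5.8 (44 flagged samples, of which 4 were false positives).

    # Helpers implementing Equations (8) and (9).
    def precision(tp, fp):
        return tp / (tp + fp)

    def recall(tp, fn):
        return tp / (tp + fn)

    def accuracy(tp, tn, fp, fn):
        return (tp + tn) / (tp + fp + tn + fn)

    def f1(tp, fp, fn):
        p, r = precision(tp, fp), recall(tp, fn)
        return 2 * p * r / (p + r)

    # Example: of the 44 webshell alerts raised in beta testing (Section 5.8),
    # 4 were false positives, so precision(40, 4) = 40/44, about 0.909.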

5.3. Comparison of Different Causal Analysis Methods

In this section, we compare different approaches to causal analysis. In addition to the NOTEARS algorithm included in WeDIGAR, we consider the algorithm of [28], which generalizes NOTEARS to nonlinear relationships and is thus abbreviated as NOTEARS$_{nonlinear}$, as well as the classic Peter–Clark (PC) algorithm [24]. We also run an ablated WeDIGAR model (without the causal analysis component) in the experiment.
Before the experiment, we can inspect the causal graphs generated by the different algorithms (shown in Figure 6). The graph generated by PC is clearly sparser than the graph generated by NOTEARS. In addition, the causal graph generated by the NOTEARS$_{nonlinear}$ algorithm has the fewest blue points.
Table 2 presents the accuracy and $F_1$ score of the various causal analysis methods on the dataset. There is no significant difference in accuracy among the methods: WeDIGAR and RF achieve the same accuracy, but WeDIGAR exhibits slightly higher stability. Meanwhile, WeDIGAR demonstrates an improved $F_1$ score compared to RF, indicating that causal analysis has a positive impact on the classification results. This suggests that incorporating causal analysis techniques can enhance the overall performance of the model.

5.4. Comparison with Baseline Methods

We compare the performance of WeDIGAR against several established and state-of-the-art webshell detection methods from the literature. These baselines represent diverse approaches to feature extraction and modeling, allowing us to validate the efficacy and light-weighted advantage of our Information Granules (IGs) and causal graph approach. We adapted these models to operate on the HTTP response body to ensure a fair comparison within our target scenario.
  • TF-IDF + Support Vector Machine (SVM): This classic approach represents general, high-accuracy text classification [27]. We extract standard Term Frequency–Inverse Document Frequency (TF-IDF) vectors from the full text of the response body, and the resulting high-dimensional vector is classified using an SVM. This serves as a benchmark for traditional semantic analysis.
  • Convolutional Neural Network (CNN): Representing advanced deep learning approaches [16], we implemented a 1D-CNN architecture commonly used for text classification. The input tokens are first mapped to embeddings, which the CNN then uses to automatically extract high-level semantic features. This model tests the hypothesis that deep learning feature engineering can outperform our targeted IGs.
  • Ensemble Model (Traffic Analysis Adapted): Based on the hybrid approaches used in high-traffic environments [8], we adapted a widely used model that combines statistical features (e.g., word count entropy, average word length) and basic content matching. This baseline assesses the value of deep linguistic features compared to broad statistical traffic metadata.
The performance comparison of WeDIGAR against these baselines is presented in Table 3.
Table 3 clearly validates the superior performance of the proposed WeDIGAR framework against all comparative baselines, consistently achieving the highest accuracy (99.95%) and $F_1$ score (99.35%) across both datasets. This significantly surpasses the comparison models, including the statistical benchmark (TF-IDF + SVM) and the deep learning approach (CNN), which reached a maximum $F_1$ score of approximately 98.35%. This superior efficacy is attributed to the combined strength of structural and semantic Information Granules (IGs) and the NOTEARS causal graph selection, which effectively captures precise malicious patterns and eliminates spurious correlations common in raw HTML data. Critically, WeDIGAR achieves the balance required for light-weighted edge-node deployment: while the high-accuracy ensemble models incur significant latency (≈10 ms to 14 ms) and the fastest model (TF-IDF + SVM) has a significantly lower $F_1$ score that compromises security, WeDIGAR maintains its leading performance profile while operating with a rapid inference time of 4.80 ± 1.00 ms.

5.5. Comparison of Different Methods with Same F f i n a l

We compare different classification methods using the identical, causally selected feature set $F_{final}$. These methods include classical machine learning algorithms such as KNN and SVMs, as well as Graph Convolutional Network (GCN) variants. Since GCNs require an input feature matrix $X$ (here, $D_{final}$) and an adjacency matrix $A$ representing structure, we test different methods for generating $A$ from $F_{final}$. The general GCN architecture used is based on [29].
We define the GCN baselines based on their graph construction method:
  • GCN$_{euclid}$ [29]: The adjacency matrix $A$ is constructed from the Euclidean distance (similarity) between the samples in $D_{final}$. Specifically, we use the k-nearest neighbors ($k = 5$) graph construction method, a common practice for GCNs when a graph structure is not explicitly defined; a construction sketch is given after this list. This variant relies on inherent feature similarity.
  • GCN$_{linear}$ [23,29]: The adjacency matrix $A$ is derived by treating the features in $F_{final}$ as nodes and applying the NOTEARS linear DAG structure learning algorithm [23] to construct the graph. This lets the GCN leverage the causal structure inherent in the features, directly testing whether the graph structure learned by the linear causal model improves classification performance.
  • GCN$_{nonlinear}$ [28,29]: Similar to GCN$_{linear}$, the adjacency matrix $A$ is derived from the Learning Sparse Nonparametric DAGs algorithm [28], which models nonlinear causal relationships between the features in $F_{final}$.
To make the distinction clear, we denote them GCN$_{euclid}$, GCN$_{linear}$, and GCN$_{nonlinear}$. The three GCN methods were also integrated as an ensemble for comparison, denoted GCN$_{ensemble}$. The experimental results are shown in Table 4.
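A minimal sketch of the GCN$_{euclid}$ adjacency construction, assuming scikit-learn's kneighbors_graph with $k = 5$ as stated above; symmetrizing the kNN graph into an undirected adjacency is a common convention rather than a detail specified here.

    # Sketch of the GCN_euclid adjacency matrix: a k-nearest-neighbor graph
    # over the samples of D_final, symmetrized to an undirected adjacency.
    from sklearn.neighbors import kneighbors_graph

    def knn_adjacency(D_final, k=5):
        A = kneighbors_graph(D_final, n_neighbors=k, mode="connectivity")
        return ((A + A.T) > 0).astype(float)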
As Table 4 clearly demonstrates, WeDIGAR achieves superior accuracy and $F_1$ score compared to all other methods. While the SVM offers fast inference (3.75 ms per sample), its $F_1$ score of approximately 96% falls short of the stringent accuracy requirements (>99%) essential for cloud service providers. On the other hand, the GCN-based approaches (GCN$_{euclid}$, GCN$_{linear}$, GCN$_{nonlinear}$) suffer from prolonged inference times due to their model complexity and graph construction overhead, without delivering compensatory gains in detection performance. WeDIGAR bridges this gap by combining a lightweight random forest classifier with high-quality features refined through causal selection, achieving an optimal balance between precision and efficiency; this constitutes its core advantage for deployment in high-throughput edge computing nodes.

5.6. Comparison with Our Previous Methods

In this part, we compare WeDIGAR with the method used by Guizhou BaishanCloud Technology Co., Ltd. and with the historical versions developed during the WeDIGAR research process. The comparison results are shown in Table 5. WeDIGAR clearly and substantially improves on the previous methods in terms of accuracy, $F_1$ score, and prediction time.
We considered two approaches in our earlier work. The first treats the HTML response body as plain text, without considering the implicit structural information granules. In this method, keyword features are extracted from positive samples using the TF-IDF algorithm, and a random forest is used for discrimination. However, this approach overlooks the potential contained in negative samples.
The second approach focuses solely on the structural information granules of the HTML response body. The relevant features are matched through regular expressions, the PC algorithm is applied for causal analysis, and a random forest is used for classification. This approach takes the structural information granules into account and incorporates causal analysis, which can potentially enhance the accuracy of the model.
WeDIGAR combines the strengths of these two historical methods. As a result, it achieves significant enhancements in accuracy and $F_1$ score compared to the other approaches, while also optimizing the prediction time for faster processing and improved efficiency.
The results in Table 5 clearly demonstrate three evolutionary stages of our methodology:
TF-IDF + RF: This approach treats the HTML response body as plain text, completely ignoring its semi-structured nature. Consequently, it fails to distinguish between identical keywords appearing in <script> tags versus <div> tags, resulting in significant feature noise. This is the primary reason for its lowest $F_1$ score (∼86.9%).
PC + RF: This method introduces structural Information Granules (IGs) and causal filtering, achieving a significant performance leap ($F_1$ score ∼96.8%). This proves that analyzing the HTML tag structure is crucial for distinguishing between normal and malicious responses. However, it entirely discards semantic information and so cannot capture key textual features such as the specific commands or function names commonly found in webshells.
WeDIGAR: Our final framework achieves a second performance leap by integrating both structural and semantic IGs. The NOTEARS algorithm acts as an "intelligent scheduler" here, causally selecting from the fused high-dimensional features an optimal subset that contains both important structural context and discriminative semantic content. This synergistic effect is the fundamental reason why WeDIGAR achieves an $F_1$ score of 99.35% while simultaneously reducing the prediction time to 4.8 ms.

5.7. Parameter Sensitivity Analysis

A robust and practical model should not be overly dependent on a specific set of “magic” parameters. Therefore, we conducted a sensitivity analysis on the core hyperparameters of WeDIGAR to validate its stability under different configurations, which is crucial for its deployment in real industrial environments.
Given the stochastic nature of machine learning, we performed 10 independent experiments for each parameter setting and calculated the average and standard deviation of the $F_1$ score. This approach provides a more robust evaluation of parameter sensitivity than a direct comparison of accuracy and $F_1$ score alone, as it effectively mitigates the influence of randomness.
As shown in Figure 7, WeDIGAR maintains excellent $F_1$ scores (>99%) across a wide range of parameter values. Specifically:
  • The performance shows minimal fluctuation when the number of subtrees varies between 10 and 50 (Figure 7a), indicating low sensitivity to this parameter and ease of tuning.
  • Variations in the number of features (Figure 7b) and maximum depth (Figure 7c) have limited impact on performance, demonstrating that the feature set F f i n a l selected by NOTEARS is of high quality and that the model is resistant to overfitting.
This strong robustness to hyperparameters significantly reduces the deployment and maintenance threshold for WeDIGAR. Operators can achieve stable and superior performance without extensive parameter search, further demonstrating the practical value of our framework.
Furthermore, as parameters change, the prediction time may vary, which should be considered when selecting optimal parameters. The configuration chosen for our experiments is “number of subtrees: 20, number of features in subtrees: 10, maximum depth: none, positive–negative sample ratio: 25”.

5.8. Beta Testing

After validating the model in our lab, we proceeded to beta testing. The accumulated and average time taken to predict samples online within a timeframe of 800 s is depicted in Figure 8.
Based on Figure 8, we note that, for a dataset of 100,000 samples, the average prediction time is approximately 5.5 ms. This differs from the results obtained in the laboratory experiments but aligns with our analysis of the time complexity of the WeDIGAR algorithm in Section 4.4.1: the prediction time is influenced by the length of the HTML response body, and larger response bodies result in slower processing. However, as the number of samples increases, the average prediction time gradually stabilizes and fluctuates within a reasonable range (within 10 ms).
During beta testing, WeDIGAR processed a total of 360,000 data points. Among these, it flagged 44 instances of webshell attacks, 4 of which were confirmed to be false positives (i.e., a precision of 40/44 ≈ 90.9% on flagged instances).
After experimental validation and beta testing confirmed the superior accuracy and efficiency of WeDIGAR, it was put into industrial operation. An actual running screenshot of WeDIGAR is shown in Figure 9.

6. Conclusions

This paper introduced WeDIGAR, a novel, lightweight webshell detection framework tailored for resource-constrained satellite and UAV edge nodes. Our approach extracts structural and semantic Information Granules (IGs), refines these features using a causal graph (NOTEARS) to eliminate spurious correlations, and employs a random forest for efficient classification. The crucial benefit of this methodology is its prediction efficiency: by selecting only causally relevant features, the model size is minimized, allowing for the low inference latency (e.g., ⩽10 ms) necessary for real-time operation on single edge nodes. This lightweight nature validates the efficacy of WeDIGAR and enables its planned integration into large-scale network optimization frameworks. Future deployment across massive satellite and UAV networks will involve addressing multi-objective task scheduling using sophisticated tools, such as those leveraging bilevel evolutionary algorithms [30], to manage resources efficiently across thousands of multiagile nodes.
Despite its efficiency, WeDIGAR’s current limitation lies in its inability to detect webshells that use end-to-end encryption or those heavily reliant on custom obfuscation that completely transforms the standard HTTP response body content. Because the model relies on structural and semantic patterns within the cleartext HTTP response, its effectiveness is compromised when these patterns are fully masked. Addressing this critical security boundary—specifically developing methods robust to encrypted files and advanced obfuscation techniques—will be the primary focus of our future algorithmic work.

Author Contributions

Conceptualization, S.F. and J.X.; Methodology, H.L.; Software, P.Z.; Validation, S.F. and H.L.; Formal Analysis, J.T. and J.Y.; Investigation, S.F. and H.L.; Resources, J.X.; Data Curation, H.L.; Writing—Original Draft Preparation, S.F.; Writing—Review and Editing, J.X. and P.Z.; Visualization, J.T.; Supervision, J.X. All authors have read and agreed to the published version of the manuscript.

Funding

This work has been supported by Doctoral Fund of Chongqing Industry Polytechnic College (No. 2023GZYBSZK3-03), the Science and Technology Research Program of Chongqing Municipal Education Commission (Grant No. KJQN202503212, KJQN202303203), the National Natural Science Foundation of China under grants 61966005, 62366008, and 61936001, the National Key Research and Development Program of China under grant 2020YFB1713300, the Natural Science Foundation of Chongqing (cstc2019jcyjcxttX0002, cstc2021ycjh-bgzxm0013), and the Key Collaboration Project of Chongqing Municipal Education Commission (HZ2021008).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. For further inquiries, please contact the corresponding author.

Conflicts of Interest

Author Jian Tong was employed by the company Guizhou BaishanCloud Technology Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
UAV    Unmanned Aerial Vehicle
XSS    Cross-Site Script Attack
SQL    Structured Query Language
CDN    Content Delivery Network
HTTP    Hypertext Transfer Protocol
HTML    Hypertext Markup Language
IG    Information Granule
NOTEARS    Non-combinatorial Optimization via Trace Exponential and Augmented Lagrangian for Structure learning
RF    Random Forest
DAG    Directed Acyclic Graph
SEM    Structural Equation Model
L-BFGS    Limited-memory Broyden–Fletcher–Goldfarb–Shanno
CNN    Convolutional Neural Network
LSTM    Long Short-Term Memory
GRU    Gated Recurrent Unit
SVM    Support Vector Machine
TF-IDF    Term Frequency–Inverse Document Frequency
IoT    Internet of Things
WTA    Webshell Taint Analysis
RF-GBDT    Random Forest–Gradient Boosting Decision Tree
PC    Peter–Clark Algorithm
REM    Regular Expression Matching
RE    Regular Expression
JSP    Java Server Pages
ASPX    Active Server Pages eXtended
ASP    Active Server Pages
HMMs    Hidden Markov Models
TP    True Positive
TN    True Negative
FP    False Positive
FN    False Negative

References

  1. Hannousse, A.; Yahiouche, S. Handling webshell attacks: A systematic mapping and survey. Comput. Secur. 2021, 108, 102366.
  2. Ferrag, M.A.; Maglaras, L.; Moschoyiannis, S.; Janicke, H. Deep learning for cyber security intrusion detection: Approaches, datasets, and comparative study. J. Inf. Secur. Appl. 2020, 50, 102419.
  3. Tu, T.D.; Cheng, G.; Guo, X.; Pan, W. Webshell detection techniques in web applications. In Proceedings of the Fifth International Conference on Computing, Communications and Networking Technologies (ICCCNT), Hefei, China, 11–13 July 2014; pp. 1–7.
  4. Fang, Y.; Qiu, Y.; Liu, L.; Huang, C. Detecting webshell based on random forest with FastText. In Proceedings of the 2018 International Conference on Computing and Artificial Intelligence, Chengdu, China, 12–14 March 2018; pp. 52–56.
  5. Tian, Y.; Wang, J.; Zhou, Z.; Zhou, S. CNN-webshell: Malicious web shell detection with convolutional neural network. In Proceedings of the 2017 VI International Conference on Network, Communication and Computing, Kunming, China, 8–10 December 2017; pp. 75–79.
  6. Wang, Z.; Yang, J.; Dai, M.; Xu, R.; Liang, X. A method of detecting webshell based on multi-layer perception. Acad. J. Comput. Inf. Sci. 2019, 2, 81–91.
  7. Shukla, A.K.; Srivastav, S.; Kumar, S.; Muhuri, P.K. UInDeSI4.0: An efficient Unsupervised Intrusion Detection System for network traffic flow in Industry 4.0 ecosystem. Eng. Appl. Artif. Intell. 2023, 120, 105848.
  8. Guo, Y.; Marco-Gisbert, H.; Keir, P. Mitigating webshell attacks through machine learning techniques. Future Internet 2020, 12, 12.
  9. Cui, H.; Huang, D.; Fang, Y.; Liu, L.; Huang, C. Webshell detection based on random forest–gradient boosting decision tree algorithm. In Proceedings of the 2018 IEEE Third International Conference on Data Science in Cyberspace (DSC), Guangzhou, China, 18–21 June 2018; pp. 153–160.
  10. Yang, W.; Sun, B.; Cui, B. A webshell detection technology based on HTTP traffic analysis. In Proceedings of the International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing, Matsue, Japan, 4–6 July 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 336–342.
  11. Zhang, H.; Guan, H.; Yan, H.; Li, W.; Yu, Y.; Zhou, H.; Zeng, X. Webshell traffic detection with character-level features based on deep learning. IEEE Access 2018, 6, 75268–75277.
  12. Deng, L.Y.; Lee, D.L.; Chen, Y.H.; Yann, L.X. Lexical analysis for the webshell attacks. In Proceedings of the 2016 International Symposium on Computer, Consumer and Control (IS3C), Xi’an, China, 4–6 July 2016; pp. 579–582.
  13. Zhao, J.; Lu, Y.; Wang, X.; Zhu, K.; Yu, L. WTA: A static taint analysis framework for PHP webshell. Appl. Sci. 2021, 11, 7763.
  14. Sun, X.; Lu, X.; Dai, H. A matrix decomposition based webshell detection method. In Proceedings of the 2017 International Conference on Cryptography, Security and Privacy, Wuhan, China, 17–19 March 2017; pp. 66–70.
  15. Nguyen, N.H.; Le, V.H.; Phung, V.O.; Du, P.H. Toward a deep learning approach for detecting PHP webshell. In Proceedings of the Tenth International Symposium on Information and Communication Technology, Hanoi, Vietnam, 4–6 December 2019; pp. 514–521.
  16. Li, T.; Ren, C.; Fu, Y.; Xu, J.; Guo, J.; Chen, X. Webshell detection based on the word attention mechanism. IEEE Access 2019, 7, 185140–185147.
  17. Liu, Z.; Li, D.; Wei, L. A new method for webshell detection based on bidirectional GRU and attention mechanism. Secur. Commun. Netw. 2022, 2022, 3434920.
  18. Ai, Z.; Luktarhan, N.; Zhao, Y.; Tang, C. WS-LSMR: Malicious webshell detection algorithm based on ensemble learning. IEEE Access 2020, 8, 75785–75797.
  19. Yong, B.; Wei, W.; Li, K.C.; Shen, J.; Zhou, Q.; Wozniak, M.; Połap, D.; Damaševičius, R. Ensemble machine learning approaches for webshell detection in Internet of Things environments. Trans. Emerg. Telecommun. Technol. 2022, 33, e4085.
  20. Stranieri, A.; Zeleznikow, J. WebShell: The development of web based expert systems. In Research and Development in Intelligent Systems XVIII: Proceedings of ES2001, the Twenty-First SGES International Conference on Knowledge Based Systems and Applied Artificial Intelligence, Cambridge, UK, 10–12 December 2001; Springer: Berlin/Heidelberg, Germany, 2002; pp. 245–258.
  21. Wu, Y.; Sun, Y.; Huang, C.; Jia, P.; Liu, L. Session-based webshell detection using machine learning in web logs. Secur. Commun. Netw. 2019, 2019, 3093809.
  22. Di Mauro, M.; Galatro, G.; Fortino, G.; Liotta, A. Supervised feature selection techniques in network intrusion detection: A critical review. Eng. Appl. Artif. Intell. 2021, 101, 104216.
  23. Zheng, X.; Aragam, B.; Ravikumar, P.; Xing, E.P. DAGs with NO TEARS: Continuous optimization for structure learning. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 3–8 December 2018; pp. 9492–9503.
  24. Kalisch, M.; Bühlmann, P. Estimating high-dimensional directed acyclic graphs with the PC-algorithm. J. Mach. Learn. Res. 2007, 8, 613–636.
  25. Byrd, R.H.; Lu, P.; Nocedal, J.; Zhu, C. A limited memory algorithm for bound constrained optimization. SIAM J. Sci. Comput. 1995, 16, 1190–1208.
  26. Ho, T.K. Random decision forests. In Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, Canada, 14–16 August 1995; Volume 1, pp. 278–282.
  27. Leskovec, J.; Rajaraman, A.; Ullman, J.D. Data mining. In Mining of Massive Datasets; Cambridge University Press: Cambridge, UK, 2014; pp. 1–19.
  28. Zheng, X.; Dan, C.; Aragam, B.; Ravikumar, P.; Xing, E. Learning sparse nonparametric DAGs. In Proceedings of the International Conference on Artificial Intelligence and Statistics, PMLR, Online, 26–28 August 2020; pp. 3414–3425.
  29. Defferrard, M.; Bresson, X.; Vandergheynst, P. Convolutional neural networks on graphs with fast localized spectral filtering. In Proceedings of the 30th International Conference on Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 3844–3852.
  30. Yao, F.; Chen, Y.; Wang, L.; Chang, Z.; Huang, P.Q.; Wang, Y. A bilevel evolutionary algorithm for large-scale multiobjective task scheduling in multiagile Earth observation satellite systems. IEEE Trans. Syst. Man Cybern. Syst. 2024, 54, 3512–3524.
Figure 1. The process of a webshell attack. Attackers inject webshell scripts into the server through website vulnerabilities and build a mechanism for remote access on the server. This mechanism is generally called a “backdoor”.
Figure 2. An example of a response body with a webshell attack. Part (a) shows what the response body looks like in the browser, and part (b) shows the source code of the response body. It is important to note that the attacker has acquired the necessary permissions to access the database by previously injecting a webshell script, thereby granting unauthorized control over the system.
Figure 3. Model deployment environment. Our model detects webshells based on the response body generated by a user’s request. If it identifies a webshell attack, an alarm is triggered. The model is deployed on CDN nodes.
Figure 4. Taxonomy of the technologies used for webshell detection.
Figure 5. System architecture of WeDIGAR. As indicated by the color of the rectangles, our main contribution lies in information granule construction. In the right part of constructing structural information granules, we use ‘A’–‘Z’ to name the information granules for simplicity (this does not mean that the number of information granules is exactly 26).
Figure 6. Causal graphs. (a) Causal graph generated by the PC algorithm; (b) causal graph generated by NOTEARS; (c) causal graph generated by nonlinear NOTEARS. The red points are the label, the green points are those connected to the label (points relevant to the outcome), and the blue points are those not connected to the label (points irrelevant to the outcome).
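To make the causal-discovery step behind Figure 6b concrete, the following is a minimal, self-contained sketch of linear NOTEARS [23]: a least-squares SEM fit plus an l1 penalty, driven toward the acyclicity condition h(W) = tr(exp(W∘W)) - d = 0 by an augmented Lagrangian solved with L-BFGS-B [25]. The function names, grids, and thresholding constant are illustrative choices, not the paper’s exact configuration; the reference implementation also handles the l1 term more carefully (by splitting W into positive and negative parts), whereas this sketch uses a simple sign subgradient.

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import minimize

def notears_linear(X, lam=0.1, max_iter=100, h_tol=1e-8, rho_max=1e16, w_thresh=0.3):
    """Sketch of linear NOTEARS: learn a weighted adjacency matrix W from data X (n x d)."""
    n, d = X.shape

    def h_and_grad(W):
        # Acyclicity measure h(W) = tr(e^(W*W)) - d and its gradient.
        E = expm(W * W)
        return np.trace(E) - d, 2.0 * E.T * W

    def objective(w, rho, alpha):
        W = w.reshape(d, d)
        R = X - X @ W
        loss = 0.5 / n * (R ** 2).sum()          # least-squares SEM fit
        g_loss = -(X.T @ R) / n
        h, g_h = h_and_grad(W)
        obj = loss + 0.5 * rho * h * h + alpha * h + lam * np.abs(W).sum()
        grad = g_loss + (rho * h + alpha) * g_h + lam * np.sign(W)  # sign subgradient for l1
        return obj, grad.ravel()

    w, rho, alpha, h = np.zeros(d * d), 1.0, 0.0, np.inf
    for _ in range(max_iter):
        while rho < rho_max:
            sol = minimize(objective, w, args=(rho, alpha), jac=True, method="L-BFGS-B")
            h_new, _ = h_and_grad(sol.x.reshape(d, d))
            if h_new > 0.25 * h:
                rho *= 10.0                       # tighten the acyclicity penalty
            else:
                break
        w, h = sol.x, h_new
        alpha += rho * h                          # dual ascent step
        if h <= h_tol or rho >= rho_max:
            break
    W = sol.x.reshape(d, d)
    W[np.abs(W) < w_thresh] = 0.0                 # prune near-zero edges
    return W
```

IG columns whose nodes remain disconnected from the label node in the returned graph correspond to the blue points in Figure 6 and would be dropped before the classifier is trained.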
Figure 7. Parameter sensitivity analysis. (a) Effect of the number of subtrees; (b) effect of the number of features in subtrees; (c) effect of the maximum depth of subtrees; (d) effect of positive–negative sample ratio.
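A sensitivity study of the kind shown in Figure 7a–c can be reproduced in spirit with a plain sweep over random forest hyperparameters. The sketch below uses scikit-learn with synthetic placeholder data; the grids and dataset are assumptions for illustration, not the paper’s settings, and the class-ratio experiment of Figure 7d would additionally resample the training labels.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Placeholder data standing in for the selected information granules.
X, y = make_classification(n_samples=2000, n_features=30, weights=[0.9, 0.1], random_state=0)

grids = {
    "n_estimators": [10, 50, 100, 200],   # number of subtrees (cf. Figure 7a)
    "max_features": [2, 5, 10, "sqrt"],   # features tried per split (cf. Figure 7b)
    "max_depth": [4, 8, 16, None],        # maximum subtree depth (cf. Figure 7c)
}
for param, values in grids.items():
    for v in values:
        clf = RandomForestClassifier(random_state=0, **{param: v})
        f1 = cross_val_score(clf, X, y, cv=5, scoring="f1").mean()
        print(f"{param}={v}: mean F1 = {f1:.4f}")
```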
Figure 8. The time consumption of WeDIGAR over 800 s of beta testing.
Figure 9. The information of detected suspicious responses is stored in a MongoDB database deployed on servers of Baishan Co., Ltd.
Table 1. Comparative summary of webshell detection approaches.

| Category | Core Methodology | Representative Works | Key Features | Limitations |
|---|---|---|---|---|
| Rule-based source file analysis | Malicious signature matching, static taint analysis | [3,12,13] | Relies on manually defined rules; effective against known patterns | Cannot detect obfuscated/encrypted webshells; high false-positive rate; rules require frequent updates |
| ML-based source file analysis | Extracts features (e.g., TF-IDF, opcode sequences) and uses classifiers like RF, SVM | [4,9,19] | Reduces reliance on manual rules; better generalization than pure rule-based methods | Requires full source code access; feature engineering is complex; high computational cost |
| Deep learning source file analysis | Uses CNN, LSTM, GRU with attention to automatically learn features from source code | [15,16,17] | End-to-end feature learning; robust against code obfuscation | Demands large labeled datasets; high computational overhead; unsuitable for real-time detection on edge devices |
| Traffic analysis methods | Monitors HTTP request/response patterns using CNN, LSTM, or SVM | [5,10,11] | No need for source file access; can detect in-memory webshells | Requires full traffic capture and session reconstruction; performance degrades with encrypted traffic |
| Hybrid and ensemble methods | Combines rules, ML, and DL models to improve detection coverage and robustness | [8,18,21] | High accuracy (up to 98%+); leverages strengths of multiple approaches | High system complexity and resource consumption (e.g., >16 GB RAM); not feasible for resource-constrained environments |
| Our WeDIGAR framework | IGs + causal learning + RF: analyzes only the HTML response body using structural/semantic IGs, NOTEARS for feature selection, and a lightweight RF | This work | No source file access needed; lightweight and fast (≤10 ms); high accuracy (≥99%); suitable for edge nodes | Currently cannot detect encrypted webshells (to be addressed in future work) |
Table 2. Comparison of different causal analysis methods.

| Method | Dataset 1 Accuracy (%) | Dataset 1 F1 Score (%) | Dataset 2 Accuracy (%) | Dataset 2 F1 Score (%) |
|---|---|---|---|---|
| RF | 99.95 ± 0.02 | 99.23 ± 0.20 | 99.95 ± 0.02 | 99.23 ± 0.10 |
| PC + RF | 99.94 ± 0.02 | 99.29 ± 0.15 | 99.94 ± 0.02 | 99.29 ± 0.15 |
| Nonlinear + RF | 99.93 ± 0.02 | 99.12 ± 0.20 | 99.94 ± 0.01 | 99.24 ± 0.10 |
| WeDIGAR * | 99.95 ± 0.01 | 99.35 ± 0.10 | 99.95 ± 0.01 | 99.35 ± 0.10 |

* Our proposed method.
Table 3. Overall performance comparison with external comparative baselines.

| Method | Dataset 1 Accuracy (%) | Dataset 1 F1 Score (%) | Dataset 1 Time (ms) | Dataset 2 Accuracy (%) | Dataset 2 F1 Score (%) | Dataset 2 Time (ms) |
|---|---|---|---|---|---|---|
| TF-IDF + SVM | 99.51 | 97.42 | 4.15 ± 0.50 | 99.54 | 97.58 | 4.19 ± 0.30 |
| CNN | 99.65 | 98.11 | 14.22 ± 1.50 | 99.68 | 98.25 | 14.10 ± 1.30 |
| Ensemble Model | 99.70 | 98.35 | 10.50 ± 0.80 | 99.71 | 98.40 | 10.60 ± 0.70 |
| WeDIGAR * | 99.95 ± 0.01 | 99.35 ± 0.10 | 4.80 ± 1.00 | 99.95 ± 0.01 | 99.35 ± 0.10 | 4.80 ± 1.00 |

* Our proposed method.
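For context on how baselines such as TF-IDF + SVM (Table 3) or TF-IDF + RF (Table 5 below) are typically assembled, the following is a hedged sketch of such a pipeline over raw response bodies. The corpus, the character n-gram range, and all parameter values are illustrative placeholders rather than the configurations evaluated in the tables.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Placeholder corpus: raw HTTP response bodies and their webshell labels (1 = webshell).
docs = ["<html><body>normal page</body></html>",
        "<html>eval(base64_decode(...))</html>"] * 100
labels = [0, 1] * 100

X_train, X_test, y_train, y_test = train_test_split(
    docs, labels, test_size=0.3, random_state=0, stratify=labels)

baseline = make_pipeline(
    # Character n-grams are a common choice for markup-heavy text.
    TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5), max_features=5000),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
baseline.fit(X_train, y_train)
pred = baseline.predict(X_test)
print("accuracy:", accuracy_score(y_test, pred), "F1:", f1_score(y_test, pred))
```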
Table 4. Comparison of different methods based on F_final.

| Method | Dataset 1 Accuracy (%) | Dataset 1 F1 Score (%) | Dataset 1 Time (ms) | Dataset 2 Accuracy (%) | Dataset 2 F1 Score (%) | Dataset 2 Time (ms) |
|---|---|---|---|---|---|---|
| SVM | 99.75 | 96.46 | 3.74 | 99.76 | 96.62 | 3.75 |
| KNN | 99.75 | 96.51 | 38.48 ± 0.50 | 99.76 | 96.75 | 37.94 ± 1.00 |
| GCN-euclid | 99.40 ± 0.10 | 91.52 ± 0.60 | 15.57 ± 1.20 | 99.49 ± 0.05 | 92.57 ± 0.30 | 15.50 ± 1.00 |
| GCN-linear | 99.51 ± 0.10 | 92.97 ± 0.60 | 12.82 ± 0.90 | 99.52 ± 0.05 | 93.26 ± 0.50 | 12.81 ± 0.90 |
| GCN-nonlinear | 99.46 ± 0.10 | 92.07 ± 0.60 | 13.65 ± 1.20 | 99.49 ± 0.05 | 92.72 ± 0.10 | 13.66 ± 1.00 |
| GCN-ensemble | 99.53 ± 0.02 | 92.98 ± 0.30 | 43.02 ± 0.50 | 99.56 ± 0.05 | 93.60 ± 0.40 | 43.00 ± 0.50 |
| WeDIGAR * | 99.95 ± 0.01 | 99.35 ± 0.10 | 4.80 ± 1.00 | 99.95 ± 0.01 | 99.35 ± 0.10 | 4.80 ± 1.00 |

* Our proposed method.
Table 5. Comparison with our previous method.

| Method | Dataset 1 Accuracy (%) | Dataset 1 F1 Score (%) | Dataset 1 Time (ms) | Dataset 2 Accuracy (%) | Dataset 2 F1 Score (%) | Dataset 2 Time (ms) |
|---|---|---|---|---|---|---|
| TF-IDF + RF | 99.00 ± 0.05 | 86.84 ± 0.50 | 11.88 ± 0.05 | 99.00 ± 0.05 | 86.96 ± 0.50 | 11.86 ± 0.05 |
| PC + RF * | 99.76 ± 0.08 | 96.77 ± 0.80 | 8.26 ± 0.03 | 99.76 ± 0.08 | 96.77 ± 1.10 | 8.26 ± 0.03 |
| WeDIGAR ** | 99.95 ± 0.01 | 99.35 ± 0.10 | 4.80 ± 1.00 | 99.95 ± 0.01 | 99.35 ± 0.10 | 4.80 ± 1.00 |

* “PC + RF” in the table means PC + RF with no F_sem; ** method proposed in this paper.