UAV Airborne Network Intrusion Detection Method Based on Improved Stratified Sampling and Ensemble Learning

Lin, Lin; Ge, Hongjuan; Zhou, Yuefei; Shangguan, Runzong

doi:10.3390/drones9090604

¹ College of Civil Aviation, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China

² Key Laboratory of Civil Aviation Flight Technology and Flight Safety, Civil Aviation Flight University of China, Guanghan 618307, China

³ Aviation Safety Office, Civil Aviation Flight University of China, Guanghan 618307, China

* Author to whom correspondence should be addressed.

Drones 2025, 9(9), 604; https://doi.org/10.3390/drones9090604

Submission received: 9 July 2025 / Revised: 17 August 2025 / Accepted: 24 August 2025 / Published: 27 August 2025

(This article belongs to the Topic Advances in Integrative AI, Machine Learning, and Big Data for Transformative Applications)

Abstract

UAV airborne network intrusion detection faces challenges due to highly imbalanced datasets, where normal samples significantly outnumber intrusion instances. This paper proposes an improved stratified sampling and ensemble learning (ISSEL) method to address this issue. The method improves upon traditional stratified sampling by clustering normal samples and performing distance-based sampling from cluster centers to ensure better feature space representation. Subsequently, five tree models, namely, decision tree, extra tree, random forest, gradient boosting tree, and XGBoost, are utilized to train each subset. The model prediction results are then integrated using an adaptive weighting strategy based on the F1 score. The experimental results on the MIL-STD-1553B data bus demonstrated that the ISSEL method maintained a high accuracy rate of 99.42% while significantly enhancing the recognition ability for minority-class attacks. The precision, recall, and F1 score reached 98.94%, 97.62%, and 98.28%, respectively. These results validate the effectiveness of the ISSEL method in handling imbalanced datasets, highlighting its potential application in the field of airborne network intrusion detection.

Keywords: UAV airborne network; intrusion detection; imbalanced data; improved stratified sampling; ensemble learning

1. Introduction

Modern unmanned aerial vehicles (UAVs) have become powerful intelligent transportation tools used in military reconnaissance and strike, logistics distribution, agricultural pest control, geographic mapping, disaster rescue, and power line inspection. As their applications have widened, the security risks they face, including attacks on communication networks, interference with flight control, and data theft, have become increasingly prominent. The UAV airborne network is a complex and precise engineered system that integrates hardware, software, communication, computing, and security capabilities. With the development of 4G/5G, artificial intelligence, miniaturized chips, and advanced communication technologies, the performance, intelligence, and reliability of UAV airborne networks will continue to improve. Meanwhile, the security of airborne networks is crucial for the stable operation of UAVs and the success of missions [1].

The data bus is the core of UAV airborne networks, responsible for high-speed data exchange among devices such as sensors, flight control systems, and weapon systems. Modern military UAVs mainly adopt standards such as MIL-STD-1553B, ARINC 429, CAN bus, and Ethernet, and different mission requirements require different buses. In modern mid-to-high-end UAVs, the airborne data buses mostly adopt a hybrid bus architecture [2]. The communication between the flight control system, key sensors, and key subsystems must be precise, reliable, and operate in real time. MIL-STD-1553B is used to transmit key instructions, while the Ethernet bus architecture is used for processing mission data, such as image acquisition and environmental monitoring. It can be seen from this that the security and reliability of the MIL-STD-1553B data bus are directly related to the reliability of the core system of the UAVs.

MIL-STD-1553B was first applied to the avionics system of the F-16 fighter jet in 1973 and later extended to the on-board data processing subsystem of civilian spacecraft. When the protocol was developed, the network environment was far less complex than it is today and network security technology was in its infancy, so today’s threat landscape was not taken into account in its design. As a result, 1553B has inherent security flaws: (1) no encryption: data is transmitted in plain text, making it vulnerable to eavesdropping; (2) weak authentication: because authentication relies solely on hardware addresses, an attacker can forge instructions and inject false sensor data. This study investigated intrusion detection systems (IDSs) for 1553B, which not only helps ensure the operational security of existing UAVs but also helps defend against enemy electronic warfare attacks on UAVs [3]. It can also guide the design of next-generation UAVs and help avoid repeating the security flaws of 1553B in new buses, such as TSN.

Although progress has been made in intrusion detection for the MIL-STD-1553B bus in recent years, in the face of increasingly complex network attacks, traditional IDSs are unable to effectively deal with these new challenges [4,5]. Similarly, although a time-based intrusion detection algorithm [6] is simple and straightforward, it is unable to detect specific abnormal messages. The system proposed by Genereux et al. [7] can detect periodic anomalies, but its ability to detect non-periodic anomalies is limited. Onodueze et al. [8] applied machine learning techniques to improve the detection of aperiodic forged messages on the 1553B bus. However, due to the imbalance of the dataset, there was a significant gap between the actual performance of the model and the ideal results.

MIL-STD-1553C is the latest revised version of the US military standard, released in 2022, which aims to meet the requirements of modern avionics systems while maintaining backward compatibility with the widely used 1553B bus. Compared with 1553B, 1553C has the technical advantages of a higher data rate, improved redundancy, expanded protocol features, enhanced electrical performance, and security upgrades. Due to the strict military certification process, it is expected that 1553C will not be deployed on a large scale until after 2026. Currently, only a small number of UAVs have been deployed. Therefore, the focus of this study was 1553B.

In actual scenarios, intrusion datasets are often imbalanced; that is, there is much more data for normal behaviors than for abnormal behaviors. This imbalance poses a significant challenge to the accuracy of classifiers, as most of the currently used classification algorithms tend to favor the majority classes, thereby reducing the detection accuracy of minority classes, such as attack behaviors [9]. Most recent studies on intrusion detection for the MIL-STD-1553B bus did not take dataset imbalance into consideration, resulting in an inadequate detection ability when facing targeted and complex attacks.

To mitigate the issue of data imbalances, various techniques have been applied in public network environments. Qiu et al. [10] used the Synthetic Minority Oversampling Technique (SMOTE) to address imbalances and employed a stacked ensemble algorithm for multi-class intrusion detection. However, the generation of new samples introduced issues of marginalization and blind spots. Li et al. [11] combined Adaptive Synthetic Sampling (ADASYN) with the ID3 algorithm for anomaly detection, but noisy samples were sometimes introduced near decision boundaries. Leevy et al. [12] used random undersampling (RUS) to balance the dataset by removing majority-class samples, which may result in the loss of important data. Sun et al. [13] proposed combining RUS with Borderline SMOTE to oversample boundary samples, but overlapping boundaries still led to class confusion. Liu et al. [14] presented a stratified sampling and ensemble learning approach to handle the classification of imbalanced datasets. However, the random sampling within clusters resulted in subsets that only represented local spaces of the majority class, affecting the base classifiers’ performance.

To address these issues, this paper proposes an improved stratified sampling and ensemble learning method (ISSEL). This approach improves upon previous stratified sampling methods by clustering normal samples with KMeans++ and performing distance-based stratified sampling from the cluster centers. This ensures that the generated balanced subsets better represent the entire feature space of the majority class. By preserving the global characteristics without modifying the original samples, this method avoids the local coverage issues present in traditional stratified sampling. The proposed model trains five base classifiers—decision tree, extra trees, random forest, gradient boosting decision tree, and XGBoost—on the balanced subsets and integrates their predictions using an F1 score-based adaptive weighting strategy. The strength of ISSEL lies in its ability to preserve the global characteristics of the majority class, enabling the base classifiers to achieve superior performance. By integrating an F1 score-based weighting ensemble strategy, ISSEL further enhances the detection accuracy, offering an improved solution for intrusion detection in imbalanced datasets.

2. ISSEL Model Framework

2.1. Overall Architecture

The overall architecture of the proposed ISSEL method is shown in Figure 1. It consists of three main components: a data processing module, a base classifier parameter-tuning module, and an adaptive weighted ensemble module. First, the airborne network intrusion detection dataset is standardized and preprocessed. The dataset is then split into three parts: a training set (70%), a validation set (10%), and a test set (20%). The training set undergoes stratified sampling based on KMeans++ cluster centers, producing five balanced subsets (detailed in Section 2.2). Next, the five tree-based base classifiers (DT, ET, RF, GBDT, and XGBoost) are trained on the balanced subsets, and their hyperparameters are tuned using the validation set. Finally, the tuned base classifiers are retrained on the balanced subsets, and their predictions are combined by an adaptive weighted ensemble whose weights are derived from the F1 scores measured on the validation set. The ensemble model’s final classification performance is evaluated on the test set.
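The 70/10/20 split described above can be sketched with scikit-learn as follows. This is a minimal illustration, not the authors' code: the helper name `split_70_10_20`, the use of label stratification in the split, the random seed, and the synthetic stand-in data are all assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def split_70_10_20(X, y, seed=0):
    """Split into train (70%), validation (10%), and test (20%) sets.

    Stratifying on y is an assumption made here so that each part keeps
    roughly the same anomaly rate as the full dataset.
    """
    # Hold out 20% of the data for testing.
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=0.20, stratify=y, random_state=seed)
    # 10% of the full data equals 12.5% of the remaining 80%.
    X_train, X_val, y_train, y_val = train_test_split(
        X_rest, y_rest, test_size=0.125, stratify=y_rest, random_state=seed)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)

# Synthetic stand-in data (not the 1553B dataset): roughly 17% anomalies.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
y = (rng.random(1000) < 0.17).astype(int)

train, val, test = split_70_10_20(X, y)
print(len(train[1]), len(val[1]), len(test[1]))
```

Splitting in two stages keeps the test set untouched while the validation set is reused for both hyperparameter tuning and ensemble weighting.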

2.2. Improved Stratified Sampling

To enhance the training effect of simple base classifiers and mitigate the imbalance effect of the original training data, preprocessing of the imbalanced data is first carried out. The improved stratified sampling method proposed in this section clusters the majority-class samples (normal class) into several clusters without adding or removing any original samples. Then, an equal proportion of samples is extracted from each cluster based on the cluster center to form multiple subsets, ensuring that the subsets retain the original spatial characteristics of the majority-class samples [15,16]. These subsets are then combined with the minority-class samples (anomalies) to create new balanced subsets. The process of constructing balanced subsets through improved stratified sampling is divided into the following five steps.

Step 1: KMeans++ [17] Clustering

Cluster the majority-class samples (normal class) into $k$ clusters using the KMeans++ algorithm, where $C_{i}$ represents the $i$-th cluster:

$$\{C_{1}, C_{2}, \dots, C_{k}\} \tag{1}$$

Step 2: calculate the ratio and determine the number of subsets

Calculate the ratio $x$ between the normal class samples and the anomaly class samples in the training set, and round $x$ to the nearest integer $n$:

$$n = [x] \tag{2}$$

Step 3: sample based on cluster centers

For each cluster $C_{i}$, compute the distance $d_{i,j}$ between the cluster center $\mu_{i}$ and each sample point $x_{j}$, and sort the samples by distance from nearest to farthest. The sorted list of samples is $\{x_{1}, x_{2}, \dots, x_{|C_{i}|}\}$.

$$d_{i,j} = \lVert x_{j} - \mu_{i} \rVert \tag{3}$$

Step 4: divide the sample subsets

Divide the sorted sample list into $n$ subsets $\{S_{1}, S_{2}, \dots, S_{n}\}$ using a cyclic allocation method, so that each subset contains approximately the same number of sample points. For $j = 1, 2, \dots, n$:

$$S_{j} = \left\{ x_{j + k \cdot n} \mid k = 0, 1, \dots, \frac{|C_{i}|}{n} \right\} \tag{4}$$

Here, $k$ is a running index over allocation rounds, not the number of clusters from Step 1.

Step 5: construct balanced subsets

Combine each of the n normal class sample subsets obtained from Step 4 with anomaly class samples to form n balanced subsets. Each balanced subset contains approximately the same number of normal class and anomaly class samples.
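The five steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function name `build_balanced_subsets`, the Euclidean distance, the default `k=4`, the seed, and the synthetic data are all hypothetical choices made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_balanced_subsets(X_normal, X_anomaly, k=4, seed=0):
    """Cluster-centred stratified sampling, following Steps 1-5."""
    # Step 2: number of subsets n is the rounded normal/anomaly ratio.
    n = max(1, int(round(len(X_normal) / len(X_anomaly))))
    # Step 1: cluster the normal class with KMeans++ initialisation.
    km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=seed)
    labels = km.fit_predict(X_normal)

    parts = [[] for _ in range(n)]
    for i in range(k):
        idx = np.flatnonzero(labels == i)
        # Step 3: sort cluster members by distance to the cluster centre.
        dist = np.linalg.norm(X_normal[idx] - km.cluster_centers_[i], axis=1)
        ordered = idx[np.argsort(dist)]
        # Step 4: cyclic allocation; subset j takes ranks j, j+n, j+2n, ...
        for j in range(n):
            parts[j].append(ordered[j::n])

    subsets = []
    for j in range(n):
        normal_idx = np.concatenate(parts[j])
        # Step 5: join each normal slice with all anomaly samples.
        X_sub = np.vstack([X_normal[normal_idx], X_anomaly])
        y_sub = np.r_[np.zeros(len(normal_idx), int),
                      np.ones(len(X_anomaly), int)]
        subsets.append((X_sub, y_sub))
    return subsets

# Synthetic stand-in data: 500 normal and 100 anomalous samples.
rng = np.random.default_rng(0)
X_normal = rng.normal(size=(500, 6))
X_anomaly = rng.normal(loc=3.0, size=(100, 6))
subsets = build_balanced_subsets(X_normal, X_anomaly)
for X_sub, y_sub in subsets:
    print(X_sub.shape, int((y_sub == 0).sum()), int((y_sub == 1).sum()))
```

Because every subset receives samples at every distance rank within every cluster, each one spans both the dense core and the periphery of the normal class, and no normal sample is discarded or duplicated.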

2.3. Base Classifiers

In this study, a cluster-centered stratified sampling method was used to address the issue of imbalanced data in airborne network intrusion detection. The training dataset is divided into five balanced subsets to preserve the original feature space of the normal class samples, ensuring the accuracy of the base classifiers. By integrating effective and diverse base classifiers, the model can achieve superior performance. The effectiveness of the base classifiers is ensured through the cluster-centered stratified sampling method, while diversity is introduced by employing different tree-based models.

Decision tree (DT) [18] provides an intuitive classification method by recursively partitioning the feature space to generate decision rules. DT is easy to understand and interpret, works well with various data types, and requires minimal data preprocessing.

Extra trees (ETs) [19], as an ensemble learning method, enhances model diversity by constructing multiple random decision trees. Each tree randomly selects features at the split node, capturing different aspects of the data and improving the model’s generalization capability.

Random forest (RF) [20] reduces the risk of overfitting by combining multiple decision trees. RF constructs each tree using random feature selection and bootstrap sampling and improves classification accuracy and stability through a voting mechanism.

Gradient boosting decision tree (GBDT) [21] is an iterative ensemble method where each new tree attempts to correct the residuals from the previous iteration. GBDT optimizes the loss function using a gradient descent framework, gradually refining the model to more precisely capture data patterns.

XGBoost (eXtreme gradient boosting) [22] is an efficient implementation of GBDT, incorporating regularization in the loss function to control model complexity. By leveraging optimization algorithms, XGBoost improves performance and computational efficiency. It reduces overfitting risks through regularization, supports missing value handling, pruning, and parallel computation, further enhancing the model’s generalization ability and speed.
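As a rough sketch, the five base classifiers can be instantiated with scikit-learn and the xgboost package as shown below. The helper name `make_base_classifiers` and the default hyperparameters are placeholders assumed for illustration, not the tuned settings from this study, and XGBoost is simply skipped if the package is not installed.

```python
from sklearn.ensemble import (ExtraTreesClassifier,
                              GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.tree import DecisionTreeClassifier

def make_base_classifiers(seed=0):
    """Return the tree-based base learners keyed by their abbreviations."""
    models = {
        "DT": DecisionTreeClassifier(random_state=seed),
        "ET": ExtraTreesClassifier(n_estimators=100, random_state=seed),
        "RF": RandomForestClassifier(n_estimators=100, random_state=seed),
        "GBDT": GradientBoostingClassifier(n_estimators=100, random_state=seed),
    }
    try:
        # xgboost is an optional third-party dependency.
        from xgboost import XGBClassifier
        models["XGBoost"] = XGBClassifier(n_estimators=100, random_state=seed)
    except ImportError:
        pass
    return models

models = make_base_classifiers()
print(sorted(models))
```

All of these estimators expose `fit`, `predict`, and `predict_proba`, which is what allows their class probabilities to be fused later.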

2.4. Adaptive Weighted Fusion Strategy

Each base classifier’s performance is evaluated using the validation set, and its F1 score is calculated. The F1 score is the harmonic mean of precision and recall, and it is calculated as follows:

$$F1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{5}$$

The definitions of precision and recall are as follows:

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{6}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{7}$$

Here, TP is the number of true positives, FP is the number of false positives, and FN is the number of false negatives.
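Equations (5)–(7) translate directly into code. The sketch below uses made-up counts for the first call, and then checks that the precision and recall reported in the abstract (98.94% and 97.62%) are consistent with the reported F1 score of 98.28%. The function names are illustrative.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def f1_from(precision, recall):
    """Harmonic mean of precision and recall, Equation (5)."""
    return 2 * precision * recall / (precision + recall)

# Illustrative counts (not taken from the paper).
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.4f} recall={r:.4f} f1={f1:.4f}")

# Consistency check against the values reported in the abstract.
reported_f1 = f1_from(0.9894, 0.9762)
print(f"F1 from reported precision and recall: {reported_f1:.4f}")
```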

Weights are assigned based on the F1 score of each base classifier on the validation set. To better integrate the prediction results of multiple classifiers, a penalty (reverse) weighting strategy is introduced.

UAV data may contain noise, such as motion blur and illumination changes, which can lead some models to achieve an inflated F1 score in specific scenarios while generalizing poorly. Reverse weighting aims to suppress the overconfident predictions of high-variance models and is especially suitable for scenarios with noisy data or insufficient model diversity. The weight $w_{i}$ of each base classifier is set to the inverse of its F1 score, so that classifiers with a higher validation F1 score receive smaller weights:

$$w_{i} = \frac{1}{F1_{i}} \tag{8}$$

where $F1_{i}$ is the F1 score of the $i$-th base classifier on the validation set.

Each base classifier’s predicted probability for each class in the test set is weighted and fused. Let the probability predicted by the $i$-th base classifier for class $j$ of sample $x$ be $P_{i,j}(x)$. The final weighted probability $P_{j}(x)$ over the $M$ base classifiers is calculated as follows:

$$P_{j}(x) = \frac{\sum_{i=1}^{M} w_{i} \cdot P_{i,j}(x)}{\sum_{i=1}^{M} w_{i}} \tag{9}$$

Finally, the classification result is the class with the highest weighted probability, i.e.,

$$\hat{y}(x) = \arg\max_{j} P_{j}(x) \tag{10}$$
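A minimal NumPy sketch of Equations (8)–(10) is given below. The function names and the toy F1 scores and probabilities are assumptions chosen for illustration. Under Equation (8) as written, a classifier with a lower validation F1 score receives a larger weight.

```python
import numpy as np

def inverse_f1_weights(f1_scores):
    """Equation (8): w_i = 1 / F1_i (assumes every F1 score is > 0)."""
    return 1.0 / np.asarray(f1_scores, dtype=float)

def fuse(probas, weights):
    """Equations (9) and (10).

    probas has shape (M, n_samples, n_classes); weights has shape (M,).
    Returns the fused probabilities and the predicted class indices.
    """
    probas = np.asarray(probas, dtype=float)
    w = np.asarray(weights, dtype=float)
    fused = np.tensordot(w, probas, axes=1) / w.sum()  # weighted average
    return fused, fused.argmax(axis=1)

# Two hypothetical classifiers scoring one sample over two classes.
weights = inverse_f1_weights([0.8, 0.5])   # -> [1.25, 2.0]
probas = [[[0.9, 0.1]],                    # classifier with F1 = 0.8
          [[0.2, 0.8]]]                    # classifier with F1 = 0.5
fused, labels = fuse(probas, weights)
print(weights, fused, labels)
```

Dividing by the sum of the weights keeps each fused row a valid probability distribution, so the arg max in Equation (10) is unaffected by the overall scale of the weights.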

3. Data Source and Evaluation Metrics

3.1. Data Source and Preprocessing

The MIL-STD-1553B data bus is widely used in aerospace and defense systems, and its security directly affects the stable operation of aircraft. Although attacks on MIL-STD-1553B can be fatal, little original data is publicly available for scholars to study. Attack datasets for UAVs in real environments are difficult to obtain, especially for attacks on 1553B, because most such attacks have military intent and the associated data are highly confidential. Therefore, this study adopted artificially generated, simulated attack data. The data comprised benign traffic and simulated instances of common attack patterns: Random Word Generation (Bus) Desynchronization, Random Word Generation (RT), Data Trashing, Man-in-the-Middle, and Command Invalidation. Although these patterns cannot cover all attack and data types, they span a wide variety of attack scenarios and operational conditions that are representative of real-world UAV network security challenges.

This research mainly focused on imbalanced IDS datasets from the 1553B bus. Due to the high confidentiality of aircraft data buses, few 1553B datasets are publicly available, and research is mainly conducted in simulated environments. This study used OD1NF1ST [23], an open-source simulation system for the MIL-STD-1553 communication protocol, combined with Microsoft Flight Simulator and SimConnect. The number of windows was set to five, and the JSON data packets were converted into integer sequences. A total of 42 h of real flight data were generated, and 10 types of network attacks on the 1553B bus were produced by feeding JSON files describing different attack types into the intrusion detection system, in order to evaluate the impact of the attacks on confidentiality, integrity, and availability under dynamic flight conditions.

The evaluation of the proposed method, combined with a detailed assessment of its robustness and accuracy under class imbalance, can provide a reference for applying the method to UAV systems and embedded network security.

The 1553B bus attack dataset contains a total of 39,995 samples, with 33,271 normal samples and 6724 abnormal samples, representing an anomaly rate of 16.81%. Table 1 summarizes the quantity and proportion of each attack type in the 1553B dataset. Here, TR represents Transmit and REC represents Receive. As shown in the table, the attack types “Status Word Manipulation (TR)” and “Command Invalidation” each account for less than 0.1% of the samples, making the data highly imbalanced and increasing detection difficulty. The dataset was split into 70% for training, 10% for validation, and 20% for testing.
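The sample counts above can be checked with a few lines of arithmetic, which also show where the five balanced subsets come from. Note that Step 2 of Section 2.2 computes the ratio on the training set; using the full-dataset counts here assumes that the split roughly preserves the class proportions.

```python
normal, anomaly = 33271, 6724

total = normal + anomaly        # total number of samples
anomaly_rate = anomaly / total  # fraction of abnormal samples
ratio = normal / anomaly        # normal-to-anomaly ratio x
n_subsets = round(ratio)        # n = [x], the number of balanced subsets

print(total, f"{anomaly_rate:.2%}", f"{ratio:.3f}", n_subsets)
```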

This study used the MITM attack as an example to illustrate how the attack is carried out.

In an MITM [24] attack, the attacker inserts themselves as a middleman, intercepting and modifying authentication requests or responses, and misleading trusted communication parties into using the forged identity and key information. MITM attackers can pretend to be legitimate UAVs or Ground Control Stations to send authentication requests to each other, control the communication process through the role of intermediaries, and finally induce trusted nodes to communicate with forged key information.

To normalize the parameters and avoid inaccurate distance measurements caused by differences in units between parameters, the data were standardized as follows:

$$x'_{i,j} = \frac{x_{i,j} - \mu_{j}}{\sigma_{j}} \tag{11}$$

where $x_{i,j}$ is the value of the $j$-th parameter in the $i$-th sample, $x'_{i,j}$ is its standardized value, and $\mu_{j}$ and $\sigma_{j}$ represent the mean and standard deviation of the $j$-th parameter, respectively.
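A short NumPy sketch of Equation (11) follows. Estimating the mean and standard deviation on the training set only and reusing them for other data is an assumption made here to avoid information leakage; the text does not state which samples the statistics were computed on. The function names and toy values are illustrative.

```python
import numpy as np

def fit_standardizer(X_train):
    """Per-feature mean and standard deviation from the training data."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    # Guard against constant features, which would give a zero divisor.
    sigma = np.where(sigma == 0, 1.0, sigma)
    return mu, sigma

def standardize(X, mu, sigma):
    """Equation (11): subtract the mean and divide by the standard deviation."""
    return (X - mu) / sigma

# Two features on very different scales.
X_train = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]])
X_test = np.array([[2.0, 800.0]])

mu, sigma = fit_standardizer(X_train)
Z_train = standardize(X_train, mu, sigma)
Z_test = standardize(X_test, mu, sigma)
print(Z_train)
print(Z_test)
```

After standardization both features contribute on a comparable scale to the Euclidean distances used by KMeans++ and by the cluster-center sampling.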

A laboratory environment cannot fully reproduce the following real-world conditions: (1) the real-time blocking behavior of network IDSs and Intrusion Prevention Systems (IPSs); (2) the dynamic routing of Software-Defined Networking (SDN) in the cloud environment; and (3) the abnormal behavioral responses of end users. These factors may lead to the actual attack success rate being approximately 18–22% lower than the experimental value.

3.2. Evaluation Metrics

The evaluation metrics used in this study were accuracy, recall, precision, and F1 score. Accuracy refers to the proportion of correctly classified samples to the total number of samples, reflecting the model’s overall predictive ability. Recall is the percentage of correctly predicted positive samples out of all the actual positive samples. Precision is the percentage of correctly predicted positive samples out of all the predicted positive samples. These latter two metrics reflect the model’s classification performance regarding false positives and false negatives. The F1 score is the harmonic mean of recall and precision. The calculation of these metrics is given by (5)–(7) and (12):

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{12}$$
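To illustrate why accuracy alone is not sufficient on this dataset, the sketch below applies Equation (12) to a hypothetical classifier that labels every sample as normal, using the class counts from Section 3.1. Such a classifier detects no attacks at all, yet its accuracy still exceeds 83%.

```python
def accuracy(tp, tn, fp, fn):
    """Equation (12): fraction of correctly classified samples."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical "always normal" classifier on the full dataset:
# every normal sample is a true negative, every attack a false negative.
tn, fn = 33271, 6724
acc_all_normal = accuracy(tp=0, tn=tn, fp=0, fn=fn)
recall_all_normal = 0 / (0 + fn)

print(f"accuracy={acc_all_normal:.2%} recall={recall_all_normal:.2%}")
```

This is why recall and the F1 score are reported alongside accuracy throughout the experiments.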

4. Experimental Verification

4.1. Experimental Environment

All experiments were conducted on a personal computer with the following hardware configuration: an Intel(R) Core(TM) i7-6700 CPU @ 3.40 GHz, 16 GB of RAM, and a Windows 10 Professional operating system.

The experimental environment was based on Python 3.12, and Anaconda was used for environment management to ensure the uniformity of dependencies and the isolation of the environment. The key Python libraries used in the experiments included scikit-learn, which was used to implement the machine learning models decision tree, random forest, and gradient boosting tree, as well as for data preprocessing and stratified sampling; xgboost, which was used to implement the eXtreme gradient boosting (XGBoost) algorithm to improve performance on imbalanced datasets; numpy and pandas, which were used for data processing and normalization; and matplotlib, which was used for visualizing the experimental results to display the performance metrics and data distributions. All experiments were executed in a single-threaded CPU environment, with no GPU acceleration, to ensure the reproducibility of the results.

4.2. Model Hyperparameter Experiment

(1): Determining the Number of Clusters k

In the KMeans++ algorithm, the number of clusters (K) has a significant impact on the clustering results. Generally, the optimal K value can be determined using methods such as the Elbow Method, the Silhouette Coefficient, the Gap Statistic, and Cross-Validation.

The K value in this study was determined using the Silhouette Coefficient and the Elbow Method. With the Silhouette Coefficient, K is chosen as the value that maximizes the coefficient, while the Elbow Method chooses K at the inflection point after which the loss decreases only gradually.

Experiments were conducted on the normal samples in the training set using different numbers of clusters, and the silhouette score was used to evaluate the clustering performance. The silhouette scores for varying numbers of clusters are shown in Table 2.

Based on the silhouette score, the number of clusters k was determined. When k = 4, the silhouette score was the highest, so k = 4 was selected for subsequent experiments. The K-means algorithm was then applied to cluster the normal samples in the training set into four clusters. The TSNE visualization after dimensionality reduction is shown in Figure 2.

As seen in Figure 2, when the number of clusters was four, the clustering results for the normal samples were relatively balanced, demonstrating good performance.

K-means uses the minimization of the squared error between samples and their cluster centroids as its objective function. The sum of squared errors (SSE) between the centroid of each cluster and the sample points within that cluster is called the degree of distortion. For a cluster, the lower the degree of distortion, the more tightly grouped its members are; the higher the degree of distortion, the looser the structure within the cluster. The degree of distortion decreases as the number of clusters increases. However, for data with a certain degree of separability, the degree of distortion drops sharply until a critical point is reached and declines slowly thereafter. This critical point can be regarded as the number of clusters with better clustering performance.

Therefore, to cluster this dataset appropriately, we introduced the Elbow Method to determine the K value. By plotting the relationship between the K value and the clustering error (i.e., the average distance from each sample to the center of its cluster), we can observe the “elbow point” in the graph, which is where the decline of the curve begins to slow down (Figure 3). The K value corresponding to the elbow point is usually a better choice.

As shown in Figure 3, the SSE dropped sharply up to $k$ = 4 and declined slowly thereafter. The elbow, i.e., the point of highest curvature, is therefore at K = 4, and the optimal number of clusters for this dataset was selected as four.
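Both selection criteria can be computed in one scan over candidate values of K, as in the sketch below. It runs on synthetic two-dimensional blobs rather than the 1553B normal samples, and the function name `scan_k`, the candidate range, and the seed are assumptions for illustration. scikit-learn exposes the SSE of a fitted model as `inertia_`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def scan_k(X, k_values, seed=0):
    """Return {k: (silhouette score, SSE)} for each candidate k."""
    results = {}
    for k in k_values:
        km = KMeans(n_clusters=k, init="k-means++", n_init=10,
                    random_state=seed).fit(X)
        results[k] = (silhouette_score(X, km.labels_), km.inertia_)
    return results

# Synthetic stand-in data: four well-separated Gaussian blobs.
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [10, 0], [0, 10], [10, 10]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

results = scan_k(X, range(2, 7))
best_k = max(results, key=lambda k: results[k][0])
for k, (sil, sse) in results.items():
    print(f"k={k} silhouette={sil:.3f} SSE={sse:.1f}")
print("k with the highest silhouette score:", best_k)
```

On real data the two criteria do not always agree, so it is useful to inspect the silhouette scores and the SSE curve side by side, as is done in this section.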

(2): Base Classifier Hyperparameter Tuning

The range of hyperparameters is generally set based on the model itself and the problem at hand. First, the hyperparameters that significantly affect the model’s performance must be identified, drawing on prior studies, practical experience, and domain knowledge. In a decision tree (DT), several hyperparameters need tuning, including “criterion,” “max_depth,” “min_samples_leaf,” and “min_samples_split.” ET, RF, GBDT, and XGBoost produce their decisions by combining the outputs of multiple decision trees, making “n_estimators” an essential parameter for these models. Additionally, “learning_rate” controls the convergence speed for GBDT and XGBoost, and “subsample” controls the proportion of samples used for each tree, helping to reduce overfitting and improve computational efficiency.

When building the parameter space, the model complexity must be considered to avoid overfitting. The search space should not be too large or too small, as a small range may miss the optimal value, resulting in insignificant optimization, while a large range may lead to overfitting and inefficient search performance [25]. The range of parameters only needs to be roughly determined as an order of magnitude [26,27]. The parameter space was set based on these principles and relevant studies. Based on performance on the validation set, Bayesian optimization [28] was used for hyperparameter tuning. Table 3 provides a brief description of all the hyperparameters, along with their search space and optimal settings for training each learner on the airborne network traffic dataset.
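The sketch below illustrates tuning on a held-out validation set for the decision tree. It deliberately swaps in plain random search (scikit-learn's `ParameterSampler`) for the Bayesian optimization used in this study, so that the example depends only on scikit-learn. The search ranges, the function name `tune_on_validation`, and the synthetic data are hypothetical and are not the values from Table 3.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import ParameterSampler, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical search space, specified only to an order of magnitude.
space = {
    "criterion": ["gini", "entropy"],
    "max_depth": list(range(2, 21)),
    "min_samples_leaf": list(range(1, 11)),
    "min_samples_split": list(range(2, 11)),
}

def tune_on_validation(X_tr, y_tr, X_val, y_val, n_iter=20, seed=0):
    """Random search that keeps the setting with the best validation F1."""
    best_params, best_f1 = None, -1.0
    for params in ParameterSampler(space, n_iter=n_iter, random_state=seed):
        model = DecisionTreeClassifier(random_state=seed, **params)
        model.fit(X_tr, y_tr)
        score = f1_score(y_val, model.predict(X_val))
        if score > best_f1:
            best_params, best_f1 = params, score
    return best_params, best_f1

# Synthetic imbalanced stand-in data (about 17% positives).
X, y = make_classification(n_samples=600, n_features=10,
                           weights=[0.83, 0.17], random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

best_params, best_f1 = tune_on_validation(X_tr, y_tr, X_val, y_val)
print(best_params, round(best_f1, 4))
```

A Bayesian optimizer would replace the sampling loop with a model-guided proposal of the next candidate, while the objective (validation F1) stays the same.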

4.3. Comparison of Base Classifiers and the Proposed Model

To demonstrate the effectiveness of the proposed model’s ensemble strategy, a comparison was conducted between the proposed model and five base classifiers (DT, ET, RF, GBDT, and XGBoost) regarding binary classification performance. The results are shown in Table 4.

As shown in Table 4, the accuracy, recall, and F1 score of the proposed method all improved to a certain extent compared with those of the base classifiers. In terms of precision, the proposed method was superior to DT and GBDT but slightly inferior to ET, RF, and XGBoost.

Although the proposed method did not achieve the highest precision, it attained the best overall performance. The high recall indicates that the proposed model performed well in detecting minority-class samples and can effectively identify more actual attack instances. Most notably, the proposed model achieved the best F1 score of 98.28%, outperforming all the other models. This demonstrates that the proposed method has a clear advantage over the base classifiers in balancing precision and recall, achieving an optimal trade-off between detection performance and false positives, thereby verifying the effectiveness of the ensemble strategy. The proposed method shows significant advantages in handling imbalanced airborne network intrusion detection datasets.

4.4. Component-Wise Impact Analysis

The ISSEL method integrates KMeans++ clustering stratified sampling, five base classification models, and an F1-weighting strategy to achieve intrusion detection for imbalanced datasets. Next, the impact of each model on the final result was systematically analyzed from the aspects of contribution decomposition, synergy effect, and adaptability to application scenarios. The analysis results are shown in Table 5.

From the analysis of the results, it can be seen that the global features retained by KMeans++ enable GBDT/XGBoost to model complex boundaries more effectively, and reverse weighting compensates for the vulnerability of strong models (such as XGBoost) on out-of-distribution data.

4.5. Comparison of Sampling Methods

To demonstrate the effectiveness of the proposed sampling method, we compared it with Random Sampling Ensemble, Stratified Sampling Ensemble, SMOTE, and ADASYN. The multi-class performance comparison is presented in Table 6.

As shown in Table 6, the proposed method maintained high precision across most categories, indicating strong reliability in predicting positive cases. In particular, in categories 0, 1, 6, and 7, both the precision and recall reached exceptionally high levels, reflecting the model’s robustness in these categories. For categories 2, 8, and 10, which are typically difficult to identify due to the small number of samples, the proposed method successfully improved recall, significantly reducing the likelihood of false negatives compared with the Random and Stratified Sampling Ensembles. The F1 score, as the harmonic mean of precision and recall, provides a comprehensive measure of model performance. In key minority classes such as 2, 5, 8, and 10, the proposed method showed a significant improvement in F1 score, demonstrating an effective balance between precision and recall. Compared with the Random and Stratified Sampling Ensembles, the ISSEL not only performed better in detecting minority-class attacks but also showed superior overall performance across the majority classes. This comparison further validates the effectiveness of the proposed method in handling imbalanced datasets.

The experimental results also show that, while both SMOTE and ADASYN improved recall, ISSEL outperformed them in terms of balancing precision and recall, offering a better overall F1 score, which is crucial for detecting minority-class attacks in the context of UAV airborne network intrusion detection.

4.6. Comparison Between the Proposed Method and the Latest Methods

The proposed method was compared with eight of the latest and most relevant methods on the 1553B data bus network intrusion dataset, as shown in Table 7. The first four methods were used in the latest research on the 1553B data bus, while the latter four were used in the latest research on intrusion detection algorithms for imbalanced datasets.

It can be seen from Table 7 that the solution using the Markov chain time-interval analysis [4] achieved excellent precision but had a recall below 50% and an F1 score below 55%, indicating poor classification performance. The method based on a normal baseline histogram with message sequences [7] had a recall of only 51.32%, demonstrating a weak detection ability for the different attack types. Michael et al.’s True Skip model [23] showed an accuracy and precision above 94%, but the recall was only 75.79%, suggesting the possibility of missed detections. He’s solution [29] was relatively stable, achieving a high precision of over 99.5% for detecting anomalies, but the recall was only 70.89%, making it more likely to miss actual intrusions. A common issue with these methods is the failure to fully account for the imbalance in the dataset, leading to low recall rates.

In contrast, the latest methods for imbalanced datasets have made significant progress in improving the recall rate of 1553B intrusion detection [9,10,11,12,13], with recall rates exceeding 90%. Three methods even had higher recall rates than the proposed method. However, these methods had a lower precision than the proposed method, indicating that the increase in recall was accompanied by a decrease in precision due to the expansion of the decision boundary for the minority class. Thus, from an overall performance perspective, while these methods made breakthroughs in recall, their F1 scores remained lower than that of the proposed method. The proposed method achieved the highest F1 score, balancing precision and recall effectively. It maintained high precision while significantly improving recall, achieving optimal detection performance. This indicates that the proposed method can not only accurately identify the majority of real attacks, but also has significantly fewer missed detections, providing comprehensive protection for airborne network security.
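The trade-off described above, where widening the minority-class decision region raises recall at the cost of precision, can be illustrated with a toy example. The scores and labels below are hypothetical values chosen for illustration only; they are not drawn from the 1553B dataset or from any of the compared methods.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall of the rule `score >= threshold` for the positive (attack) class."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical attack scores from some classifier, with ground-truth labels (1 = attack)
scores = [0.95, 0.90, 0.80, 0.60, 0.55, 0.40, 0.35, 0.20, 0.10, 0.05]
labels = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

strict = precision_recall(scores, labels, threshold=0.7)  # narrow attack region
loose = precision_recall(scores, labels, threshold=0.3)   # expanded attack region
print(strict)  # (1.0, 0.6)
print(loose)   # precision drops to 5/7 while recall rises to 1.0
```

Lowering the threshold from 0.7 to 0.3 recovers the two missed attacks but also admits two false alarms, which mirrors the pattern of higher recall and lower precision discussed for the imbalance-oriented methods.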

4.7. Comparison of Latency Efficiency of the ISSEL and Lightweight Models

For resource-constrained UAVs, we consider two deployment modes on Jetson AGX Xavier: ISSEL in full ensemble mode (five models), and Michael’s and Leevy’s methods in MobileNetv3 lightweight mode [30,31].

We conducted additional experiments to measure the end-to-end latency, including sampling time, single-model inference, and ensemble overhead; the results are shown in Table 8.

The total latency of ISSEL on Jetson AGX Xavier was 59.379 ms, which, according to Table 8, is above the sub-50 ms requirement for real-time control. The two lightweight modes had total latencies of 24.405 ms (Michael [23]) and 17.182 ms (Leevy [12]). Although the lightweight modes performed better regarding latency, this study focused on the precision–speed trade-off rather than simply pursuing the lowest latency, and ISSEL provided a better accuracy–speed trade-off.
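To make the latency accounting explicit, the sketch below sums the three stages reported in Table 8 for each mode and compares the totals with a 50 ms budget, the real-time figure quoted in the paragraph above. The dictionary keys and the `BUDGET_MS` name are illustrative.

```python
# Per-stage latencies in milliseconds, copied from Table 8:
# (sampling time, single-model inference, ensemble overhead)
latency_ms = {
    "ISSEL (full ensemble)": (4.567, 34.456, 20.356),
    "Michael [23] (lightweight)": (3.732, 12.328, 8.345),
    "Leevy [12] (lightweight)": (2.432, 10.532, 4.218),
}
BUDGET_MS = 50.0  # sub-50 ms real-time control figure quoted in the text

# End-to-end latency is the sum of the three stages
totals = {name: round(sum(stages), 3) for name, stages in latency_ms.items()}

for name, total in totals.items():
    status = "within" if total < BUDGET_MS else "over"
    print(f"{name}: {total} ms ({status} the {BUDGET_MS} ms budget)")
```

The totals reproduce the last column of Table 8 (59.379 ms, 24.405 ms, and 17.182 ms).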

In summary, the proposed ISSEL method for airborne network intrusion detection has significantly improved binary classification performance when dealing with imbalanced datasets, particularly regarding recall and F1 score. It demonstrated strong potential for application in airborne network security.

5. Conclusions

To address the issue of data imbalance in airborne network intrusion detection, this paper proposed a novel method based on improved stratified sampling and ensemble learning (ISSEL). ISSEL first applies KMeans++ clustering to perform stratified sampling on normal samples, generating balanced subsets that cover global features. Multiple tree models are then trained independently on these subsets, and their predictions are integrated using an F1 score-based adaptive weighting strategy. The experimental results showed that ISSEL achieved a high accuracy of 99.42% in binary classification tasks, with a precision, recall, and F1 score of 98.94%, 97.62%, and 98.28%, respectively.
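The clustering-based sampling step summarized above can be sketched in plain Python as follows. This is a minimal illustration rather than the authors’ implementation: the per-cluster quota (proportional to cluster size) and the rule of taking the points nearest each center first are assumptions, and all function names and the synthetic data are hypothetical.

```python
import math
import random

def kmeans_pp_seeds(points, k, rng):
    """k-means++ seeding: draw each new center with probability proportional to D(x)^2."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j])) for p in points]

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd iterations started from k-means++ seeds."""
    centers = kmeans_pp_seeds(points, k, rng)
    for _ in range(iters):
        labels = assign(points, centers)
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centers, assign(points, centers)

def cluster_stratified_sample(points, k, n_samples, seed=0):
    """Return indices of majority-class points, sampled cluster by cluster.

    Assumptions (not taken from the paper): each cluster's quota is proportional
    to its size, and the points closest to the cluster center are taken first.
    """
    rng = random.Random(seed)
    centers, labels = kmeans(points, k, rng)
    picked = []
    for j in range(k):
        members = [i for i, lab in enumerate(labels) if lab == j]
        quota = round(n_samples * len(members) / len(points))
        members.sort(key=lambda i: math.dist(points[i], centers[j]))
        picked.extend(members[:quota])
    return picked

# Synthetic "normal traffic": four well-separated blobs of unequal size
data_rng = random.Random(42)
blob_centers = [(0, 0), (1000, 0), (0, 1000), (1000, 1000)]
blob_sizes = [40, 30, 20, 10]
points = [
    (cx + data_rng.uniform(-1, 1), cy + data_rng.uniform(-1, 1))
    for (cx, cy), size in zip(blob_centers, blob_sizes)
    for _ in range(size)
]
picked = cluster_stratified_sample(points, k=4, n_samples=20)
print(len(picked))
```

With k = 4, the value suggested by the silhouette scores in Table 2, every cluster contributes to the subset, so small modes of normal traffic are not dropped as they can be under purely random undersampling.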

Compared with the works of Le et al. [32], Chen et al. [33], Du et al. [34], Zoghi et al. [35], and Soni et al. [36], our method significantly enhances the representativeness of majority-class samples through KMeans++ stratified sampling while avoiding redundancy and overfitting, with a specific focus on the unique domain of UAV airborne network security. This domain-specific application, coupled with its superior performance in F1 score and recall for minority-class attack detection, makes our approach particularly advantageous for real-time, security-critical systems. Additionally, the use of F1 score weighting optimizes the overall performance of the classifiers. Furthermore, ISSEL can achieve a balance between precision and recall in multi-classification tasks, providing a more effective solution for airborne network security protection.
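The F1-based fusion step can be illustrated with a short sketch. The exact weighting formula is defined earlier in the paper and is not reproduced here, so the code shows two assumed variants: weights proportional to each model’s F1 score, and weights proportional to its reciprocal (Table 5 refers to an inverse F1 weighting strategy). The function names and the per-model probabilities are hypothetical; the F1 values come from Table 4.

```python
def f1_weights(f1_scores, inverse=False):
    """Normalize per-model F1 scores (or their reciprocals) into weights that sum to 1."""
    raw = {name: (1.0 / f1 if inverse else f1) for name, f1 in f1_scores.items()}
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}

def weighted_vote(attack_probs, weights, threshold=0.5):
    """Weighted average of per-model attack probabilities, thresholded to a 0/1 label."""
    score = sum(weights[name] * p for name, p in attack_probs.items())
    return int(score >= threshold), score

# Base-classifier F1 scores (%) from Table 4
f1_table4 = {"DT": 91.92, "ET": 96.31, "RF": 97.73, "GBDT": 89.29, "XGBoost": 97.55}

# Hypothetical per-model attack probabilities for a single bus message
attack_probs = {"DT": 0.2, "ET": 0.7, "RF": 0.8, "GBDT": 0.3, "XGBoost": 0.9}

direct = f1_weights(f1_table4)                 # assumed variant: weight grows with F1
inverse = f1_weights(f1_table4, inverse=True)  # assumed variant: weight shrinks with F1

label, score = weighted_vote(attack_probs, direct)
print(label, round(score, 3))
```

Under the direct variant the strongest model (RF) carries the largest weight, whereas under the inverse variant the weakest model (GBDT) does; because the base F1 scores in Table 4 are close to one another, both variants stay near uniform weighting.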

Author Contributions

Conceptualization, H.G.; methodology, L.L.; software, L.L.; validation, L.L. and Y.Z.; resources, Y.Z.; data curation, R.S.; writing—original draft preparation, L.L.; writing—review and editing, R.S.; supervision, H.G.; project administration, H.G.; funding acquisition, H.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Key Program of National Natural Science Foundation of China for Civil Aviation Joint Research Fund (grant number U2133203) and the National Key Program of China (grant number 2024YFC3014400).

Acknowledgments

The authors would like to thank the referees for their valuable comments and useful suggestions that helped to greatly improve the paper.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

UAV	Unmanned aerial vehicle
IDS	Intrusion detection system
ISSEL	Improved stratified sampling and ensemble learning
SMOTE	Synthetic Minority Oversampling Technique
ADASYN	Adaptive Synthetic Sampling
RUS	Random undersampling
RF	Random Forest
GBDT	Gradient Boosting Decision Tree
MITM	Man-in-the-Middle
SSE	Sum of the squared errors
MIL-STD-1553B	Military Standard 1553B
DT	Decision tree
ET	Extra trees
XGBoost	eXtreme gradient boosting
TP	True positive
FP	False positive
TN	True negative
FN	False negative
IPS	Intrusion Prevention System
SDN	Software-Defined Networking

References

1. Zhao, C. Research on Intrusion Detection of UAV Airborne CAN Bus Network Based on GAN Model. Master’s Thesis, Xidian University, Xi’an, China, 2024.
2. Jie, Y. Research on the Key Technologies of the UAV’s Airborne Network Unified Bus. Master’s Thesis, University of Electronic Science and Technology of China, Chengdu, China, 2018.
3. Ceviz, O.; Sadioglu, P.; Sen, S.; Vassilakis, V.G. A novel federated learning-based IDS for enhancing UAVs privacy and security. Internet Things 2025, 5, 101592.
4. Stan, O.; Elovici, Y.; Shabtai, A.; Shugol, G.; Tikochinski, R.; Kur, S. Protecting military avionics platforms from attacks on mil-std-1553 communication bus. arXiv 2017, arXiv:1707.05032.
5. Losier, B. Design of a Time-Based Intrusion Detection Algorithm for the Mil-Std-1553. Master’s Thesis, Royal Military College of Canada, Kingston, ON, Canada, 2019.
6. Zeng, W.; Zhang, C.; Liang, X.; Xia, J.; Lin, Y.; Lin, Y. Intrusion detection-embedded chaotic encryption via hybrid modulation for data center interconnects. Opt. Lett. 2025, 50, 4450–4453.
7. Genereux, S.J.; Lai, A.K.; Fowles, C.O.; Roberge, V.R.; Vigeant, G.P.; Paquet, J.R. Maidens: Mil-std-1553 anomaly-based intrusion detection system using time-based histogram comparison. IEEE Trans. Aerosp. Electron. Syst. 2019, 56, 276–284.
8. Onodueze, F.; Josyula, D. Anomaly Detection on MIL-STD-1553 Dataset using Machine Learning Algorithms. In Proceedings of the IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), Guangzhou, China, 29 December 2020–1 January 2021; pp. 592–598.
9. Yahalom, R.; Barishev, D.; Steren, A.; Nameri, Y.; Roytman, M.; Porgador, A.; Elovici, Y. Datasets of RT spoofing attacks on MIL-STD-1553 communication traffic. Data Brief 2019, 23, 103863.
10. Qiu, L.; Song, Y. An intrusion detection model using smote and ensemble learning. In Proceedings of the Third International Seminar on Artificial Intelligence, Networking, and Information Technology (AINIT 2022), Shanghai, China, 23–25 September 2022; SPIE: Bellingham, WA, USA, 2023; Volume 12587, pp. 127–130.
11. Li, Y.; Xu, W.; Li, W.; Li, A.; Liu, Z. Research on hybrid intrusion detection method based on the ADASYN and ID3 algorithms. Math. Biosci. Eng. 2022, 19, 2030–2042.
12. Leevy, J.L.; Hancock, J.; Khoshgoftaar, T.M.; Seliya, N. Iot reconnaissance attack classification with random undersampling and ensemble feature selection. In Proceedings of the 2021 IEEE 7th International Conference on Collaboration and Internet Computing (CIC), Atlanta, GA, USA, 13–15 December 2021; pp. 41–49.
13. Sun, Y.; Que, H.; Cai, Q.; Zhao, J.; Li, J.; Kong, Z.; Wang, S. Borderline smote algorithm and feature selection-based network anomalies detection strategy. Energies 2022, 15, 4751.
14. Liu, Y.; Zhu, L.; Ding, L.; Huang, Z.; Sui, H.; Wang, S.; Song, Y. Selective ensemble method for anomaly detection based on parallel learning. Sci. Rep. 2024, 14, 1420.
15. Shanmugam, V.; Razavi-Far, R.; Hallaji, E. Addressing Class Imbalance in Intrusion Detection: A Comprehensive Evaluation of Machine Learning Approaches. Electronics 2025, 14, 69.
16. Rai, H.M.; Yoo, J.; Agarwal, S. The Improved Network Intrusion Detection Techniques Using the Feature Engineering Approach with Boosting Classifiers. Mathematics 2024, 12, 3909.
17. Arthur, D.; Vassilvitskii, S. k-means++: The advantages of careful seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2007, New Orleans, LA, USA, 7–9 January 2007.
18. Quinlan, J.R. Induction of decision trees. Mach. Learn. 1986, 1, 81–106.
19. Geurts, P.; Ernst, D.; Wehenkel, L. Extremely randomized trees. Mach. Learn. 2006, 63, 3–42.
20. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
21. Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232.
22. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 12–17 August 2016; pp. 785–794.
23. Wrana, M.M.; Elsayed, M.; Lounis, K.; Mansour, Z.; Ding, S.; Zulkernine, M. OD1NF1ST: True Skip Intrusion Detection and Avionics Network Cyber-attack Simulation. ACM Trans. Cyber-Phys. Syst. 2022, 6, 27.
24. Lin, L.; Shangguan, R.; Ge, H.; Liu, Y.; Zhou, Y.; Zhou, Y. Mutual Identity Authentication Based on Dynamic Identity and Hybrid Encryption for UAV–GCS Communications. Drones 2025, 6, 422.
25. Rong, G.; Li, K.; Su, Y.; Tong, Z.; Liu, X.; Zhang, J.; Zhang, Y.; Li, T. Comparison of tree-structured parzen estimator optimization in three typical neural network models for landslide susceptibility assessment. Remote Sens. 2021, 13, 4694.
26. de Lima Nogueira, S.C.; Och, S.H.; Moura, L.M.; Domingues, E.; dos Santos Coelho, L.; Mariani, V.C. Prediction of the NOx and CO₂ emissions from an experimental dual fuel engine using optimized random forest combined with feature engineering. Energy 2023, 280, 128066.
27. Nerlikar, V.; Mesnil, O.; Miorelli, R.; d’Almeida, O. Damage detection with ultrasonic guided waves using machine learning and aggregated baselines. Struct. Health Monit. 2023, 23, 443–462.
28. Cui, J.; Yang, B. Review of Bayesian Optimization Methods and Applications. J. Softw. 2018, 29, 3068–3090.
29. He, D.; Liu, X.; Zheng, J.; Chan, S.; Zhu, S.; Min, W.; Guizani, N. A lightweight and intelligent intrusion detection system for integrated electronic systems. IEEE Netw. 2020, 34, 173–179.
30. Lu, Y.; Li, D.; Li, D.; Li, X.; Gao, Q.; Yu, X. A Lightweight Insulator Defect Detection Model Based on Drone Images. Drones 2024, 8, 431.
31. Zhao, Y.; Ma, Q.; Lei, G.; Wang, L.; Guo, C. Research on Lightweight Tracking of Small-Sized UAVs Based on the Improved YOLOv8N-Drone Architecture. Drones 2025, 9, 551.
32. Le, T.; Shin, Y.; Kim, M.; Kim, H. Towards unbalanced multiclass intrusion detection with hybrid sampling methods and ensemble classification. Appl. Soft Comput. 2024, 157, 111517.
33. Chen, Z.; Yu, W.; Zhou, L. ADASYN-Random Forest Based Intrusion Detection Model. arXiv 2021, arXiv:2105.04301.
34. Du, H.; Zhang, Y.; Ke, G. A selective ensemble learning algorithm for imbalanced dataset. J. Ambient Intell. Humaniz. Comput. 2021.
35. Zoghi, Z.; Serpen, G. Ensemble Classifier Design Tuned to Dataset Characteristics for Network Intrusion Detection. arXiv 2022, arXiv:2205.06177.
36. Soni, S.; Remli, M.; Daud, K.; Amien, J. Improving imbalanced class intrusion detection in IoT with ensemble learning and ADASYN-MLP approach. Indonesian J. Electr. Eng. Comput. Sci. 2024, 36, 1209–1217.

Figure 1. ISSEL model framework.

Figure 2. Clustering results for normal samples with 4 clusters.

Figure 3. SSE under different K values.

Table 1. MIL-STD-1553B attack dataset.

Label	Attack Type	Quantity	Proportion
0	Benign	33,271	83.19%
1	Random Word Generation (Bus)	4243	10.61%
2	Desynchronization	764	1.91%
3	Random Word Generation (RT)	63	0.16%
4	Data Word Corruption	251	0.63%
5	Status Word Manipulation (TR)	35	0.09%
6	TX Shutdown	845	2.11%
7	Status Word Manipulation (REC)	227	0.57%
8	Data Trashing	53	0.13%
9	Man-in-the-Middle	228	0.57%
10	Command Invalidation	15	0.04%

Table 2. Silhouette scores and number of clusters.

k	2	3	4	5	6	7	8	9	10
Silhouette Score	0.33	0.33	0.39	0.30	0.26	0.25	0.23	0.23	0.25

Table 3. Optimal values for hyperparameter search.

Classifier	Hyperparameter	Search Space	Optimal Value
DT	criterion	{gini, entropy}	gini
	max_depth	{5, 6, …, 50}	30
	min_samples_leaf	{1, 2, …, 11}	1
	min_samples_split	{2, 3, …, 11}	2
ET	criterion	{gini, entropy}	entropy
	max_depth	{5, 6, …, 50}	33
	min_samples_leaf	{1, 2, …, 11}	1
	min_samples_split	{2, 3, …, 11}	2
	n_estimators	{10, 11, …, 200}	50
RF	criterion	{gini, entropy}	gini
	max_depth	{5, 6, …, 50}	38
	min_samples_leaf	{1, 2, …, 11}	1
	min_samples_split	{2, 3, …, 11}	2
	n_estimators	{10, 11, …, 200}	194
GBDT	learning_rate	{0.01, …, 0.9}	0.24
	max_depth	{4, 5, …, 100}	24
	n_estimators	{10, 15, …, 100}	129
XGBoost	learning_rate	{0.01, …, 0.9}	1
	max_depth	{4, 5, …, 100}	2
	n_estimators	{10, 15, …, 100}	3

Table 4. Comparison of binary classification performance evaluation results between ISSEL and the base classifier.

	Accuracy (%)	Precision (%)	Recall (%)	F1 (%)
DT	97.45	98.31	86.32	91.92
ET	98.80	99.68	93.14	96.31
RF	99.25	99.31	96.21	97.73
GBDT	96.70	98.30	81.78	89.29
XGBoost	99.19	99.00	96.13	97.55
ISSEL	99.42	98.94	97.62	98.28

Table 5. Each model’s contribution to the final results.

Model		Strengths	Performance Impact
KMeans++		Ensures representation of multimodal distributions	+15% mAP
Base Classifiers	DT	Interpretable, fast inference	Baseline diversity (+5% recall)
	RF	Robustness to noise via bagging	Major variance reduction (+8% mAP)
	GBDT	Can handle nonlinear interactions	Highest accuracy gain (+12% mAP)
	XGBoost	Regularization, scalability	Prevents overfitting (+14% mAP)
	Extra Trees	Randomized splits for diversity	Improves edge-case detection (+8% mAP)
Inverse F1 Score Weighting Strategy		Suppresses the overconfidence of high F1 models and enhances the ensemble robustness	+16.2% mAP

Table 6. Comparison of performance of multi-class sampling methods.

	Random Sampling Ensemble (%)			Stratified Sampling Ensemble (%)			SMOTE (%)			ADASYN (%)			ISSEL (%)
Label	P	R	F1	P	R	F1	P	R	F1	P	R	F1	P	R	F1
0	99.8	99.7	99.8	99.6	99.6	99.6	99.7	99.4	99.6	99.6	99.6	99.5	99.6	99.7	99.7
1	89.2	97.7	93.2	90.3	98.8	94.4	90.2	98.3	92.8	88.6	97.7	92.6	90.3	98.8	94.4
2	67.2	34.1	45.2	86.3	45.1	59.2	74.4	44.1	55.2	80.0	47.1	55.0	86.4	45.8	59.8
3	62.5	62.5	62.5	100.0	69.2	81.8	96.1	62.6	65.3	70.0	70.4	66.4	100.0	61.5	76.2
4	91.7	97.1	94.3	84.6	88.0	86.3	81.7	86.0	77.6	79.8	90.4	89.2	83.6	92.0	87.6
5	66.7	50.0	57.1	75.0	42.9	54.6	76.4	30.8	53.7	78.9	41.4	53.5	100.0	42.9	60.0
6	95.2	97.1	96.1	92.5	95.3	93.9	94.1	91.7	93.3	90.6	89.8	92.9	95.3	95.3	95.3
7	95.1	100.0	97.5	100.0	100.0	100.0	93.8	99.2	96.8	97.2	95.8	99.4	97.8	100.0	98.9
8	66.7	28.6	40.0	80.0	36.4	50.0	76.7	38.6	48.6	58.4	29.6	66.7	28.6	40.0	70.6
9	97.4	90.2	93.7	97.6	87.0	92.0	96.1	92.5	95.3	96.2	89.5	99.6	97.6	87.0	92.0
10	0.0	0.0	0.0	50.0	33.3	40.0	70.5	53.3	60.4	63.3	70.6	71.1	100.0	66.7	80.0

Table 7. Performance comparison between ISSEL and the latest methods.

	Accuracy (%)	Precision (%)	Recall (%)	F1 Score (%)
Stan [4]	-	98.96	37.79	54.69
Genereux [7]	-	98.37	51.32	67.07
He [29]	-	99.58	70.89	79.25
Michael [23]	94.76	95.30	75.79	84.43
Qiu [10]	98.39	95.92	94.42	95.17
Li [11]	99.09	96.42	98.22	97.31
Leevy [12]	97.59	88.92	97.84	93.17
Sun [13]	98.81	95.03	98.07	96.52
ISSEL	99.42	98.94	97.62	98.28

Table 8. Comparison of time consumed in each mode.

	Sampling Time (ms)	Single-Model Inference (ms)	Ensemble Overhead (ms)	Time Cost (ms)
ISSEL in full ensemble mode	4.567	34.456	20.356	59.379
Michael [23] in lightweight mode	3.732	12.328	8.345	24.405
Leevy [12] in lightweight mode	2.432	10.532	4.218	17.182

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
