1. Introduction
The proliferation of Unmanned Aerial Vehicles (UAVs) across civilian, commercial, and military domains has fundamentally transformed modern airspace operations. Global UAV market projections estimate growth from $27.4 billion in 2021 to $58.4 billion by 2026 [1], spanning applications in precision agriculture, infrastructure inspection, emergency response, and defense operations [2]. However, UAV systems’ inherent characteristics—wireless communication dependencies, GPS-based navigation, distributed sensor networks, and constrained computational resources—expose them to sophisticated cyber threats that traditional security paradigms inadequately address [3].
UAV architectures rely on multiple attack-vulnerable components: GNSS receivers susceptible to spoofing and jamming [4], wireless links vulnerable to eavesdropping and MITM attacks [5], sensors prone to manipulation [6], and ground control stations exposed to DoS attacks [7]. High-profile incidents, including the 2011 RQ-170 capture through GPS spoofing [8], the 2016 commercial UAV vulnerabilities [9], and the 2018 Gatwick Airport disruption [10], demonstrate that UAV cybersecurity constitutes an immediate operational imperative with significant economic, safety, and national security implications.
Traditional cybersecurity approaches prove inadequate for UAV ecosystems due to unique constraints: (1) dynamic FANET topologies [11], (2) sub-100 ms latency requirements [4], (3) limited onboard computation (10–50 vs. 1000+ GFLOPS) [3], (4) energy constraints impacting flight duration [6], and (5) continuously evolving threat landscapes [12]. These constraints necessitate novel approaches that balance detection accuracy, computational efficiency, and real-time responsiveness.
UAV telemetry streams constitute high-dimensional time series with complex temporal dependencies spanning multiple timescales. GPS spoofing manifests as gradual coordinate drift over minutes, while DoS attacks exhibit millisecond-scale bursts [13,14]. Effective threat detection requires modeling both short-term transient anomalies and long-range behavioral patterns. LSTM networks [15] suffer from vanishing gradients beyond 500 timesteps and quadratic training complexity [16]. GRUs [16] partially alleviate computational burdens but retain limitations in capturing ultra-long dependencies. Transformers [17], while theoretically capable of arbitrary-length modeling, incur prohibitive O(n²) complexity unsuitable for resource-constrained UAV platforms.
Modern UAV systems generate heterogeneous telemetry encompassing GPS coordinates, IMU data, communication metrics, flight control parameters, and environmental sensors [18], resulting in feature spaces with dimensionality d ∈ [80, 200]. Traditional neural networks with fixed activation functions struggle to capture complex nonlinear relationships. Recent work demonstrates that learnable activation functions provide superior representational capacity [19], yet their integration into temporal models for cybersecurity remains largely unexplored.
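To illustrate the idea of per-dimension learnable activations, the sketch below implements a single KAN-style layer in PyTorch. Gaussian basis bumps stand in for B-splines here, and the class name, grid range, and sizes are illustrative assumptions rather than the implementation evaluated in this paper.

```python
import torch
import torch.nn as nn

class LearnableActivation(nn.Module):
    """Per-dimension learnable activation: each input feature gets its own
    nonlinearity expressed as a weighted sum of fixed basis bumps.
    Gaussian bumps stand in for B-splines (illustrative sketch only)."""
    def __init__(self, num_features: int, num_basis: int = 5):
        super().__init__()
        # Basis centers spread over the expected (standardized) input range
        self.register_buffer("centers", torch.linspace(-2.0, 2.0, num_basis))
        self.width = 4.0 / (num_basis - 1)
        # One coefficient vector per feature dimension
        self.coeff = nn.Parameter(torch.zeros(num_features, num_basis))
        self.skip = nn.Parameter(torch.ones(num_features))  # residual linear term

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (..., num_features)
        dist = (x.unsqueeze(-1) - self.centers) / self.width
        basis = torch.exp(-dist.pow(2))               # (..., features, num_basis)
        learned = (basis * self.coeff).sum(dim=-1)    # adaptive nonlinearity
        return self.skip * x + learned                # linear skip + learned part

# Example: 128-dimensional telemetry features, as used for the synthetic dataset
act = LearnableActivation(num_features=128)
y = act(torch.randn(32, 128))
```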
Cyber threat actors continuously evolve attack methodologies, inducing temporal distribution shifts (concept drift) in operational data [20]. Zero-day exploits exhibit behavioral patterns absent from training distributions, necessitating models with strong generalization capabilities [21]. Static models experience performance degradation as threat landscapes evolve [12]. Traditional approaches employ periodic retraining, incurring computational costs and temporal protection gaps [22]. Liquid Neural Networks [23] offer dynamic time-constant mechanisms enabling continuous adaptation without explicit retraining.
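A minimal sketch of the dynamic time-constant idea is given below. The cell structure, the bounds on τ, and the Euler discretization step are assumptions for exposition and do not reproduce the exact Liquid layer used later in the paper.

```python
import torch
import torch.nn as nn

class LiquidCell(nn.Module):
    """Minimal liquid time-constant cell: the effective time constant tau is
    modulated by the current input, so temporal smoothing adapts to the data
    without retraining (illustrative sketch only)."""
    def __init__(self, input_dim: int, hidden_dim: int = 20):
        super().__init__()
        self.inp = nn.Linear(input_dim, hidden_dim)
        self.rec = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.tau_gate = nn.Linear(input_dim + hidden_dim, hidden_dim)
        self.tau_min, self.tau_max = 0.1, 10.0  # assumed bounds on tau

    def forward(self, x_seq: torch.Tensor, dt: float = 0.1) -> torch.Tensor:
        # x_seq: (batch, time, input_dim); returns the final hidden state
        b, t, _ = x_seq.shape
        h = x_seq.new_zeros(b, self.rec.in_features)
        for step in range(t):
            x = x_seq[:, step]
            # Input-dependent time constant in [tau_min, tau_max]
            tau = self.tau_min + (self.tau_max - self.tau_min) * torch.sigmoid(
                self.tau_gate(torch.cat([x, h], dim=-1)))
            target = torch.tanh(self.inp(x) + self.rec(h))
            # Euler step of dh/dt = (-h + target) / tau
            h = h + dt * (target - h) / tau
        return h
```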
The MKL architecture addresses three orthogonal yet coupled challenges in UAV cybersecurity. First, UAV attacks exhibit multi-scale temporal dependencies spanning milliseconds (DDoS bursts) to minutes (GPS drift), requiring Mamba’s linear O(T) complexity for sequences exceeding 1000 timesteps. Second, heterogeneous sensor modalities (GPS, IMU, network traffic) demand KAN’s learnable B-spline activations for per-dimension adaptive nonlinearities. Third, continuously evolving attack methodologies necessitate Liquid Networks’ dynamic time constants for adaptation without retraining.
These challenges cannot be addressed by isolated components—temporal encoding without adaptive features fails on heterogeneous sensors (Mamba-only: 87.2%), while feature transformation without temporal context misses sequential patterns (KAN + Liquid: 85.4%). The MKL architecture achieves synergistic integration with multiplicative error reduction, enabling zero-day detection of 89.4% compared to the 73.2% baseline—a substantial 16.2-point improvement achievable only through the integrated architecture.
The novelties and contributions of the present study are summarized below:
- The first hybrid deep learning model combining Mamba temporal encoding, KAN feature representation, and Liquid adaptation for UAV cybersecurity;
- A comprehensive mathematical characterization of the hybrid model, including selective state-space dynamics, learnable activation functions, and dynamic time constants;
- Empirical validation on real-world cybersecurity datasets and UAV-specific attack scenarios, with comparative analysis against state-of-the-art methods;
- Optimization strategies and architectural design guidelines for practical application on resource-constrained UAV platforms.
The remainder of this paper is structured as follows: Section 2 reviews related work in UAV cybersecurity and temporal sequence modeling. Section 3 presents the comprehensive MKL methodology, including mathematical formulations of the Mamba encoders, KAN architectures, and Liquid layers. Section 4 describes the proposed MKL framework and its architectural components. Section 5 presents the experimental setup, datasets, and comprehensive performance evaluation results. Section 6 discusses the key findings, architectural contributions, limitations, and future research directions. Section 7 concludes the paper and outlines future work.
5. Results and Discussion
5.1. Experimental Setup
5.1.1. Datasets
The proposed MKL model was evaluated on three comprehensive datasets to ensure robust validation across diverse attack scenarios. The CIC-IDS2017 dataset [34,35], a benchmark intrusion detection corpus, contains 2,830,743 instances with 78 features covering 14 attack types including DDoS, PortScan, and Botnet attacks. The dataset was preprocessed to extract temporal sequences of length T = 100 for UAV-relevant attack patterns, with features normalized using z-score standardization computed on the training set. The CSE-CIC-IDS2018 dataset [36], an updated version incorporating modern attack vectors, comprises 16,232,943 instances including infiltration, web attacks, and brute force patterns, providing a more contemporary threat landscape for model evaluation.
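A minimal sketch of this preprocessing step is shown below; the windowing stride and helper names are assumptions, while the window length T = 100 and the train-set-only z-score statistics follow the description above.

```python
import numpy as np

def make_sequences(flows: np.ndarray, labels: np.ndarray, T: int = 100, stride: int = 50):
    """Slice a (num_records, num_features) flow table into overlapping windows
    of length T; a window is labeled malicious if any record inside it is."""
    X, y = [], []
    for start in range(0, len(flows) - T + 1, stride):
        X.append(flows[start:start + T])
        y.append(int(labels[start:start + T].max()))
    return np.stack(X), np.array(y)

def zscore_fit_transform(train: np.ndarray, *others: np.ndarray):
    """Z-score standardization with statistics computed on the training set only,
    then applied unchanged to validation/test partitions to avoid leakage."""
    mu = train.mean(axis=(0, 1), keepdims=True)
    sigma = train.std(axis=(0, 1), keepdims=True) + 1e-8
    scale = lambda a: (a - mu) / sigma
    return (scale(train), *[scale(a) for a in others])
```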
To address the scarcity of UAV-specific attack data, we generated a synthetic UAV telemetry dataset simulating 10,000 flight sequences with 128-dimensional feature vectors. Each sequence represents realistic UAV operations incorporating GPS coordinates (latitude, longitude, altitude, velocity, heading), Inertial Measurement Unit data (tri-axial accelerometer, gyroscope, magnetometer readings), communication metrics (received signal strength indicator, signal-to-noise ratio, packet loss rate, latency, jitter), flight control parameters (motor speeds, servo positions, battery voltage, current draw), and environmental sensors (barometric pressure, temperature, humidity). The synthetic dataset includes six UAV-specific attack scenarios with controlled intensity levels to systematically evaluate detection sensitivity across varying threat magnitudes.
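The following sketch illustrates how such synthetic telemetry with injected attacks can be produced. The channel assignments (e.g., GPS on the first five features), intensity values, and helper names are illustrative assumptions rather than the exact generator used to build the dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D = 100, 128          # sequence length and feature dimensionality as described above

def benign_flight(T: int, D: int) -> np.ndarray:
    """Smooth nominal telemetry: slow per-channel trends plus sensor noise."""
    t = np.linspace(0, 1, T)[:, None]
    trend = np.sin(2 * np.pi * rng.uniform(0.2, 1.0, D) * t + rng.uniform(0, 2 * np.pi, D))
    return trend + 0.05 * rng.standard_normal((T, D))

def inject_gps_spoof(x: np.ndarray, intensity: float = 0.5) -> np.ndarray:
    """Gradual coordinate drift on the GPS channels (assumed to be features 0-4)."""
    drift = intensity * np.linspace(0, 1, len(x))[:, None]
    x = x.copy(); x[:, :5] += drift
    return x

def inject_ddos(x: np.ndarray, intensity: float = 3.0) -> np.ndarray:
    """Short burst on communication-metric channels (assumed to be features 5-9)."""
    x = x.copy()
    burst = rng.integers(10, len(x) - 10)
    x[burst:burst + 5, 5:10] += intensity
    return x

# Toy labeled set: label 0 = benign, 1 = GPS spoofing, 2 = DDoS
X, y = [], []
for _ in range(300):
    base = benign_flight(T, D)
    k = rng.integers(0, 3)
    X.append([base, inject_gps_spoof(base), inject_ddos(base)][k]); y.append(int(k))
X, y = np.stack(X), np.array(y)
```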
5.1.2. Baseline Methods
The MKL model was compared against six state-of-the-art approaches representing diverse methodological paradigms. The LSTM-Attention baseline implements a bidirectional LSTM network with multihead attention mechanisms, achieving strong temporal modeling capabilities at the cost of increased computational complexity. The Transformer baseline employs a standard transformer encoder architecture for sequence classification, representing the current state-of-the-art in sequence modeling but requiring substantial computational resources. The CNN-LSTM hybrid combines convolutional layers for spatial feature extraction with LSTM layers for temporal modeling, balancing expressiveness with computational efficiency. Traditional machine learning baselines include Random Forest, an ensemble method using handcrafted features, and Isolation Forest for unsupervised anomaly detection. The VAE-GAN baseline combines variational autoencoders with adversarial training for generative anomaly detection, representing recent advances in unsupervised deep learning approaches.
5.1.3. Evaluation Metrics
Performance assessment employed a comprehensive suite of metrics capturing different aspects of model behavior. Standard classification metrics include Precision (P = TP/(TP + FP)), Recall (R = TP/(TP + FN)), F1-Score (2PR/(P + R)), and overall Accuracy ((TP + TN)/(TP + TN + FP + FN)). The Area Under the Receiver Operating Characteristic curve (AUROC) quantifies discrimination capability across all possible threshold settings, while the Area Under the Precision-Recall Curve (AUPRC) provides a more informative metric for imbalanced datasets common in cybersecurity applications. Operational metrics include inference time measured as average processing time per sample batch, memory usage tracking peak RAM consumption during inference, and false positive rate (FPR) calculated as FP/(FP + TN) to assess operational disruption from false alarms. Energy consumption was measured in millijoules per sample to evaluate deployment feasibility on battery-powered UAV platforms.
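For concreteness, the sketch below computes these metrics from binary labels and predicted attack probabilities using scikit-learn; the helper name and default threshold are assumptions.

```python
import numpy as np
from sklearn.metrics import (precision_score, recall_score, f1_score,
                             accuracy_score, roc_auc_score,
                             average_precision_score, confusion_matrix)

def evaluate(y_true: np.ndarray, y_score: np.ndarray, threshold: float = 0.5) -> dict:
    """Detection metrics used in the evaluation, computed from binary labels
    and predicted attack probabilities."""
    y_pred = (y_score >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall":    recall_score(y_true, y_pred, zero_division=0),
        "f1":        f1_score(y_true, y_pred, zero_division=0),
        "accuracy":  accuracy_score(y_true, y_pred),
        "auroc":     roc_auc_score(y_true, y_score),
        "auprc":     average_precision_score(y_true, y_score),
        "fpr":       fp / (fp + tn),   # false alarms among benign samples
    }
```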
5.1.4. Implementation Details
All models were implemented in PyTorch 2.0 and trained on NVIDIA RTX 3090 GPUs with 24 GB memory. Training employed a batch size of 32 samples, selected to balance gradient estimation quality with memory constraints. The learning rate was initialized at 1 × 10−3 and followed a cosine annealing schedule over 10 training epochs, with early stopping applied when validation loss failed to improve for 3 consecutive epochs. The train/validation/test split followed a 60/20/20 ratio, with stratified sampling ensuring balanced representation of attack types across partitions. Data augmentation strategies included temporal jittering with random time shifts within ±5% of sequence length, amplitude scaling with random multiplicative factors in range [0.9, 1.1], and Gaussian noise injection with σ = 0.01 to simulate realistic sensor noise and improve model robustness.
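A minimal sketch of the augmentation and optimization setup described above is shown below; the placeholder module and the exact tensor manipulations are illustrative assumptions.

```python
import torch

def augment(batch: torch.Tensor) -> torch.Tensor:
    """Training-time augmentation matching the setup above: temporal jitter
    within +/-5% of sequence length, amplitude scaling in [0.9, 1.1],
    and Gaussian noise with sigma = 0.01."""
    B, T, D = batch.shape
    max_shift = max(1, int(0.05 * T))
    shifts = torch.randint(-max_shift, max_shift + 1, (B,))
    out = torch.stack([torch.roll(x, int(s), dims=0) for x, s in zip(batch, shifts)])
    scale = torch.empty(B, 1, 1, device=batch.device).uniform_(0.9, 1.1)
    return out * scale + 0.01 * torch.randn_like(out)

# Optimizer and schedule as described: lr = 1e-3 with cosine annealing over 10 epochs
model = torch.nn.Linear(128, 2)   # placeholder for any trainable module
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10)
```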
5.2. Performance Results
5.2.1. Overall Performance Comparison
The comprehensive performance evaluation on the CIC-IDS2017 dataset reveals significant advantages of the MKL architecture across multiple dimensions. Table 2 presents comparative results showing that the proposed model achieves 95.3 ± 0.3% precision, 93.7 ± 0.4% recall, and 94.5 ± 0.3% F1-score, representing substantial improvements over baseline methods. The Transformer baseline, despite its theoretical expressiveness, achieves 92.7 ± 0.5% precision and 91.5 ± 0.4% F1-score while requiring 234.6 ms average inference time. The LSTM-Attention model demonstrates 91.4 ± 0.6% precision with 187.4 ms latency, indicating the computational burden of attention mechanisms. Traditional machine learning approaches show inferior performance, with Random Forest achieving 80.9 ± 0.9% F1-score and Isolation Forest reaching only 76.8 ± 1.4%, confirming the necessity of deep learning for complex temporal pattern recognition in UAV cybersecurity. Statistical significance testing using paired t-tests confirms that MKL’s performance improvements over the best baseline (Transformer) are statistically significant (p < 0.001) across all metrics, validating that observed gains represent genuine architectural advantages rather than random variation.
The most striking aspect of the MKL performance is the simultaneous achievement of superior accuracy and dramatically reduced inference latency. The 47.3 ms average inference time represents a 5-fold speedup compared to the Transformer baseline and nearly 4-fold improvement over LSTM-Attention, while maintaining 3.0 percentage points higher F1-score than the best competing method. This efficiency gain stems from the Mamba encoder’s linear-time complexity O(T·L·d²) compared to the Transformer’s quadratic O(T²·d) scaling, enabling practical deployment on resource-constrained UAV platforms without sacrificing detection accuracy. The AUROC metric of 0.981 for the MKL model indicates exceptional discrimination capability across all possible threshold settings, substantially outperforming the Transformer’s 0.956 and LSTM-Attention’s 0.948. This superior area under curve demonstrates that the model maintains high true positive rates while keeping false positive rates minimal across the entire operating spectrum.
The relatively small standard deviations (±0.3–0.4%) across five-fold cross-validation runs indicate stable and reproducible performance, crucial for operational deployment where consistent behavior is essential for mission-critical security decisions. The consistent superiority across five independent folds with non-overlapping 95% confidence intervals further validates the robustness of the proposed architecture. The VAE-GAN baseline, despite its sophisticated generative modeling approach, achieves only 87.4 ± 0.8% F1-score, suggesting that purely unsupervised methods struggle to capture the nuanced temporal patterns characteristic of sophisticated cyberattacks. The CNN-LSTM hybrid demonstrates respectable 88.3 ± 0.7% F1-score with 142.3 ms latency, positioning it as a reasonable compromise between traditional machine learning and state-of-the-art deep learning approaches, though still substantially inferior to the proposed MKL architecture in both accuracy and efficiency dimensions.
As shown in Figure 1, the cross-dataset precision-recall analysis demonstrates that MKL maintains robust performance across diverse data sources. The model achieves precision above 93% at maximum recall for all three evaluation scenarios, with minimal variance of only 0.8 percentage points between CIC-IDS2017, CSE-CIC-IDS2018, and Synthetic-UAV datasets. This consistency validates that learned representations capture fundamental attack characteristics rather than dataset-specific artifacts, enabling confident deployment across operational environments without requiring retraining. The tight clustering of curves indicates that the model’s discrimination capability generalizes effectively from general cybersecurity datasets to UAV-specific scenarios, addressing a critical concern for practical deployment where authentic attack data may be scarce or unavailable due to security constraints. The synthetic-to-real transfer capability is particularly noteworthy, demonstrating that the model can be effectively pre-trained on simulated UAV telemetry and deployed with minimal performance degradation on actual flight data.
The precision-recall comparison presented in Figure 2 reveals that MKL maintains precision above 92% even at maximum recall, substantially outperforming baseline methods across the entire operating range. The area under the precision-recall curve (AUPRC) of 0.962 for MKL versus 0.891 for Transformer represents an 8% improvement in practical operating scenarios where class imbalance is pronounced. This characteristic is particularly valuable in cybersecurity applications where attack instances represent a small fraction of total observations, and maintaining high precision at elevated recall levels directly translates to reduced false alarm rates in operational deployment. Random Forest (purple curve) exhibits the steepest degradation, dropping below 60% precision at high recall levels, confirming the limitations of traditional machine learning for complex temporal pattern recognition. Deep learning methods (LSTM in green, CNN-LSTM in orange) maintain intermediate performance, achieving 85–88% precision at maximum recall. The Transformer baseline (blue curve) demonstrates strong performance but remains consistently below MKL throughout the operating range. The consistent elevation of the MKL curve above all baseline methods throughout the recall spectrum demonstrates that the architectural innovations—Mamba’s selective state spaces, KAN’s learnable activations, and Liquid’s adaptive dynamics—provide genuine improvements rather than merely shifting the precision-recall tradeoff.
The false positive rate analysis illustrated in Figure 3 provides crucial insights into operational viability across varying sensitivity settings. At the default threshold of 0.5, the MKL model achieves an FPR of 3.5% (brown area) compared to 8.2% for the Transformer baseline (light green area), representing a 57% reduction in false alarms. This translates to 4.7 fewer false alarms per 100 flights, significantly reducing operational disruption and alert fatigue among security personnel. The stacked area visualization demonstrates that MKL maintains consistently lower FPR across the entire threshold range from 0.1 to 0.9, indicating robust discrimination capability independent of operating point selection. At aggressive thresholds (0.3), MKL achieves 6.5% FPR while Transformer reaches 13%, maintaining the relative performance advantage. At conservative thresholds (0.7), MKL reduces FPR to 1.8% compared to Transformer’s 4.2%, providing operators with flexibility to adjust sensitivity based on mission-specific risk tolerance. This consistent low FPR across different operating thresholds demonstrates that the model’s superior discrimination capability extends beyond default settings, enabling adaptive threshold selection without incurring prohibitive false alarm penalties. In practical UAV fleet operations where hundreds of flights occur daily, this reduction in false positives directly impacts operational efficiency by minimizing unnecessary mission aborts, pilot distractions, and investigation overhead that would otherwise consume valuable resources and degrade mission effectiveness. The narrow brown band (MKL) compared to wider colored bands above it visually confirms the model’s superior precision in distinguishing genuine cyber threats from benign telemetry anomalies caused by environmental factors, sensor noise, or legitimate operational variations.
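The threshold sweep underlying this analysis can be reproduced in a few lines. The sketch below computes FPR = FP/(FP + TN) across thresholds from 0.1 to 0.9; the example data are synthetic and illustrative only.

```python
import numpy as np

def fpr_curve(y_true: np.ndarray, y_score: np.ndarray,
              thresholds=np.arange(0.1, 1.0, 0.1)) -> dict:
    """False positive rate FP/(FP+TN) at each decision threshold, mirroring
    the 0.1-0.9 threshold sweep discussed above."""
    benign = (y_true == 0)
    return {round(float(t), 1): float(((y_score >= t) & benign).sum() / benign.sum())
            for t in thresholds}

# Example with synthetic scores (illustrative only)
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 1000)
y_score = np.clip(y_true * 0.6 + rng.normal(0.2, 0.2, 1000), 0, 1)
print(fpr_curve(y_true, y_score))
```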
5.2.2. Attack-Specific Detection Performance
Beyond aggregate performance metrics, examining detection capabilities for individual attack types reveals important patterns in model behavior and identifies areas requiring specialized attention. Table 3 presents detection rates broken down by attack category, demonstrating that the MKL model achieves consistently high performance across diverse threat vectors while maintaining real-time processing constraints.
The attack-specific analysis reveals several important insights into model capabilities and limitations. DDoS attacks achieve the highest detection rate at 98.1 ± 0.3%, likely attributable to their distinctive traffic patterns characterized by sudden volume spikes and repetitive packet structures that Mamba’s temporal encoding effectively captures. The 35.6 ms average detection time for DDoS represents the fastest processing among all attack types, suggesting that the model quickly recognizes the clear temporal signatures associated with distributed denial-of-service patterns. This rapid detection is particularly valuable for DDoS scenarios where timely response is critical to prevent service degradation, as even brief delays in threat identification can allow attackers to overwhelm communication channels and compromise mission-critical UAV control links.
GPS spoofing detection reaches 97.3 ± 0.4% accuracy with 38.2 ms latency, validating the model’s capability to identify both coordinate-based anomalies through the KAN feature transformation and temporal drift patterns through Mamba’s selective state space mechanism. The slightly longer detection time compared to DDoS reflects the more nuanced nature of GPS spoofing, which often manifests as gradual coordinate drift rather than abrupt changes, requiring the model to analyze longer temporal contexts to distinguish malicious manipulation from legitimate navigation adjustments. Man-in-the-middle attacks achieve 96.2 ± 0.5% detection with 39.8 ms latency, demonstrating strong performance on this critical threat category that targets the communication channel between UAVs and ground control stations.
The most remarkable achievement is the 89.4 ± 1.2% detection rate for zero-day attacks, representing a 16.2 percentage point improvement over the best baseline method. This substantial gain demonstrates the value of Liquid Neural Networks’ adaptive capabilities for handling novel threats absent from training distributions. The dynamic time constants enable the model to adjust its temporal processing characteristics based on input features, providing resilience against attack variants that deviate from known patterns. However, the higher standard deviation (±1.2%) and elevated detection time (52.4 ms) for zero-day attacks indicate increased uncertainty and computational requirements when confronting unfamiliar threats, suggesting that additional processing time is allocated for careful evaluation of ambiguous patterns. Despite these characteristics, the 52.4 ms latency remains well within the 100 ms real-time constraint, ensuring that even novel attack detection maintains operational responsiveness.
Sensor manipulation attacks, while achieving respectable 94.7 ± 0.7% detection accuracy, represent the most challenging known attack category. The relatively longer 43.2 ms detection time suggests that distinguishing subtle, gradual sensor bias injection from natural measurement drift requires more extensive temporal analysis. This finding indicates a potential area for future enhancement, possibly through specialized attention mechanisms focused on slow-varying trends or integration of physics-based models that encode expected sensor behavior under normal flight conditions. Network jamming detection at 95.8 ± 0.6% with 41.5 ms latency demonstrates robust performance on this critical denial-of-service variant that targets wireless communication channels, validating the model’s ability to detect both volumetric attacks (DDoS) and signal-level interference (jamming) through complementary feature representations.
The cross-dataset performance analysis presented in Table 4 demonstrates that MKL maintains consistent detection capabilities across diverse data sources. The model achieves 93.9–95.1% F1-score across all three datasets, with minimal performance variance of only 1.2 percentage points. DDoS detection leads across all datasets at 97.8–98.3%, benefiting from distinctive volumetric signatures that remain stable across different data collection environments. GPS spoofing follows at 96.8–97.6%, demonstrating robust coordinate anomaly detection with only 0.8 percentage points variance across datasets. Network jamming (95.2–96.1%) and Man-in-the-Middle (95.8–96.4%) indicate strong performance on communication-targeting attacks, with consistent accuracy improvements observed on the more recent CSE-CIC-IDS2018 and UAV-specific synthetic datasets. Sensor manipulation (94.2–95.0%) reflects the challenge of distinguishing subtle bias injection from measurement noise, though performance remains stable across evaluation scenarios. Zero-day detection (88.7–90.0%) substantially outperforms baseline methods while showing gradual improvement from general cybersecurity datasets to UAV-specific scenarios, validating the Liquid layer’s adaptive mechanisms. The minimal variance across datasets confirms that learned representations capture fundamental attack characteristics rather than dataset-specific artifacts, a critical requirement for operational deployment where the model must handle diverse threats from various sources without retraining.
The transfer learning analysis illustrated in Table 5 reveals strong generalization capabilities across dataset boundaries. When training on CIC-IDS2017 and testing on CSE-CIC-IDS2018, the model achieves 82.4% zero-shot accuracy (87% transfer efficiency), demonstrating effective knowledge transfer between general cybersecurity datasets. Training on CSE-CIC-IDS2018 and testing on CIC-IDS2017 yields 84.1% zero-shot accuracy (88% transfer efficiency), indicating bidirectional compatibility. The synthetic-to-real transfer capability is particularly noteworthy, with models trained on Synthetic UAV data achieving 71.3% zero-shot accuracy on CIC-IDS2017 (76% transfer efficiency) and 73.8% on CSE-CIC-IDS2018 (79% transfer efficiency). After minimal fine-tuning (2–4 epochs), transfer accuracy improves to 90–93%, enabling rapid deployment in scenarios where authentic attack data is scarce or unavailable due to security constraints. The asymmetric transfer efficiency—real-to-synthetic (83–84%) versus synthetic-to-real (76–79%)—suggests that while simulated data provides valuable training signal, incorporating real-world operational characteristics remains beneficial for optimal performance.
The latency analysis presented in Figure 4 demonstrates remarkable consistency in detection speed across both attack types and datasets. DDoS exhibits the fastest detection at 35–36 ms across all datasets, reflecting quick recognition of volumetric patterns characterized by sudden traffic spikes. GPS spoofing (38–39 ms), Man-in-the-Middle (39–40 ms), and Network Jamming (41–42 ms) cluster in the mid-range, requiring moderate temporal analysis to identify coordinate drift, communication interception, and signal interference patterns, respectively. Sensor Manipulation (43–44 ms) demands slightly longer processing to distinguish gradual anomalies from natural measurement noise through extended temporal context analysis. Zero-Day detection requires maximum processing time (51–53 ms) but remains well within the 100 ms real-time service level agreement, with the additional latency allocated for careful evaluation of ambiguous patterns that deviate from known attack signatures. The minimal latency variance across datasets (typically 1–3 ms) confirms that computational characteristics remain stable regardless of data source, enabling predictable performance planning for operational deployments. All attack types maintain sub-55 ms detection latency across three datasets, substantially below the 100 ms threshold required for real-time UAV cybersecurity applications, validating the architectural efficiency of the MKL model for time-critical threat detection scenarios.
5.3. Computational Efficiency Analysis
The practical deployment of deep learning models on resource-constrained UAV platforms necessitates careful analysis of computational requirements beyond detection accuracy. Table 6 presents comprehensive resource utilization metrics across competing architectures, revealing that the MKL model achieves superior efficiency across multiple dimensions while maintaining the highest detection performance.
The MKL architecture demonstrates exceptional parameter efficiency with only 2.5 million parameters, representing a 7.6-fold reduction compared to the Transformer baseline’s 18.9 million parameters. This dramatic reduction in model size enables deployment on embedded systems with limited storage capacity, a critical requirement for UAV platforms where every gram of payload capacity impacts flight duration and mission capability. The compact architecture stems from three complementary design choices: the Mamba encoder’s shared state transition matrices that avoid the parameter explosion of multihead attention mechanisms, the KAN layer’s B-spline basis functions that replace dense weight matrices with compact lookup tables, and the Liquid layer’s lightweight adaptation mechanism that adds only 20 hidden units with dynamic time constants. The 96 MB memory footprint during inference, including activation tensors and temporary buffers, remains well below the typical 2 GB RAM available on modern embedded processors like the ARM Cortex-A72, leaving substantial headroom for other flight-critical software components including navigation, control systems, and communication protocols.
The throughput analysis reveals that the MKL model processes 676 samples per second on the evaluation hardware, representing a 5-fold improvement over the Transformer’s 136 samples per second and 4-fold gain over LSTM-Attention’s 168 samples per second. This throughput advantage directly translates to operational capabilities in UAV swarm scenarios. For a typical deployment monitoring 20 UAVs generating telemetry at 10 Hz, the system must process 200 samples per second. The MKL model handles this workload with 70% computational headroom (676 vs. 200 required), providing resilience against processing spikes during concurrent attack scenarios or network congestion events. In contrast, the Transformer baseline can sustain only 68% of the required workload (136 vs. 200 samples per second), risking queue buildup and latency accumulation during peak loads. The CNN-LSTM hybrid achieves 224 samples per second, providing modest headroom but still requiring 89% utilization under normal conditions. This throughput efficiency enables centralized monitoring architectures where a single ground station processor handles entire swarm operations, eliminating the need for distributed processing infrastructure that would introduce communication overhead and synchronization complexity.
The energy efficiency analysis reveals critical implications for battery-powered UAV operations and ground station deployments. The MKL model consumes 12.4 millijoules per sample, representing a 3.4-fold improvement over the Transformer’s 42.5 mJ/sample and 2.5-fold gain over LSTM-Attention’s 31.2 mJ/sample. For continuous monitoring operations processing telemetry at 10 Hz, the MKL model consumes approximately 124 milliwatts, negligible compared to typical ground station power budgets of 50–200 watts. However, in scenarios where threat detection must occur onboard the UAV itself—such as autonomous operations beyond communication range—energy efficiency becomes paramount. A UAV processing its own telemetry at 10 Hz would consume 124 mW with MKL versus 425 mW with Transformer, representing 301 mW savings. Over a typical 30 min mission, this translates to 541 joules of energy savings, equivalent to approximately 41 mAh from a standard 3.7 V LiPo battery. For reference UAV platforms carrying 5000 mAh batteries, this represents a 0.8% capacity saving, potentially extending flight time by approximately 14 s or equivalently increasing operational range by 70 m at 5 m/s cruise speed.
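The following back-of-envelope calculation reproduces the energy figures quoted above from the per-sample measurements; the battery parameters are the stated reference assumptions.

```python
# Sanity check of the energy arithmetic above (per-sample energies from the paper,
# battery parameters are the stated reference assumptions).
E_MKL, E_TRF = 12.4e-3, 42.5e-3      # joules per sample (12.4 / 42.5 mJ)
rate_hz = 10                          # telemetry processing rate
mission_s = 30 * 60                   # 30-minute mission
batt_v, batt_mah = 3.7, 5000          # reference LiPo battery

p_mkl_w = E_MKL * rate_hz             # 0.124 W
p_trf_w = E_TRF * rate_hz             # 0.425 W
saved_j = (p_trf_w - p_mkl_w) * mission_s   # ~541.8 J saved over the mission
saved_mah = saved_j / batt_v / 3.6          # J -> C (divide by V), C -> mAh (divide by 3.6)
print(f"{p_mkl_w*1e3:.0f} mW vs {p_trf_w*1e3:.0f} mW, "
      f"saving {saved_j:.0f} J ≈ {saved_mah:.0f} mAh "
      f"({saved_mah / batt_mah:.1%} of capacity)")
```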
The computational cost analysis, measured in floating-point operations per inference, shows that MKL requires 0.78 GFLOPs compared to Transformer’s 4.12 GFLOPs, representing an 81% reduction in computational work per sample. This efficiency stems from the Mamba encoder’s linear-time selective state space mechanism, which processes temporal sequences through matrix-vector multiplications with complexity O(T·d²) rather than the Transformer’s attention mechanism with O(T²·d) complexity. For the typical sequence length T = 100 used in this study, this represents approximately a 12-fold reduction in attention-related operations. The KAN layers, despite their learnable activation functions based on B-spline basis functions, add minimal overhead due to efficient lookup table implementations that amortize basis function evaluations across batch dimensions. Each KAN layer evaluates 5 B-spline basis functions per input, requiring only 5 polynomial evaluations compared to the hundreds of ReLU activations in equivalent MLP layers. The Liquid layer’s dynamic time constant computation introduces negligible overhead, as the adaptive mechanisms operate on compact 20-dimensional hidden states rather than high-dimensional activation tensors.
The training efficiency analysis demonstrates practical advantages for model development and deployment cycles. The MKL architecture converges within 10 epochs on the CIC-IDS2017 dataset, requiring approximately 4.2 h of training time on a single NVIDIA RTX 3090 GPU. In contrast, the Transformer baseline requires 15 epochs and 12.8 h to achieve comparable validation performance, representing a 3-fold increase in training time. The LSTM-Attention model demands 18 epochs and 7.3 h, while the CNN-LSTM hybrid converges in 14 epochs requiring 5.8 h. This training efficiency reduces development iteration cycles during model refinement and hyperparameter tuning, enabling rapid adaptation when new attack patterns emerge. The faster convergence stems from the architectural inductive biases: the Mamba encoder’s state space formulation provides strong temporal priors that accelerate learning of sequential dependencies, the KAN layers’ learnable activations reduce the need for extensive hyperparameter search compared to fixed activation functions, and the Liquid layer’s adaptive dynamics naturally handle distribution shifts without requiring explicit regularization strategies.
The energy efficiency analysis presented in Figure 5 demonstrates that MKL achieves substantial improvements across all baseline methods. The 3.4-fold energy advantage over Transformer (12.4 vs. 42.5 mJ/sample) enables extended mission operations without additional power infrastructure. For continuous monitoring at 10 Hz, MKL consumes only 124 mW compared to Transformer’s 425 mW, representing 301 mW savings that translate to approximately 41 mAh battery capacity over a 30 min mission. The memory access pattern analysis reveals additional efficiency characteristics relevant for embedded deployment. The MKL model exhibits high cache locality, accessing 96 MB of working memory in predictable sequential patterns that align with CPU cache line sizes. The Mamba encoder’s recurrent state updates naturally maintain temporal locality, while the KAN layer’s B-spline lookup tables fit within L2 cache boundaries on modern processors. In contrast, the Transformer’s attention mechanism generates scattered memory access patterns across the full sequence length, resulting in cache misses that degrade real-world performance beyond theoretical FLOP counts. On ARM Cortex-A72 processors typical of embedded UAV computers, preliminary profiling indicates that MKL achieves 87% of theoretical peak throughput, while Transformer implementations reach only 52% due to memory bandwidth limitations.
5.4. Ablation Studies
Understanding the individual contributions of each architectural component is essential for validating design choices and identifying critical elements for model performance. Table 7 presents systematic ablation experiments where components are progressively removed to isolate their specific contributions to overall detection capability.
Analysis of component contributions reveals synergistic effects beyond simple summation for attack-specific metrics. Zero-day detection achieves 89.4% (Table 3), surpassing the additive prediction of 88.5%. Severe noise robustness (Table 8, σ = 0.20) reaches 76.5% versus 74.7% predicted. The 2.5 M parameter count (Table 7) represents a 34% reduction versus naive summation (3.8 M) through architectural weight sharing, confirming that the integrated architecture enables emergent capabilities beyond isolated components.
The ablation analysis reveals that each component provides substantial and complementary contributions to overall performance. The Mamba-only configuration achieves 87.2 ± 0.8% F1-score with rapid 28.4 ms inference time, demonstrating that temporal encoding alone captures significant attack patterns. However, the 7.3 percentage point deficit compared to the full model indicates that sophisticated feature representation and adaptive mechanisms are essential for state-of-the-art detection rates.
Adding KAN layers to the Mamba encoder improves F1-score to 91.6 ± 0.5%, representing a 4.4 percentage point gain. The Mamba + Liquid configuration achieves 90.3 ± 0.6% F1-score, demonstrating that dynamic time constant adaptation enhances detection of attacks with temporal distribution shifts, particularly zero-day exploits. The KAN + Liquid configuration achieves only 85.4 ± 0.9% F1-score, with the largest ΔAUROC degradation of −0.078, demonstrating that effective temporal sequence encoding forms the foundation upon which other components build their capabilities.
The synergistic effects between components become evident when examining component additivity. The sum of individual improvements over Mamba-only (4.4% + 3.1% = 7.5%) closely matches the full model’s 7.3 percentage point improvement, indicating 97% efficiency with minimal redundancy and strong complementarity between architectural elements.
Component-level analysis reveals distinct contributions: Mamba provides the temporal encoding foundation (87.2% standalone), KAN adds adaptive feature transformation (+4.4 percentage points to 91.6%), and Liquid enables distribution shift adaptation (final 94.5%, +3.1 percentage points beyond Mamba + KAN). Inference latency breakdown demonstrates efficient computational distribution: Mamba encoding constitutes 60% (28.4 ms), KAN transformation 26% (12.5 ms), and Liquid adaptation 14% (6.4 ms), explaining the model’s real-time processing capability while maintaining architectural complexity for robust threat detection.
The component analysis illustrated in Figure 6 demonstrates clear accuracy-efficiency tradeoffs across ablation configurations. Mamba-only achieves the fastest inference (28.4 ms) but lowest accuracy (87.2%), while the full MKL model accepts a modest latency increase (47.3 ms) for substantial accuracy gains (94.5%). The Mamba + KAN configuration occupies an intermediate point, achieving 91.6% accuracy with 38.7 ms latency, representing a viable deployment option for scenarios prioritizing speed over maximum accuracy. However, the full model’s 47.3 ms latency remains well below the 100 ms real-time threshold, indicating that the accuracy benefits justify the computational overhead for mission-critical UAV cybersecurity applications. The parameter efficiency across configurations is noteworthy, with the full model requiring only 2.5 M parameters—substantially fewer than single-component configurations would suggest through naive summation, indicating effective parameter sharing across architectural layers.
5.5. Robustness Evaluation
Real-world deployment scenarios introduce various perturbations and adversarial conditions that can degrade model performance. Environmental factors such as electromagnetic interference, sensor degradation, and atmospheric conditions introduce measurement noise into UAV telemetry streams. Communication channel impairments, including packet loss, temporal jitter, and selective jamming, corrupt transmitted data. Sophisticated adversaries may craft adversarial examples designed to evade detection through carefully constructed perturbations that exploit model vulnerabilities. This section evaluates MKL’s resilience against multiple challenge categories including Gaussian noise, adversarial perturbations, feature corruption, and temporal distortions to validate operational reliability under diverse threat conditions. Table 8 presents comprehensive performance analysis under adversarial noise conditions across multiple intensity levels.
The Gaussian noise robustness analysis demonstrates MKL’s superior resilience to measurement uncertainty across multiple intensity levels. At low noise (σ = 0.01), representing typical sensor precision limits in GPS receivers (±3 m) and IMU accelerometers (±0.01 m/s²), MKL maintains 93.4 ± 0.4% accuracy with only 1.5% degradation from clean conditions. This minimal performance loss substantially outperforms Transformer (88.2%, −4.0% degradation) and represents a 5.2 percentage point absolute advantage at this operationally relevant noise level. The resilience stems from the KAN layer’s learnable activation functions, which automatically adapt their response characteristics to accommodate noisy inputs through B-spline coefficient adjustment during training, effectively implementing adaptive denoising without requiring explicit preprocessing steps.
As noise intensity increases to moderate levels (σ = 0.05), representing environmental interference from nearby radio transmitters or calibration drift over extended mission durations, MKL experiences 5.4% degradation (89.7%) while Transformer suffers 13.5% loss (79.4%). The 10.3 percentage point advantage at this noise level validates the architectural robustness of combining selective state space models with adaptive feature transformations. The KAN layers’ learned nonlinearities provide superior noise tolerance compared to fixed ReLU activations in traditional architectures, while the Liquid layer’s dynamic time constants automatically extend temporal integration windows when detecting degraded signal quality, effectively averaging out transient noise spikes through adaptive smoothing.
The performance gap widens dramatically at high noise levels (σ = 0.10), where MKL retains 84.2 ± 1.1% accuracy despite 11.2% degradation, while Transformer collapses to 68.3 ± 1.8% (−25.6%). At this intensity, noise standard deviation equals 10% of typical feature ranges, representing severe but occasionally realistic conditions during equipment malfunction or intense electronic warfare scenarios. The 15.9 percentage point advantage demonstrates that MKL’s architectural design provides fundamental robustness rather than marginal improvements, with the Mamba encoder’s selective gating mechanism effectively filtering corrupted features while preserving clean signal components through input-dependent state transitions.
At severe noise conditions (σ = 0.20), simulating catastrophic sensor failures or extreme jamming attacks, MKL maintains 76.5 ± 1.5% accuracy with 19.3% degradation, achieving a 21.8 percentage point advantage over Transformer’s 54.7% accuracy (−40.4% degradation). While absolute performance degrades substantially under such extreme conditions, MKL’s ability to maintain better-than-random detection (76.5% vs. 50% baseline) provides valuable degraded-mode operation capability. The Liquid layer’s adaptive mechanisms prove particularly valuable here, as the model automatically increases reliance on temporal patterns and reduces sensitivity to instantaneous measurements when noise levels spike, effectively implementing adaptive sensor fusion that reweights information sources based on estimated reliability.
These noise robustness characteristics reveal near-linear degradation for MKL with increasing noise intensity (R2 = 0.987), indicating predictable and graceful performance decline that enables reliable operation planning under known noise conditions. System operators can establish noise-dependent detection thresholds that maintain desired false positive rates across varying environmental conditions, a critical capability for autonomous UAV operations where human oversight may be limited. In contrast, Transformer exhibits nonlinear collapse beyond σ = 0.10, with degradation rate accelerating from 13.5% at σ = 0.05 to 40.4% at σ = 0.20, suggesting brittle decision boundaries that catastrophically fail under moderate perturbation rather than degrading gradually.
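The sketch below shows how such a degradation fit and coefficient of determination are obtained; the accuracy values are those quoted above, and the exact R² depends on the full underlying measurement set rather than these rounded figures.

```python
import numpy as np

# Accuracy under increasing noise, using the values reported above
sigma = np.array([0.00, 0.01, 0.05, 0.10, 0.20])
acc_mkl = np.array([94.8, 93.4, 89.7, 84.2, 76.5])

# Least-squares linear fit and coefficient of determination
slope, intercept = np.polyfit(sigma, acc_mkl, 1)
pred = slope * sigma + intercept
r2 = 1 - np.sum((acc_mkl - pred) ** 2) / np.sum((acc_mkl - acc_mkl.mean()) ** 2)
print(f"degradation ≈ {slope:.1f} points per unit sigma, R² = {r2:.3f}")
```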
The noise robustness visualization in Figure 7 provides comprehensive perspective on performance degradation across the full spectrum of perturbation intensities. The MKL curve demonstrates consistent slope indicating stable degradation characteristics, while baseline methods exhibit inflection points where performance rapidly deteriorates. The critical threshold where baseline architectures begin catastrophic degradation occurs around σ = 0.05, representing realistic operational conditions where sensor precision limitations and environmental interference produce moderate signal degradation. Beyond this threshold, baseline architectures enter a regime of accelerating failure while MKL maintains controlled degradation, with the performance gap widening from 13 percentage points at σ = 0.05 to 24 percentage points at σ = 0.20. This validates the practical value of architectural robustness for real-world deployments where perfect signal conditions cannot be guaranteed.
5.6. Real-Time Performance Analysis
Beyond average latency metrics, understanding the complete distribution of inference times is critical for real-time systems where worst-case performance determines operational reliability. Service level agreements for UAV cybersecurity applications typically specify percentile-based latency bounds rather than mean values, as occasional processing delays can enable successful attacks during brief detection blind spots. Table 9 presents comprehensive percentile-based latency analysis across competing architectures, revealing that MKL maintains consistent sub-100 ms performance even at extreme percentiles where baseline methods experience severe degradation.
The median latency analysis shows that MKL achieves 45.2 ms inference time for typical samples, representing a 5.1-fold improvement over Transformer’s 228.4 ms and 4.0-fold improvement over LSTM-Attention’s 182.3 ms. This substantial advantage enables real-time processing of high-frequency telemetry streams from UAV swarms, where ground stations must process hundreds of samples per second from multiple aircraft simultaneously. The median represents the expected latency for routine operations, and MKL’s 45.2 ms performance provides ample headroom below the 100 ms real-time threshold, allowing system designers to allocate remaining computational budget to other mission-critical tasks such as path planning, collision avoidance, and communication management.
The tail latency characteristics prove even more critical for real-time guarantees in safety-critical applications. At the 90th percentile, representing one out of every ten inferences, MKL requires 52.7 ms compared to Transformer’s 256.3 ms and LSTM-Attention’s 201.5 ms. The 4.9-fold and 3.8-fold improvements, respectively, ensure that even occasional computationally intensive samples complete within the 100 ms service level agreement required for UAV control loop integration. The modest 7.5 ms increase from median (45.2 ms) to 90th percentile (52.7 ms) demonstrates latency stability, with only 17% variance indicating predictable performance characteristics essential for real-time scheduling.
At the 95th percentile, capturing the upper 5% of processing times that may occur during complex attack patterns or elevated telemetry traffic, MKL achieves 58.3 ms latency while baselines require 287.6 ms (Transformer) and 218.7 ms (LSTM-Attention), maintaining 4.9-fold and 3.8-fold advantages. The 13.1 ms increase from median to 95th percentile (29% variance) remains within acceptable bounds for real-time systems, where tail latencies up to 2× median are generally considered manageable. At the 99th percentile, representing rare worst-case scenarios that occur approximately once per 100 inferences, MKL requires 71.4 ms while Transformer demands 342.1 ms and LSTM-Attention 254.2 ms. The 26.2 ms increase from median (58% variance) indicates that MKL experiences some processing time variability but maintains sub-100 ms performance even for computationally challenging samples.
Most remarkably, at the 99.9th percentile representing one out of every thousand inferences—the extreme tail capturing unexpected edge cases, concurrent system loads, or pathological input patterns—MKL maintains 89.2 ms latency, still comfortably within the 100 ms real-time threshold. In contrast, Transformer requires 412.5 ms and LSTM-Attention 298.6 ms at this extreme percentile, violating real-time constraints by 4.1× and 3.0× margins, respectively. This tail latency consistency stems from MKL’s linear-complexity architecture, which avoids the quadratic attention operations that cause occasional processing spikes in Transformer implementations when encountering particularly long or complex input sequences.
These latency distribution characteristics demonstrate that MKL provides deterministic performance essential for safety-critical UAV operations, where missed detection deadlines could enable successful cyberattacks or trigger false alarms that disrupt mission-critical communications. The sub-100 ms latency guarantee extending to the 99.9th percentile enables system architects to design with confidence that real-time requirements will be satisfied across diverse operational scenarios without requiring over-provisioned hardware or conservative scheduling margins.
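Percentile-based latency profiles of this kind can be computed directly from per-sample timing logs, as in the sketch below; the gamma-distributed example data are illustrative only.

```python
import numpy as np

def latency_profile(latencies_ms: np.ndarray, sla_ms: float = 100.0) -> dict:
    """Percentile latency profile of per-sample inference times, matching the
    P50/P90/P95/P99/P99.9 breakdown above, plus the SLA violation rate."""
    pcts = [50, 90, 95, 99, 99.9]
    profile = {f"p{p}": float(np.percentile(latencies_ms, p)) for p in pcts}
    profile["sla_violations"] = float((latencies_ms > sla_ms).mean())
    return profile

# Example with synthetic timing data (illustrative only)
rng = np.random.default_rng(0)
print(latency_profile(rng.gamma(shape=20.0, scale=2.4, size=10_000)))
```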
5.7. Scalability Analysis
Operational UAV deployments increasingly involve swarm configurations where multiple aircraft operate cooperatively under centralized or distributed monitoring systems. This section evaluates MKL’s scalability characteristics across two critical dimensions: sequence length scaling for individual UAV trajectories and multi-UAV concurrent processing for swarm monitoring scenarios.
Performance evaluation across varying sequence lengths illustrated in Figure 8 (T ∈ {50, 100, 200, 500, 1000}) reveals that MKL maintains >92% accuracy for sequences up to T = 1000 timesteps, representing approximately 100 s of telemetry at 10 Hz sampling rate. As shown in Figure 8, MKL achieves 94.7% accuracy at T = 50, declining only to 92.2% at T = 1000, representing a minimal 2.5 percentage point degradation. This graceful performance decline demonstrates effective long-range dependency modeling through Mamba’s selective state space mechanism, which maintains linear computational complexity O(T·d²) compared to Transformer’s quadratic O(T²·d) attention operations. The sequence length robustness enables monitoring of extended mission segments without requiring truncation or sliding window approaches that could miss attack patterns spanning long temporal horizons.
Transformer-based methods experience significant degradation beyond T = 500 due to quadratic complexity that creates both computational bottlenecks and gradient flow challenges during training. As illustrated in Figure 8, Transformer accuracy drops from 92.0% at T = 50 to 83.7% at T = 1000, representing an 8.3 percentage point degradation—more than 3× larger than MKL’s decline. LSTM-based methods shown in Figure 8 demonstrate more stable degradation from 89.5% at T = 50 to 87.4% at T = 1000 (2.1 pp) but maintain consistently lower absolute performance due to vanishing gradient limitations. The comparative analysis validates that MKL’s selective state space architecture provides fundamental advantages for long-sequence processing, essential for comprehensive attack detection spanning multiple operational phases.
The multi-UAV scalability analysis presented in Table 10 demonstrates near-linear complexity growth as swarm size increases from 1 to 50 aircraft. Processing time scales approximately linearly (R² = 0.996), increasing from 47.3 ms for single UAV monitoring to 342.8 ms for 50-UAV swarm, representing 7.2× latency increase for 50× workload expansion. This sublinear scaling efficiency (factor: 50/7.2 = 6.9×) stems from batch processing optimizations and shared computational resources across parallel inference streams that amortize fixed overhead costs.
At intermediate swarm sizes, the scaling pattern remains consistent: 5 UAVs require 68.4 ms (1.4× single UAV latency for 5× workload), 10 UAVs require 92.7 ms (2.0× latency for 10× workload), and 20 UAVs require 156.3 ms (3.3× latency for 20× workload). This progressive efficiency improvement with increasing batch size validates that GPU parallelization effectively amortizes per-batch overhead, with marginal processing time per additional UAV decreasing from 21.1 ms (1 → 5 UAVs) to 9.5 ms (10 → 20 UAVs) to 7.4 ms (20 → 50 UAVs).
Memory usage grows linearly from 96 MB for single UAV to 2.673 GB for 50-UAV swarm, remaining well within modern GPU memory constraints (typically 8–24 GB for RTX 3090 class hardware). The per-UAV memory footprint of approximately 53 MB (2.673 GB/50 = 53.5 MB) enables monitoring of swarms exceeding 100 aircraft on standard hardware without requiring distributed processing infrastructure that would introduce communication overhead and synchronization complexity. At 10 UAVs, memory usage reaches 587 MB, still comfortably below 1 GB threshold, while 20 UAVs consume 1.124 GB, maintaining substantial headroom on typical 8–24 GB GPUs.
Detection rate remains remarkably stable across swarm sizes, degrading only 0.7 percentage points from 94.8% (1 UAV) to 94.1% (50 UAVs). This minimal accuracy variance validates that batch processing does not introduce significant performance penalties through numerical precision issues, gradient interference between concurrent samples, or resource contention. Even at the largest swarm size (50 UAVs), detection rate of 94.1% remains within one standard deviation of single-UAV performance (94.8 ± 0.3%), confirming that the architecture scales gracefully without sacrificing detection quality.
The scalability characteristics enable flexible deployment architectures tailored to operational requirements. Small swarms (1–10 UAVs) achieve sub-100 ms latency suitable for tight control loop integration, with 10 UAVs processed in 92.7 ms. Medium swarms (20 UAVs) maintain 156.3 ms latency acceptable for monitoring applications with moderate real-time constraints. Large swarms (50 UAVs) process in 342.8 ms, remaining suitable for alerting and situational awareness applications where subsecond response suffices. The model’s ability to efficiently scale to 50 UAVs simultaneously validates operational viability for realistic military and commercial UAV swarm scenarios where centralized ground station processing provides cost and complexity advantages over fully distributed architectures.
5.8. Cross-Dataset Generalization
Cross-dataset evaluation validates whether learned representations capture fundamental attack characteristics or merely memorize dataset-specific patterns, enabling assessment of deployment viability in novel operational environments where training data may not represent all possible attack variants and network conditions. Table 11 presents systematic transfer learning evaluation across three datasets representing different cybersecurity domains and temporal periods.
The zero-shot transfer learning results presented in Table 11 demonstrate strong generalization across diverse datasets, with F1-scores ranging from 71.3% to 84.1% without any fine-tuning. Bidirectional transfer between general cybersecurity datasets (CIC-IDS2017 ↔ CSE-CIC-IDS2018) achieves 82.4–84.1% accuracy, representing 87–89% transfer efficiency relative to within-dataset performance (94.8% baseline), validating temporal robustness despite two-year evolution in attack patterns and different network topologies. Transfer to UAV-specific scenarios (CIC-IDS2017 → Synthetic UAV: 78.6%) indicates that general network intrusion knowledge provides reasonable foundation for specialized domains, while the synthetic-to-real transfer (Synthetic UAV → CIC-IDS2017: 71.3%) proves most challenging, highlighting limitations of simulated data for fully capturing real-world traffic complexity including timing variations, protocol deviations, and environmental noise.
The fine-tuning analysis shown in Table 11 demonstrates rapid adaptation with minimal data requirements. After only 2–4 epochs requiring less than one hour on standard GPU hardware, accuracy improves substantially to 90.4–93.2% across all transfer scenarios, recovering 95–98% of within-dataset performance. The CIC-IDS2017 → Synthetic UAV transfer achieves 93.2% accuracy after only 2 epochs, representing the most efficient adaptation where general cybersecurity knowledge transfers effectively to specialized UAV domain. The most challenging Synthetic UAV → CIC-IDS2017 scenario requires 4 epochs to reach 90.4%, still achieving 95% of baseline performance despite the substantial synthetic-to-real domain gap. This efficient adaptation stems from the Liquid layer’s adaptive mechanisms enabling quick recalibration of temporal dynamics to match target distribution characteristics without requiring extensive retraining of lower-level feature extractors in the Mamba encoder and KAN transformation layers. The strong zero-shot performance and rapid fine-tuning validate practical deployment advantages: models trained on general cybersecurity datasets transfer effectively to UAV-specific scenarios with minimal effort, while models trained on synthetic data can bootstrap real-world deployments through brief fine-tuning on limited authentic attack samples collected during initial operational phases.
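A minimal sketch of this fine-tuning procedure is given below; the attribute name 'encoder', the learning rate, and the default epoch count are illustrative assumptions consistent with the 2–4 epoch adaptation described above.

```python
import torch
import torch.nn as nn

def fine_tune(model: nn.Module, loader, epochs: int = 3, lr: float = 1e-4,
              freeze_encoder: bool = True, device: str = "cpu") -> nn.Module:
    """Brief cross-dataset adaptation: optionally freeze the lower-level encoder
    and update only the adaptive/classifier layers for a few epochs on
    target-domain samples (illustrative sketch; attribute names assumed)."""
    model = model.to(device)
    if freeze_encoder and hasattr(model, "encoder"):
        for p in model.encoder.parameters():
            p.requires_grad = False
    params = [p for p in model.parameters() if p.requires_grad]
    opt = torch.optim.Adam(params, lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
    return model
```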
6. Discussion
This study introduces MKL, a novel deep learning architecture combining Mamba selective state space models, Kolmogorov-Arnold Networks, and Liquid Neural Networks for real-time UAV cybersecurity threat detection. The experimental results validate that MKL achieves 94.5 ± 0.3% F1-score while maintaining 47.3 ms inference latency, substantially outperforming state-of-the-art baselines across detection accuracy, computational efficiency, robustness, and generalization capabilities. This section synthesizes key findings, explores architectural insights underlying the performance advantages, acknowledges limitations of the current work, and identifies promising directions for future research.
6.1. Key Findings and Implications
Detection Accuracy: The comprehensive evaluation validates MKL's operational viability across accuracy, efficiency, robustness, and generalization dimensions. The 2.7-percentage-point F1-score improvement over the Transformer baseline (94.5% vs. 91.8%) proves particularly significant for challenging attack categories, with zero-day detection achieving a 16.2-point gain and sensor manipulation showing a 6.4-point improvement, critical advantages given the rapidly evolving UAV threat landscape.
Computational Efficiency: The 4.8-fold latency reduction (47.3 ms vs. 228.4 ms) and 676 samples/second throughput enable real-time monitoring of 20-UAV swarms with 70% computational headroom. The compact 2.5 M-parameter architecture (96 MB memory) eliminates specialized hardware requirements, while the 12.4 mJ/sample energy efficiency extends mission duration by an amount equivalent to approximately 3% of battery capacity, a meaningful saving for energy-constrained UAV operations.
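For intuition, the 70% headroom figure is consistent with a per-UAV telemetry rate on the order of 10 Hz; the 10 Hz value is an illustrative assumption, not a figure restated here:

\[
20~\text{UAVs} \times 10~\text{Hz} = 200~\text{samples/s}, \qquad
\frac{200~\text{samples/s}}{676~\text{samples/s}} \approx 0.30
\;\Rightarrow\; \approx 70\%~\text{headroom}.
\]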
Operational Robustness: The model maintains 76.5% accuracy under severe noise (σ = 0.20), where the Transformer degrades to 54.7%, a 21.8-point advantage under sensor degradation or electronic warfare conditions. Adversarial robustness (77.9% vs. 58.4% at ε = 0.10) validates resilience against sophisticated evasion attempts. Graceful degradation characteristics enable predictable threshold adjustment across varying operational conditions.
Cross-Domain Generalization: Zero-shot transfer achieving 71.3–84.1% accuracy demonstrates fundamental attack pattern learning rather than dataset-specific memorization. Rapid fine-tuning (2–4 epochs) enables production deployment through brief adaptation on limited operational samples, addressing practical scenarios where comprehensive training data remains unavailable due to security constraints or platform novelty.
These characteristics collectively validate MKL’s readiness for operational UAV cybersecurity deployment under realistic resource constraints and threat conditions.
6.2. Architectural Contributions and Insights
Beyond component synergy, three architectural innovations distinguish MKL as a conceptual advance rather than an engineering integration. First, a hierarchical information bottleneck via temporal pooling (Section 4.2.3) prevents downstream overfitting: direct KAN processing degrades performance to 88.7%, whereas pooled representations achieve 94.5%. Second, feature-conditioned selective state spaces enable semantic-aware temporal transitions in which the model learns which patterns warrant preservation based on learned feature importance. Third, multi-task optimization with reconstruction regularization ensures that all layers contribute effectively, improving accuracy by 2.3% versus single-objective training. These innovations transform component integration into a principled architectural framework.
The synergistic integration of three complementary architectural paradigms explains the observed performance advantages across diverse evaluation criteria. The Mamba selective state space encoder provides the temporal foundation through input-dependent state transitions that dynamically emphasize relevant features while filtering noise and irrelevant patterns. Unlike fixed recurrence patterns in traditional RNNs that apply uniform temporal processing regardless of input characteristics, Mamba’s selective mechanism adjusts state update dynamics based on instantaneous input features, effectively implementing content-aware temporal filtering. This selective processing proves particularly valuable for cybersecurity applications where attack signatures may manifest across variable temporal scales—from rapid DDoS traffic spikes detectable within seconds to gradual GPS spoofing drift accumulating over minutes.
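The content-aware filtering described above can be illustrated with a minimal selective state-space recurrence. The sketch below is a sequential scan written for clarity under simplified assumptions; the actual Mamba implementation uses a hardware-efficient parallel scan and a different parameterization:

```python
import torch
import torch.nn as nn

class SelectiveSSM(nn.Module):
    """Toy selective state-space recurrence (illustrative only).

    The step size (delta), input matrix B, and output matrix C are produced
    from the current input, so the state update is content-aware: a large
    delta lets the state track the input quickly, a small delta preserves
    longer-range history.
    """
    def __init__(self, d_model: int, d_state: int = 16):
        super().__init__()
        self.A = nn.Parameter(-torch.rand(d_model, d_state))  # stable (negative) dynamics
        self.to_delta = nn.Linear(d_model, d_model)
        self.to_B = nn.Linear(d_model, d_state)
        self.to_C = nn.Linear(d_model, d_state)

    def forward(self, x):                         # x: (batch, time, d_model)
        b, t, d = x.shape
        h = x.new_zeros(b, d, self.A.shape[1])
        ys = []
        for i in range(t):
            xi = x[:, i]                                              # (b, d)
            delta = torch.nn.functional.softplus(self.to_delta(xi))  # (b, d), > 0
            A_bar = torch.exp(delta.unsqueeze(-1) * self.A)          # (b, d, n)
            B = self.to_B(xi).unsqueeze(1)                           # (b, 1, n)
            C = self.to_C(xi).unsqueeze(1)                           # (b, 1, n)
            h = A_bar * h + delta.unsqueeze(-1) * B * xi.unsqueeze(-1)  # input-dependent update
            ys.append((h * C).sum(-1))                               # (b, d)
        return torch.stack(ys, dim=1)                                # (b, t, d)
```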
The KAN feature transformation layers address fundamental limitations of fixed activation functions in traditional neural networks. While ReLU and similar activations apply identical nonlinear transformations across all samples and training epochs, KAN’s learnable B-spline basis functions automatically discover optimal nonlinearities for each feature dimension through gradient-based optimization. This adaptive capability proves especially valuable for UAV telemetry featuring heterogeneous measurement types (GPS coordinates, IMU accelerations, network packet rates) with vastly different statistical properties and attack-relevant patterns. The ablation study validates KAN’s contribution, with Mamba + KAN configuration achieving 91.6% accuracy compared to Mamba-only’s 87.2% (+4.4 pp), demonstrating that learned activations significantly enhance feature representation quality beyond what temporal encoding alone provides.
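The per-feature learnable nonlinearity can be sketched as follows. For brevity this toy layer uses a fixed grid of Gaussian bumps with learnable coefficients as a stand-in for the B-spline bases discussed above; it is an assumption-laden illustration, not the paper's KAN implementation:

```python
import torch
import torch.nn as nn

class SimpleKANLayer(nn.Module):
    """Toy KAN-style layer: each input feature passes through its own learnable
    1-D activation (a weighted sum of fixed Gaussian bumps standing in for
    B-splines), after which features are mixed linearly. Illustrative only.
    """
    def __init__(self, d_in: int, d_out: int, n_basis: int = 8, x_range=(-3.0, 3.0)):
        super().__init__()
        centers = torch.linspace(x_range[0], x_range[1], n_basis)
        self.register_buffer("centers", centers)                 # fixed basis locations
        self.width = (x_range[1] - x_range[0]) / n_basis
        self.coef = nn.Parameter(torch.randn(d_in, n_basis) * 0.1)  # learnable activation shape
        self.mix = nn.Linear(d_in, d_out)

    def forward(self, x):                                        # x: (..., d_in)
        # phi[..., i, k] = exp(-((x_i - c_k) / width)^2)
        phi = torch.exp(-((x.unsqueeze(-1) - self.centers) / self.width) ** 2)
        activated = (phi * self.coef).sum(-1)                    # learned nonlinearity per feature
        return self.mix(activated)
```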
The Liquid Neural Networks adaptation layer enables dynamic adjustment to distribution shifts without requiring explicit retraining, addressing a critical challenge for cybersecurity systems confronting continuously evolving threat landscapes. The input-dependent time constants allow the model to automatically modulate temporal integration windows based on observed data characteristics—extending integration periods during noisy or ambiguous conditions to accumulate more evidence, while contracting windows during clear attack signatures to enable rapid response. This adaptive mechanism explains both the superior robustness to noise (maintaining 84.2% accuracy at σ = 0.10 vs. Transformer’s 68.3%) and the strong cross-dataset generalization (82.4% zero-shot transfer vs. Transformer’s 73.8%), as the Liquid layer automatically recalibrates processing dynamics to match target distribution characteristics.
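A simplified liquid-style cell with input-dependent time constants is sketched below; the parameterization is an assumption for illustration and may differ from the paper's Liquid layer:

```python
import torch
import torch.nn as nn

class LiquidCell(nn.Module):
    """Toy liquid cell: the time constant tau is computed from the current
    input, so a large tau integrates slowly (long memory under noisy or
    ambiguous input) while a small tau lets the state react quickly to
    clear attack signatures. Illustrative only.
    """
    def __init__(self, d_in: int, d_hidden: int, dt: float = 1.0):
        super().__init__()
        self.dt = dt
        self.in_proj = nn.Linear(d_in + d_hidden, d_hidden)
        self.tau_proj = nn.Linear(d_in, d_hidden)

    def forward(self, x_seq):                     # x_seq: (batch, time, d_in)
        b, t, _ = x_seq.shape
        h = x_seq.new_zeros(b, self.in_proj.out_features)
        outs = []
        for i in range(t):
            x = x_seq[:, i]
            tau = 1.0 + torch.nn.functional.softplus(self.tau_proj(x))       # input-dependent, > 1
            target = torch.tanh(self.in_proj(torch.cat([x, h], dim=-1)))
            h = h + (self.dt / tau) * (target - h)   # leaky integration toward the target state
            outs.append(h)
        return torch.stack(outs, dim=1)
```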
The near-perfect component additivity observed in ablation studies (97% efficiency) validates the architectural design philosophy of complementary specialization rather than redundant capabilities. Each component addresses a distinct aspect of the detection challenge: Mamba handles temporal dependencies, KAN optimizes feature transformations, and Liquid enables adaptation. The minimal overlap ensures that removing any single component results in substantial performance degradation (5.4–9.1 percentage points), confirming that all three paradigms contribute essential and non-redundant capabilities. This design contrasts with ensemble approaches that combine multiple complete models with overlapping functionalities, achieving robustness through redundancy at the cost of substantial computational overhead unsuitable for real-time UAV applications.
6.3. Limitations and Constraints
Despite the strong empirical results, several limitations warrant acknowledgment. The evaluation relies primarily on established cybersecurity datasets (CIC-IDS2017, CSE-CIC-IDS2018) supplemented with synthetic UAV telemetry rather than authentic attack data collected from operational UAV deployments. While the synthetic data incorporates domain-specific characteristics, including GPS trajectories, IMU dynamics, and network protocols typical of UAV communications, it cannot fully capture the complexity of real-world operational environments, including atmospheric turbulence effects on sensors, electromagnetic interference patterns in contested spectrum environments, and sophisticated adversarial behaviors adapted to specific UAV platform vulnerabilities. The 71.3% zero-shot transfer from synthetic to real data (Table 10) underscores this concern, indicating that simulation-trained models require fine-tuning for optimal real-world performance.
The computational efficiency evaluation focuses on inference latency and throughput under controlled conditions with batch processing optimizations. Real-world deployment scenarios may introduce additional overhead from preprocessing pipelines (packet parsing, feature extraction, normalization), communication latency between distributed UAV sensors and centralized ground stations, and system-level resource contention from concurrent flight control, navigation, and mission management processes. The 47.3 ms inference time represents model execution only and does not account for end-to-end system latency including data acquisition, transmission, preprocessing, and response action implementation. Operational deployments would require comprehensive system-level profiling to validate that the 100 ms real-time constraint is satisfied under realistic workload conditions.
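To make the budgeting concrete, an end-to-end latency check could take the following form; the acquisition, transmission, preprocessing, and response figures are hypothetical placeholders inserted only to show how the 47.3 ms inference time fits within a 100 ms budget:

\[
t_{\text{end-to-end}} = t_{\text{acq}} + t_{\text{tx}} + t_{\text{preproc}} + t_{\text{infer}} + t_{\text{resp}},
\qquad
\text{e.g.}\; 10 + 20 + 15 + 47.3 + 5 = 97.3~\text{ms} < 100~\text{ms}.
\]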
The model architecture incorporates 2.5 M learnable parameters requiring approximately 4.2 h training time on NVIDIA RTX 3090 hardware. While substantially more efficient than Transformer baseline (18.9 M parameters, 12.8 h), the training requirements may pose challenges for operational scenarios requiring rapid model updates in response to emerging threats. Online learning or incremental adaptation mechanisms could address this limitation but were not explored in the current work. The fixed architecture cannot dynamically scale computational resources based on instantaneous workload demands—a capability that would be valuable for UAV systems operating under varying mission profiles ranging from low-intensity surveillance to high-threat combat support.
The evaluation focuses on network-based attacks targeting UAV communication channels and cyber-physical attacks manipulating sensor readings or GPS coordinates. Physical attacks exploiting hardware vulnerabilities, supply chain compromises, or insider threats fall outside the scope of the proposed approach. Additionally, the model assumes availability of telemetry streams for analysis; scenarios where attackers successfully disrupt communication channels entirely (complete denial of service) cannot be detected through traffic analysis, requiring complementary defensive mechanisms such as redundant communication links, store-and-forward protocols, or autonomous operation capabilities during communication outages.
While the model demonstrates robustness to Gaussian noise representing sensor degradation and environmental interference (Table 8), explicit evaluation against adversarial machine learning attacks remains limited. Adversarial attacks, where sophisticated adversaries craft malicious perturbations specifically designed to evade detection systems, represent a critical threat model for operational cybersecurity deployment. The current evaluation does not include systematic testing against standard adversarial attack methods such as Projected Gradient Descent (PGD), Fast Gradient Sign Method (FGSM), or Carlini-Wagner attacks. Future work should conduct comprehensive adversarial robustness evaluation using both white-box and black-box attack scenarios to validate resilience against adversarial manipulation attempts.
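As an example of the kind of evaluation proposed here, a minimal FGSM perturbation routine is sketched below; it is a generic white-box attack implementation under simplified assumptions (classification loss, L-infinity bound), not part of the reported experiments:

```python
import torch

def fgsm_perturb(model, x, y, epsilon, loss_fn=torch.nn.functional.cross_entropy):
    """Fast Gradient Sign Method: a single-step perturbation bounded by epsilon
    in the L-infinity norm, the simplest of the white-box attacks listed above.
    """
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    return (x_adv + epsilon * x_adv.grad.sign()).detach()

# A robustness sweep would then re-evaluate detection accuracy on
# fgsm_perturb(model, x, y, eps) across a range of eps values
# (e.g., matching the epsilon scale quoted earlier, up to 0.10).
```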
The hybrid architecture’s complexity presents explainability challenges for mission-critical deployments requiring human oversight. While the model achieves superior detection accuracy, the integration of Mamba, KAN, and Liquid components creates a black-box system where detection rationale remains opaque to operators. Future work should incorporate explainable AI techniques including attention visualization, feature importance attribution (SHAP, integrated gradients), and counterfactual explanation generation to provide transparent decision rationale essential for operator trust and effective human-AI collaboration in security operations.
The current evaluation lacks in-flight validation and hardware-in-the-loop (HITL) testing on actual UAV platforms—a critical limitation for operational deployment confidence. While the model achieves strong performance on established network intrusion datasets and demonstrates robustness under simulated noise conditions, these evaluations cannot fully replicate the complexity of real-time in-flight dynamics, sensor interactions, communication latency, and environmental interference encountered in operational UAV deployments. Future work should prioritize validation through controlled flight testing, HITL simulation frameworks, and collaboration with UAV operators to assess performance under authentic operational constraints including processing delays, sensor failures, and dynamic threat scenarios.
6.4. Future Research Directions
Several promising avenues emerge for extending the current work. First, integration with physics-based UAV models could enhance detection of cyber-physical attacks by encoding domain knowledge about expected flight dynamics, sensor characteristics, and actuator responses. Current data-driven approaches learn statistical patterns from historical data but do not explicitly model physical constraints governing UAV behavior. Hybrid architectures combining learned representations with analytical models of flight physics could improve detection of subtle attacks that remain within statistical norms but violate fundamental physical laws, while potentially reducing data requirements by leveraging prior knowledge about UAV dynamics.
Second, federated learning approaches could enable collaborative threat intelligence sharing across UAV swarms while preserving operational security. Current centralized training assumes all data can be aggregated at a single location, which may be infeasible for military or sensitive commercial applications where raw telemetry cannot be transmitted due to bandwidth constraints, latency requirements, or information security policies. Federated architectures would allow individual UAVs to train local models on their private data, sharing only model updates or learned representations to construct global threat detection capabilities without exposing sensitive operational patterns.
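A minimal federated-averaging sketch illustrates the mechanism envisioned above, where only parameters leave each platform; the function name `federated_average` and the weighting scheme are assumptions for illustration, not a proposed implementation:

```python
import copy
import torch

def federated_average(global_model, client_models, client_weights):
    """One FedAvg-style round (sketch): each UAV trains a local copy on its
    private telemetry, then only the parameters are averaged on the ground
    station; raw data never leaves the platform. `client_weights` are the
    clients' sample fractions and must sum to 1.
    """
    global_state = copy.deepcopy(global_model.state_dict())
    for key in global_state:
        global_state[key] = sum(
            w * m.state_dict()[key] for w, m in zip(client_weights, client_models)
        )
    global_model.load_state_dict(global_state)
    return global_model
```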
Third, explainable AI techniques could enhance operator trust and enable rapid response by providing human-interpretable explanations for detection decisions. Current deep learning models operate as black boxes, outputting threat classifications without revealing the underlying reasoning. Attention visualization, feature importance attribution, or counterfactual explanation generation could help human analysts understand why specific telemetry patterns triggered alerts, enabling faster verification of true positives and reducing false alarm fatigue. Explainability becomes especially critical for novel attack patterns where automated systems lack high confidence and require human expert judgment for final classification decisions.
Fourth, integration with active defense mechanisms could enable automated response capabilities beyond detection and alerting. Current work focuses on passive monitoring and threat identification, relying on human operators or separate systems to implement defensive actions. Direct integration with UAV flight control systems, communication protocols, or mission management software could enable automated responses such as switching to backup GPS sources when spoofing is detected, adjusting flight paths to avoid jammed communication zones, or implementing protocol-level authentication when man-in-the-middle attacks are identified. Such closed-loop systems would require careful safety validation to prevent false positives from triggering unnecessary defensive actions that could interfere with legitimate operations.
Finally, extension to multi-modal sensor fusion incorporating video streams, LiDAR point clouds, or radar returns alongside traditional telemetry could provide richer attack detection capabilities. Current work focuses exclusively on structured telemetry features (GPS, IMU, network traffic), but UAV platforms increasingly incorporate diverse sensors generating high-dimensional perceptual data. Attacks manipulating visual odometry, LiDAR-based obstacle detection, or radar altimetry could be detected by analyzing these rich sensor modalities, though the computational demands of processing high-bandwidth perceptual data would require careful architectural optimization to maintain real-time performance constraints.
7. Conclusions and Future Work
This work introduces MKL, a hybrid architecture integrating Mamba selective state space models, Kolmogorov-Arnold Networks, and Liquid Neural Networks for real-time UAV cybersecurity. The model achieves 94.5% F1-score with 47.3 ms latency on three benchmark datasets, demonstrating superior performance compared to established baselines, including Random Forest, CNN-LSTM, LSTM-Attention, Transformer, and VAE-GAN architectures.
The architectural contributions extend beyond component integration. Mamba’s selective state space mechanism enables linear O(T) complexity for long sequences, KAN’s learnable activations provide adaptive feature transformation, and Liquid’s dynamic time constants enable continuous adaptation without retraining. The 2.5 M parameter model achieves 4.8-fold latency reduction while maintaining a 96 MB memory footprint suitable for embedded UAV platforms.
Ablation studies confirm synergistic integration with 97% component additivity, demonstrating complementary specialization rather than redundant capabilities. Cross-dataset evaluation validates generalization across diverse operational environments with 71–84% zero-shot transfer accuracy, enabling deployment without extensive domain-specific training.
Future work should address validation on authentic operational UAV data, integration with physics-based models for cyber-physical attack detection, federated learning for collaborative threat intelligence, explainable AI for operator trust, and multi-modal sensor fusion incorporating visual and LiDAR data. The architectural principles prove applicable beyond UAV cybersecurity to broader cyber-physical systems requiring real-time threat detection under computational constraints.