Knowledge-Enhanced Zero-Shot Graph Learning-Based Mobile Application Identification

Zhang, Dongfang; Huang, Jianan; Tian, Manjun; Guan, Lei

doi:10.3390/electronics15010126


¹ The First Research Institute of the Ministry of Public Security, Beijing 100048, China

² School of Automation, Nanjing University of Science and Technology, Nanjing 210094, China

* Author to whom correspondence should be addressed.

Electronics 2026, 15(1), 126; https://doi.org/10.3390/electronics15010126 (registering DOI)

Submission received: 25 November 2025 / Revised: 14 December 2025 / Accepted: 24 December 2025 / Published: 26 December 2025

(This article belongs to the Special Issue Novel Methods Applied to Security and Privacy Problems, Volume II)


Abstract

With the proliferation of mobile devices, identifying previously unseen mobile applications has become a critical challenge in network security. Traditional application identification approaches rely heavily on fixed training categories and limited traffic features, making them ineffective in real-world environments. To address this problem, we propose KZGNN, a knowledge-enhanced zero-shot graph neural network for mobile application identification. KZGNN first constructs a unified mobile application knowledge graph that integrates high-level semantic metadata with fine-grained network behavior, enabling structured representation of application characteristics. Building on this, KZGNN introduces a relation-aware dual-channel propagation mechanism that separates semantic relations and behavioral interactions into dedicated GNN pathways and adaptively fuses them through attention. To support zero-shot recognition, KZGNN projects node embeddings and category semantics into a shared embedding space, where a structure-preserving constraint maintains global semantic geometry and improves generalization to unseen categories. Experiments on a dataset of 160 mobile applications show that KZGNN outperforms nine state-of-the-art traffic classification baselines and achieves a 5.2% improvement in identifying unseen application categories, demonstrating its effectiveness for mobile application identification in zero-shot scenarios.

Keywords: network traffic identification; knowledge enhancement; zero-shot learning; graph neural networks; machine learning

1. Introduction

The rapid proliferation of mobile applications has reshaped modern digital ecosystems, enabling pervasive connectivity and diverse service scenarios [1,2]. As of 2025, more than 8.9 million mobile applications have been released across global app stores, serving over 6 billion Internet users worldwide [3]. This unprecedented scale, while enriching user experience, also introduces increasingly complex security challenges. Newly emerging or previously unseen mobile applications often appear without prior knowledge, raising significant risks such as unauthorized data collection and privacy leakage [4]. Consequently, accurately identifying unknown mobile applications has become a critical and urgent problem in network security.

Existing mobile application identification approaches rely heavily on supervised machine learning and deep learning models, which extract statistical or sequential patterns from encrypted traffic [5]. These methods have reached high accuracy in closed-world settings, where all application categories are known beforehand. However, their performance deteriorates sharply when encountering unknown applications [6]. Two primary factors account for this limitation. First, the widespread use of encrypted protocols and third-party SDKs causes traffic patterns from different applications to converge, weakening the discriminative power of traditional handcrafted or learned features [7]. Second, conventional models lack mechanisms to leverage additional semantic knowledge, rendering them incapable of generalizing beyond categories seen during training. As a result, these models struggle to establish meaningful associations between known and unseen applications, limiting their practical applicability in real-world environments.

To address the fundamental challenges of generalization and openness, recent studies have explored unsupervised learning, semi-supervised learning [8], few-shot learning [9], and zero-shot learning [10] for traffic analysis. Among them, zero-shot learning offers a promising direction by leveraging semantic information such as textual descriptions or functional attributes to build transferable representations that bridge known and unknown application categories. Existing zero-shot learning-based encrypted traffic classification approaches, however, primarily exploit shallow semantic embeddings or unimodal representations [11]. They fall short in capturing deeper structural associations, particularly those reflecting the complex interplay between application semantics and network behaviors. Moreover, they rarely incorporate relational priors, leading to insufficient modeling of application-level interactions and limited robustness when facing fine-grained unseen categories.

Motivated by these limitations, we propose KZGNN, a knowledge-enhanced zero-shot graph neural network tailored for identifying previously unseen mobile applications. KZGNN begins by constructing a unified mobile application knowledge graph that integrates two complementary perspectives: an application semantic knowledge graph derived from metadata and a network behavior knowledge graph extracted from encrypted traffic sessions. The fused graph preserves entity, relation, and attribute correspondences across views, enabling the model to jointly exploit high-level semantic descriptions and fine-grained communication behaviors within a single relational structure. To effectively learn from this heterogeneous cross-view graph, KZGNN introduces a zero-shot graph neural network built upon two newly designed components. The first is a relation-aware dual-channel propagation mechanism that explicitly separates semantic relations from behavioral interactions, processing them through dedicated GNN pathways equipped with attribute-aware weighting. The outputs of the two pathways are then adaptively fused via an attention mechanism, allowing the model to dynamically balance static semantic cues with dynamic behavioral evidence. The second component is a structure-preserving zero-shot alignment module that projects node embeddings and category semantics into a shared embedding space while maintaining global semantic topology. This alignment ensures that learned representations remain discriminative and transferable, enabling accurate recognition of application categories never encountered during training.

The main contributions of this study are as follows:

(1) We present KZGNN, a knowledge-enhanced zero-shot graph learning framework for mobile application identification that supports recognition of categories absent during training. Its core novelty lies in constructing and exploiting a unified cross-view knowledge graph that tightly integrates semantic metadata with dynamic traffic behaviors through explicit relational alignment. KZGNN preserves structural dependencies across views and aligns heterogeneous representations at the entity, relation, and attribute levels. This cross-view relational foundation enables the model to reason jointly over semantic context and communication behavior, achieving robust generalization to previously unseen applications.
(2) We design a zero-shot graph neural network that incorporates two key algorithmic innovations: a relation-aware dual-channel propagation mechanism and a structure-preserving semantic alignment module. The dual-channel design separates semantic and behavioral relations into dedicated propagation pathways, applies edge attribute-aware weighting, and adaptively fuses complementary information through attention. The alignment module maps node embeddings and category semantics into a unified embedding space while preserving global semantic structure, yielding discriminative and transferable representations for previously unseen categories.
(3) We conduct a comprehensive evaluation on a real-world dataset comprising 160 mobile applications to validate the robustness and generalization capability of KZGNN. The classification results show that KZGNN consistently outperforms nine state-of-the-art baselines, achieving a 5.2% improvement in unknown application category identification accuracy. These results show that knowledge-enhanced graph modeling and zero-shot semantic alignment provide substantial benefits for recognizing emerging and previously unseen mobile applications.

The remainder of this paper is organized as follows. Section 2 reviews the literature on application identification and related traffic analysis techniques. Section 3 introduces the preliminary concepts, including the threat model and fundamental notions of knowledge graph construction. Section 4 presents the design and workflow of the proposed KZGNN framework. Section 5 reports the experimental setup, evaluation results, and comparative analysis. Section 6 concludes the paper with a summary of findings and directions for future research.

2. Related Work

Application identification is a key research topic in network security and can be broadly categorized into two directions: known application identification and unknown application identification.

2.1. Known Application Identification

Known application identification assumes that the training and testing sets share the same category space. Existing approaches primarily rely on network traffic analysis, using machine learning (ML) or deep learning (DL) models to classify flows based on extracted statistical or sequential features [12].

Early studies focused on handcrafted traffic features combined with traditional ML classifiers. Miskovic et al. introduced AppPrint [13], which utilized packet sizes, inter-arrival times, flow duration, and packet counts to generate distinctive flow fingerprints. Taylor et al. proposed AppScanner [14], leveraging packet length sequences, directionality, burst patterns, and additional statistical features to construct a 54-dimensional representation, further classified using Support Vector Machines (SVM) or Random Forest classifier (RF). Their subsequent work [15] incorporated timing information, burst/silence patterns, and n-grams of packet sizes, applying Hidden Markov Models (HMMs) to capture the sequential nature of encrypted traffic. Aceto et al. [16] extended this direction with multi-classification frameworks based on flow-level and time-related statistics combined with supervised ML models such as Random Forest and Naive Bayes. Zhai et al. [17] demonstrated that cellular network traffic patterns could also serve as reliable discriminators for application identification.

With the growth of deep learning, researchers shifted toward end-to-end feature extraction using neural networks. Rezaei et al. [18] exploited Long Short-Term Memory networks (LSTM) and Convolutional Neural Network (CNN) architectures to learn temporal and spatial representations directly from raw encrypted traffic. Aceto et al. [19] proposed MIMETIC, integrating flow-based and packet-based features and leveraging CNNs and LSTMs to enhance classification accuracy. Wang et al. developed App-Net [20,21,22], combining CNN-based payload representations with LSTM-based packet length sequences for improved encrypted traffic classification. Aceto et al. [23] further incorporated packet-level and flow-level contextual features to strengthen DL-based classification models. Shapira and Shavitt introduced FlowPic [24], transforming traffic into image-like representations to enable image-based DL models. Lin et al. [25] proposed ET-BERT, a transformer-based approach that learns contextualized datagram representations from large-scale unlabeled traffic.

More recently, Graph Neural Networks (GNNs) have been applied to capture structural dependencies in encrypted traffic. Shen et al. [26] represented packet-flow relationships as graphs, enabling GNNs to learn flow-level dependencies. Pham et al. proposed MappGraph [27], modeling flows as graph nodes and their interactions as edges. Jiang et al. [28] and Huoh et al. [29] similarly constructed flow-relationship graphs to enhance mobile app fingerprinting. Li et al. introduced FusionTC [30], integrating multiple flow-sequence representations into a unified graph framework. Marzani et al. [31] combined automata learning with GNN-based classifiers to capture dynamic behavioral transitions. Wang et al. proposed TrafficGCN [32], applying Graph Convolutional Network (GCN)-based propagation to learn structural and relational information from encrypted flows.

Despite their strong performance, these methods rely heavily on labeled data and assume closed-world conditions, limiting their ability to generalize to unseen applications. The rapid evolution of applications and the prevalence of shared libraries and encrypted protocols further exacerbate this limitation, motivating research on identifying unknown applications.

2.2. Unknown Application Identification

Unknown application identification must cope with the absence of labeled data for emerging or unseen applications. To address this, prior work has explored semi-supervised learning, unsupervised learning, zero-shot learning, and few-shot learning strategies.

Semi-supervised methods such as FlowPrint by Van Ede et al. [8] modeled flow-level statistical patterns and temporal dependencies to detect deviations belonging to unknown applications. Zero-shot learning methods aim to leverage auxiliary semantic information to characterize unseen categories. Hu et al. [10] introduced an attribute-based zero-shot learning model for encrypted traffic classification, mapping traffic features to semantic attribute spaces to infer unseen categories. Jiang et al. [33] proposed a zero-relabeling strategy to address concept drift caused by dynamic app behaviors, enabling models to adapt without requiring additional manual labeling.

Few-shot learning has been used to recognize rare or emerging applications with limited labeled data. Bovenzi et al. [9,34] applied meta-learning to adapt to new categories with minimal samples. Chen et al. [35] proposed an incremental learning approach that continuously updates the model to incorporate new applications without degrading performance on existing classes. Other studies addressed the open-world nature of mobile traffic. Zhao et al. [36] designed a multi-level classifier combining supervised and unsupervised techniques to distinguish known applications from unknown ones under varying traffic distributions. Li et al. [37,38] introduced fine-grained and packet-level fingerprinting methods for wireless traffic, improving robustness under open-world conditions.

Although these approaches advance unknown application identification, they generally suffer from limited semantic modeling, insufficient integration of external knowledge, and weak generalization to fine-grained unseen categories. This highlights the need for frameworks capable of leveraging both semantic knowledge and behavioral relationships to enhance zero-shot generalization.

3. Preliminary

This section outlines the fundamental concepts and assumptions underlying KZGNN. We first describe the threat model that defines the analyzer’s capabilities and accessible information, and then introduce the basic elements of the knowledge graph that support the construction of our unified representation.

3.1. Threat Model

Figure 1 illustrates the threat model considered in this work. A user interacts with various mobile applications on a mobile device, and the generated traffic is transmitted through the Internet to the corresponding application servers [39]. A network traffic analyzer, such as an ISP gateway, enterprise firewall, or monitoring middlebox, is located along the communication path and is capable of passively capturing encrypted traffic traces for analysis.

Because modern mobile applications widely adopt end-to-end encryption, the analyzer cannot access packet payloads or infer application semantics directly. As a result, classification relies solely on side-channel features extracted from encrypted traffic, including packet length sequences, inter-packet timing, burst structures, flow directions, and other spatio-temporal characteristics [40]. In addition to traffic data, the analyzer may also access publicly available application metadata, such as descriptions, categories, developer information, permission requirements, and version histories, which are published by official app stores (e.g., Google Play, Apple App Store) [41].

Under this threat model, we assume a realistic environment where the analyzer has no prior knowledge of which specific applications the user may use. The analyzer can only access encrypted traffic traces and optional public semantic metadata but cannot inspect payloads, modify traffic, or obtain device-level information. The objective is to infer the originating application category or determine that a traffic instance belongs to an unseen application class solely from encrypted traffic patterns and auxiliary semantic knowledge.

3.2. Basic Elements of the Knowledge Graph

A knowledge graph is a structured representation of real-world information that organizes data into interconnected semantic units. Its foundation typically consists of three essential components (entities, relations, and attributes), which together provide a flexible and expressive framework for modeling heterogeneous information.

(1) Entities: Entities serve as the fundamental units of a knowledge graph, representing objects, concepts, or abstract items within a specific domain. Depending on the application scenario, entity types may include users, devices, resources, events, or any identifiable object. Formally, the entity set is denoted as

$$E = \{e_{1}, e_{2}, \dots, e_{n}\} \tag{1}$$

where each $e_{i}$ corresponds to a unique entity instance.

(2) Relations: Relations describe the semantic links or interactions between entities. They capture structural dependencies, hierarchical associations, causal interactions, or any form of domain-specific connectivity. Relations are typically represented as directed edges and denoted as

$$R = \{r_{1}, r_{2}, \dots, r_{m}\} \tag{2}$$

where each $r_{j}$ represents a distinct relation type. Relation triples of the form $(e_{i}, r_{j}, e_{k})$ define how entities are interconnected and provide the structural backbone of the graph.

(3) Attributes: Attributes enrich entities with descriptive information. They are commonly expressed as key-value pairs, offering features such as names, numerical values, categorical labels, timestamps, or textual descriptions. For an entity $e_{i}$, its attribute set is represented as

$$A(e_{i}) = \{(k_{1}, v_{1}), (k_{2}, v_{2}), \dots, (k_{p}, v_{p})\} \tag{3}$$

where each key-value pair provides domain-relevant information that supports downstream learning tasks.

Together, these elements enable knowledge graphs to encode heterogeneous information in a structured, interpretable, and scalable manner. By organizing entities, relations, and attributes in a unified graph structure, knowledge graphs support complex reasoning, facilitate representation learning, and provide a powerful foundation for graph-based analytical models.
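To make the three elements concrete, the following minimal Python sketch stores entities, relation types, triples, and per-entity attributes in one container. The class name `KnowledgeGraph`, its methods, and the example identifiers (such as `app:example_chat`) are illustrative assumptions and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    """Minimal container for entities E, relation types R, triples, and attributes A."""
    entities: set = field(default_factory=set)        # E
    relation_types: set = field(default_factory=set)  # R
    triples: set = field(default_factory=set)         # (e_i, r_j, e_k)
    attributes: dict = field(default_factory=dict)    # A(e_i) as key-value pairs

    def add_entity(self, entity, **attrs):
        self.entities.add(entity)
        self.attributes.setdefault(entity, {}).update(attrs)

    def add_triple(self, head, relation, tail):
        # Register both endpoints so every triple refers to known entities.
        self.add_entity(head)
        self.add_entity(tail)
        self.relation_types.add(relation)
        self.triples.add((head, relation, tail))

    def neighbors(self, entity, relation=None):
        """Tails reachable from `entity`, optionally restricted to one relation type."""
        return sorted(t for h, r, t in self.triples
                      if h == entity and (relation is None or r == relation))


# Hypothetical example data, for illustration only.
kg = KnowledgeGraph()
kg.add_entity("app:example_chat", category="Communication", rating=4.5)
kg.add_triple("dev:example_studio", "develops", "app:example_chat")
kg.add_triple("app:example_chat", "requires", "perm:CAMERA")
```

Storing triples as a set keeps duplicate statements from inflating the graph, and the directed `(head, relation, tail)` layout mirrors the triple form given above.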

4. Proposed Method

This section presents the proposed KZGNN framework for zero-shot mobile application identification. We begin with an overview of the workflow, followed by detailed descriptions of the knowledge graph construction, cross-view fusion strategy, and the zero-shot graph neural network design.

4.1. Overview

Figure 2 illustrates the overall workflow of KZGNN, a knowledge-enhanced graph learning framework designed for zero-shot mobile application identification. The method proceeds through three stages: knowledge graph construction, cross-view graph fusion, and zero-shot graph neural network learning, which together transform heterogeneous semantic information and encrypted traffic behaviors into a unified relational representation that supports recognition of application categories absent during training.

In the first stage, two complementary knowledge graphs are constructed from distinct data sources. Publicly available application metadata is used to form an application semantic knowledge graph $G_{e}$, encoding descriptive attributes such as application descriptions, developers, categories, and permissions. In parallel, encrypted traffic traces are converted into a network behavior knowledge graph $G_{n}$, where session nodes, IP addresses, domains, and protocol entities collectively describe the behavioral footprint of applications in real network environments. These two graphs provide high-level semantic context and fine-grained behavioral patterns, respectively. The second stage merges $G_{e}$ and $G_{n}$ into a unified mobile application knowledge graph $G_{m}$. Through entity alignment and attribute integration, the internal relational structures of both graphs are preserved while cross-view links are introduced to bridge semantic attributes with their corresponding runtime behaviors. This fusion results in a coherent multi-view representation that captures how application semantics manifest in actual communication dynamics.

In the final stage, a zero-shot graph neural network is applied to $G_{m}$ to learn node embeddings capable of generalizing to unseen application categories. The model integrates multimodal node features, a relation-aware dual-channel propagation mechanism, and a structure-preserving semantic alignment module. By jointly projecting node embeddings and category descriptions into a shared embedding space, KZGNN enables accurate prediction of application classes with no training samples. The detailed methodology for each stage is presented in the following subsections.

4.2. Mobile Application Knowledge Graph Construction

A knowledge graph offers a unified and structured representation that naturally supports graph-based learning. To fully leverage this capability, we construct a mobile application knowledge graph that integrates heterogeneous information from both semantic and behavioral domains. This unified representation enables the model to jointly utilize high-level semantic descriptions, such as application metadata, and fine-grained behavioral patterns derived from encrypted traffic. By combining the semantic knowledge graph with the network behavior knowledge graph into a coherent structure, the fused representation forms the foundation for graph learning and facilitates robust identification of unseen applications.

(1) Application Semantic Knowledge Graph: Mobile applications expose a diverse range of external semantic information through application store and platform descriptions. These sources include application metadata, developer identities, permission requirements, release histories, and other contextual attributes. This information is usually presented in unstructured or semi-structured textual formats, making it unsuitable for direct use in machine learning models. To transform these heterogeneous semantic resources into a structured form, we construct a semantic knowledge graph $G_{e} = (E_{e}, R_{e}, A_{e})$.

The construction process begins by extracting core entities from metadata, where applications, developers, and permissions are defined as primary semantic units and instantiated as graph nodes. Each entity is enriched with descriptive attributes, including textual descriptions, ratings, download counts, and release timestamps, to capture essential semantic cues for subsequent model learning. Metadata is collected from official sources such as Google Play and the Apple App Store, supplemented by developer websites when necessary. Relations are then derived by parsing structured metadata fields, yielding associations such as developer develops application and application requires permission, which collectively describe the semantic dependencies among entities. To ensure a consistent graph structure, attribute values are assigned directly from metadata entries, and string similarity-based normalization is applied to resolve naming inconsistencies and unify duplicated references to the same real-world objects.
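The construction steps above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the record fields (`package`, `developer`, `permissions`, `description`, `rating`), the relation names, and the 0.9 similarity threshold are assumptions, and Python's `difflib.SequenceMatcher` stands in for the unspecified string similarity measure.

```python
from difflib import SequenceMatcher


def canonical_name(name, known, threshold=0.9):
    """Map `name` onto an already-seen name if the two are near-duplicates."""
    cleaned = " ".join(name.lower().split())
    for existing in known:
        if SequenceMatcher(None, cleaned, existing).ratio() >= threshold:
            return existing
    known.append(cleaned)
    return cleaned


def build_semantic_triples(records):
    """Turn app store records into (head, relation, tail) triples plus attributes."""
    developers, triples, attributes = [], set(), {}
    for rec in records:
        app = "app:" + rec["package"]
        dev = "dev:" + canonical_name(rec["developer"], developers)
        triples.add((dev, "develops", app))
        for perm in rec["permissions"]:
            triples.add((app, "requires", "perm:" + perm))
        # Attribute values are copied directly from the metadata entry.
        attributes[app] = {"description": rec["description"], "rating": rec["rating"]}
    return triples, attributes


# Hypothetical metadata records; the second developer name differs only in formatting.
records = [
    {"package": "com.example.chat", "developer": "Example Studio",
     "permissions": ["CAMERA", "INTERNET"], "description": "Messaging app", "rating": 4.5},
    {"package": "com.example.notes", "developer": "example  studio ",
     "permissions": ["INTERNET"], "description": "Note taking", "rating": 4.1},
]
triples, attributes = build_semantic_triples(records)
```

In this sketch both records resolve to a single developer node, which is the effect the normalization step is meant to achieve for duplicated references to the same real-world object.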

Through this structured transformation, the resulting semantic knowledge graph provides a coherent and interpretable representation of mobile applications. It enables the model to leverage domain semantics more effectively and supports enhanced generalization, particularly in zero-shot scenarios where unseen application categories must be inferred from semantic descriptions alone.

(2) Network Behavior Knowledge Graph: In addition to semantic information, the network behavior of an application serves as a crucial and distinctive characteristic. While semantic metadata describes what an application is intended to do, its actual operational behavior is reflected in how it communicates with remote servers. These communication patterns, embedded in encrypted traffic, reveal structural cues such as session organization, endpoint interactions, protocol usage, and temporal flow dynamics. To capture these behavioral properties, we construct a network behavior knowledge graph $G_{n} = (E_{n}, R_{n}, A_{n})$, which transforms raw traffic into a structured graph representation suitable for relational modeling.

The construction begins with session identification, where traffic is segmented into communication sessions based on the standard five-tuple (source and destination IP addresses, ports, and protocol). Each session is treated as an independent behavioral unit and instantiated as a graph node. Additional entities, including IP addresses, domain names obtained through reverse DNS lookup, and protocol types, are extracted from traffic metadata and incorporated into the graph. Together, these entities represent the essential components of network interactions.
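Session identification by five-tuple can be illustrated with a short sketch. The packet dictionary layout is hypothetical, and treating both directions of a connection as one session (by sorting the two endpoints) is an assumption; the paper does not state how direction is handled.

```python
from collections import defaultdict


def session_key(pkt):
    """Direction-independent five-tuple: both directions map to the same session."""
    a = (pkt["src_ip"], pkt["src_port"])
    b = (pkt["dst_ip"], pkt["dst_port"])
    return (min(a, b), max(a, b), pkt["proto"])


def group_sessions(packets):
    """Group packets into sessions keyed by their canonical five-tuple."""
    sessions = defaultdict(list)
    for pkt in packets:
        sessions[session_key(pkt)].append(pkt)
    return dict(sessions)


# Hypothetical packets using documentation-range IP addresses.
packets = [
    {"ts": 0.00, "src_ip": "10.0.0.2", "src_port": 50000,
     "dst_ip": "203.0.113.7", "dst_port": 443, "proto": "TCP", "length": 517},
    {"ts": 0.05, "src_ip": "203.0.113.7", "src_port": 443,
     "dst_ip": "10.0.0.2", "dst_port": 50000, "proto": "TCP", "length": 1400},
    {"ts": 0.10, "src_ip": "10.0.0.2", "src_port": 50001,
     "dst_ip": "198.51.100.9", "dst_port": 443, "proto": "UDP", "length": 1200},
]
sessions = group_sessions(packets)
```

Here the first two packets belong to the same TCP session even though they travel in opposite directions, while the third starts a separate UDP session. Each resulting key would become one session node in the graph.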

After entity extraction, we establish behavioral relations to capture how these components interact. Sessions are linked to the applications that generate them, reflecting the origin of the traffic. Each session is further connected to the protocol it uses, the IP address it communicates with, and the domain name associated with that IP. These relations collectively reconstruct the communication workflow of mobile applications, enabling the graph to encode both direct interactions and indirect structural dependencies.

Entities are then assigned fine-grained behavioral attributes. Session nodes record average packet lengths, average inter-packet delays, and durations. Protocol entities retain information such as protocol type, version, and encryption algorithms. IP and domain entities include supplementary information such as geolocation and organization. Together, these attributes provide a detailed behavioral profile that complements the semantic information represented in $G_{e}$.
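As an illustration of the session-level attributes, the sketch below computes average packet length, average inter-packet delay, and duration for one session. The field names `ts` and `length` and the output keys are assumptions, and the function expects a non-empty packet list.

```python
from statistics import mean


def session_attributes(packets):
    """Average packet length, average inter-packet delay, and duration of one session."""
    ordered = sorted(packets, key=lambda p: p["ts"])
    times = [p["ts"] for p in ordered]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "avg_pkt_len": mean(p["length"] for p in ordered),
        "avg_ipd": mean(gaps) if gaps else 0.0,   # a single packet has no gaps
        "duration": times[-1] - times[0],
    }


# Hypothetical three-packet session (timestamps in seconds, lengths in bytes).
session = [
    {"ts": 0.0, "length": 500},
    {"ts": 1.0, "length": 1500},
    {"ts": 3.0, "length": 1000},
]
attrs = session_attributes(session)
```

For this toy session the packet lengths average to 1000 bytes, the two gaps of 1 s and 2 s average to 1.5 s, and the duration is 3 s.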

Through this construction process, the network behavior knowledge graph converts traffic traces into an interpretable multi-entity structure that reveals the underlying communication patterns of mobile applications. When later integrated with the semantic knowledge graph, it forms a comprehensive foundation that supports both precise classification of known applications and effective generalization to previously unseen ones.

4.3. Cross-View Knowledge Graph Fusion

Although the application semantic knowledge graph and the network behavior knowledge graph provide complementary information, using either one in isolation is insufficient to capture the full spectrum of mobile application characteristics. To model the deeper interdependencies between semantic context and behavioral patterns, we integrate the semantic graph $G_{e}$ and the behavior graph $G_{n}$ into a unified mobile application knowledge graph $G_{m}$. The fused graph preserves the structural integrity of both subgraphs while introducing cross-view connections that explicitly link high-level application semantics with their corresponding runtime behavior through entity alignment and attribute alignment.

The fusion process begins with entity alignment, whose goal is to determine whether entities from the semantic graph $G_{e}$ and the behavior graph $G_{n}$ refer to the same underlying application component. Since the two graphs originate from heterogeneous data sources with different naming conventions and granularities, we first apply preprocessing steps to standardize field formats and normalize textual identifiers. Candidate entity pairs are then scored with a string similarity measure to identify highly confident matches despite minor naming variations. To avoid introducing erroneous alignments caused by noisy or ambiguous metadata, entity matching is conducted in a conservative manner: only pairs whose similarity scores exceed a threshold are aligned, while uncertain or low-confidence matches are intentionally left unaligned rather than being forcibly merged. As a result, potential metadata noise primarily leads to sparser cross-view connectivity instead of incorrect relational links. Once corresponding entities are aligned, we establish cross-view relational links by connecting semantic relations in $G_{e}$ with their behavioral counterparts in $G_{n}$. For instance, semantic relations describing an application’s category or permission requirements are linked to behavioral entities that encode its runtime interactions, including the sessions it generates and the network endpoints it contacts. These cross-view connections illuminate how high-level semantic properties manifest in concrete communication behaviors, forming continuous relational pathways that unify semantic context and behavioral evidence within a single graph structure.
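The conservative matching rule can be sketched as below. This is only an illustration under stated assumptions: `difflib.SequenceMatcher` is a stand-in for the paper's unspecified similarity measure, the 0.85 threshold is arbitrary, and the entity names are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase and keep only alphanumerics so formatting differences vanish."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def align_entities(semantic_names, behavior_names, threshold=0.85):
    """Return {semantic_name: behavior_name} for confident matches only."""
    aligned = {}
    for s in semantic_names:
        scored = [(SequenceMatcher(None, normalize(s), normalize(b)).ratio(), b)
                  for b in behavior_names]
        if not scored:
            continue
        score, best = max(scored)
        if score >= threshold:   # conservative: low-confidence pairs stay unaligned
            aligned[s] = best
    return aligned


# Hypothetical names from the semantic view and the behavior view.
semantic = ["Example Chat", "Photo-Editor Pro", "Weather Now"]
behavior = ["example_chat", "photoeditorpro", "stream_player"]
matches = align_entities(semantic, behavior)
```

In this toy run "Weather Now" has no sufficiently similar counterpart and is left unaligned, so noise in the metadata produces a missing cross-view link rather than a wrong one.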

Following entity alignment, the fusion process proceeds to attribute alignment. When both subgraphs provide the same attribute, inconsistencies may arise due to differences in update cycles or derivation methods. In such cases, values from authoritative metadata sources, such as official app stores, are prioritized to ensure reliability. Attributes unique to a single view are retained without modification, allowing each fused entity to incorporate both static semantic descriptions from $G_{e}$ and dynamic behavioral characteristics from $G_{n}$. This produces a complete and context-rich feature representation that combines long-term semantic properties with operational behavior patterns observed in network traffic.
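A minimal sketch of this attribute-alignment rule follows. It assumes that the semantic view carries the authoritative app store values, so semantic values win on conflicting keys; the attribute names and values are hypothetical.

```python
def merge_attributes(semantic_attrs, behavior_attrs):
    """Fuse two attribute dicts; on conflict the semantic (app store) value wins."""
    fused = dict(behavior_attrs)   # start from the behavior-view attributes
    fused.update(semantic_attrs)   # authoritative metadata overrides shared keys
    return fused


# Hypothetical attributes of one aligned entity in each view.
semantic_attrs = {"category": "Communication", "version": "5.2.1", "rating": 4.5}
behavior_attrs = {"version": "5.2.0", "avg_pkt_len": 812.4, "duration": 36.0}
fused = merge_attributes(semantic_attrs, behavior_attrs)
```

The conflicting `version` key takes the app store value, while view-specific attributes such as `rating` and `avg_pkt_len` pass through unchanged.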

Through this two-stage alignment procedure, the resulting graph $G_{m}$ forms a cohesive and cross-view representation that unifies semantic descriptions, structural dependencies, and behavioral dynamics within a single graph-based framework. By preserving the complementary strengths of the semantic and behavior subgraphs and tightly coupling them through aligned entities and cross-view relations, $G_{m}$ offers a robust foundation for graph-based learning and substantially enhances the model’s ability to reason across both known and previously unseen mobile applications.

4.4. Zero-Shot Graph Neural Network Design

After constructing the unified mobile application knowledge graph $G_{m}$, we develop a zero-shot graph neural network capable of jointly modeling semantic-behavioral interactions and generalizing to previously unseen application categories. The proposed model builds upon four coupled components: multimodal node feature representation and cross-view alignment, relation-aware dual-channel message propagation, semantic alignment and structure-preserving embedding, and a unified multi-objective optimization for zero-shot learning. Together, these components enable the model to learn expressive node embeddings that remain semantically coherent, behaviorally grounded, and structurally stable, supporting zero-shot inference.

(1) Multimodal Node Representation and Cross-View Alignment: The learning process begins with constructing a feature vector

h_{m, i}

for each node

e_{m, i} \in E_{m}

, derived from the node’s textual, categorical, and numerical attributes. Textual attributes, such as application descriptions or permission semantics, are encoded into embeddings using the pretrained language model, yielding

h_{i (t)}

. Categorical attributes, including application categories and protocol types, are encoded as one-hot vectors

h_{i (c)}

. Numerical attributes, such as user ratings, download counts, packet sizes, and traffic statistics, are normalized to form

h_{i (n)}

. The node feature vector is constructed as

h_{m, i} = [h_{i (t)}, h_{i (c)}, h_{i (n)}]

(4)
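
The concatenation in Equation (4) can be illustrated with a minimal sketch. The function names, the two-dimensional placeholder text embedding, the attribute values, and the choice of min-max scaling are all assumptions made for illustration; the text only states that numerical attributes are normalized.

```python
from typing import List, Sequence


def one_hot(value: str, vocabulary: Sequence[str]) -> List[float]:
    """One-hot encode a categorical attribute over a fixed vocabulary."""
    return [1.0 if value == v else 0.0 for v in vocabulary]


def min_max(values: Sequence[float], lows: Sequence[float], highs: Sequence[float]) -> List[float]:
    """Min-max scale numerical attributes (one possible normalization, assumed here)."""
    scaled = []
    for x, lo, hi in zip(values, lows, highs):
        scaled.append(0.0 if hi == lo else (x - lo) / (hi - lo))
    return scaled


def node_feature(text_emb, category, vocabulary, numeric, lows, highs) -> List[float]:
    """Concatenate textual, categorical, and numerical parts as in Equation (4)."""
    return list(text_emb) + one_hot(category, vocabulary) + min_max(numeric, lows, highs)


h = node_feature(
    text_emb=[0.2, -0.1],        # placeholder for a language-model embedding
    category="games",
    vocabulary=["social", "communication", "games", "tools", "productivity"],
    numeric=[4.5, 1200.0],       # hypothetical user rating and mean packet size
    lows=[0.0, 0.0],
    highs=[5.0, 1500.0],
)
print(h)
```

The resulting vector has one block per attribute type, so its length is the sum of the text embedding size, the vocabulary size, and the number of numerical attributes.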

Because the unified node set

E_{m}

merges entities originating from both the semantic-view graph

G_{e}

and the behavior-view graph

G_{n}

, directly combining their embeddings can lead to drift caused by modality inconsistencies. To mitigate this issue, we introduce a mutual information-based cross-view alignment mechanism. Specifically, we adopt the InfoNCE estimator, which is widely used for cross-view representation matching. For each aligned node pair

(e_{e, i}, e_{n, i})

from the semantic-view and behavior-view graphs, we treat their embeddings

(h_{e, i}, h_{n, i})

as a positive pair, while embeddings of mismatched nodes

(h_{e, i}, h_{n, j})

serve as negative pairs. The InfoNCE-based alignment loss is computed as:

L_{align}^{KG} = - \sum_{i = 1}^{| E_{m} |} \log \frac{\exp (\mathrm{sim} (h_{e, i}, h_{n, i}) / τ)}{\sum_{j = 1}^{| E_{m} |} \exp (\mathrm{sim} (h_{e, i}, h_{n, j}) / τ)}

where

\mathrm{sim} (\cdot, \cdot)

denotes cosine similarity and

τ

is a temperature parameter. Minimizing this loss encourages consistent alignment between modality-specific embeddings while preventing drift caused by the heterogeneous semantic and behavioral views.
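
The InfoNCE loss above can be sketched in plain Python. This is an illustrative sketch, not the authors' implementation: the function names and the toy two-dimensional embeddings are assumptions, and all embeddings are assumed to be non-zero so that cosine similarity is defined.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(h_e: List[Sequence[float]], h_n: List[Sequence[float]], tau: float = 0.1) -> float:
    """Sum over i of -log softmax_j(sim(h_e[i], h_n[j]) / tau), evaluated at j = i."""
    loss = 0.0
    for i, anchor in enumerate(h_e):
        logits = [cosine(anchor, other) / tau for other in h_n]
        peak = max(logits)  # subtract the maximum for numerical stability
        log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
        loss -= logits[i] - log_denominator
    return loss


# Toy embeddings (hypothetical): row i of each view belongs to the same entity.
aligned = [[1.0, 0.0], [0.0, 1.0]]
loss_matched = info_nce(aligned, aligned)        # positives are the most similar pairs
loss_swapped = info_nce(aligned, aligned[::-1])  # positives are the least similar pairs
print(loss_matched, loss_swapped)
```

The loss is close to zero when each semantic-view embedding is most similar to its own behavior-view counterpart and grows when a mismatched node is more similar than the true counterpart.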

Let

H_{e}

and

H_{n}

denote the embedding matrices of semantic-view and behavior-view nodes, respectively. The alignment objective encourages consistency between matched entities across the two views and is defined as

L_{align}^{KG} = - I (H_{e}, H_{n})

(5)

where

I (\cdot, \cdot)

is the estimated mutual information. Maximizing the mutual information, which is equivalent to minimizing this loss, promotes agreement between the two modality-specific representations, ensuring that nodes retain coherent semantics even after fusion.

(2) Relation-Aware Dual-Channel Message Propagation: To effectively handle the heterogeneous relational structure of

G_{m}

, we develop a relation-aware dual-channel message propagation mechanism based on Relational Graph Convolutional Networks (RGCNs). In a standard RGCN, the feature update for node

i

at layer

l + 1

is given by

h_{i}^{l + 1} = σ (\sum_{r \in R} \sum_{j \in N_{i}^{r}} \frac{w_{i j}^{r}}{c_{i, r}} W_{r}^{l} h_{j}^{l} + W_{0}^{l} h_{i}^{l})

(6)

where

R

is the set of relation types,

N_{i}^{r}

is the set of neighbors of node

i

connected via relation

r

,

W_{r}^{l}

and

W_{0}^{l}

are trainable weights,

c_{i, r}

is a normalization factor,

w_{i j}^{r}

is the edge weight between nodes

i

and

j

under relation

r

, and

σ (\cdot)

is an activation function.
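
A single propagation step of Equation (6) can be sketched as follows. The relation names, the identity weight matrices, the ReLU activation, and the choice of the normalization factor as the number of neighbors under each relation are assumptions made for this illustration.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def rgcn_layer(
    h: Dict[int, Vector],
    edges: Dict[str, List[Tuple[int, int, float]]],  # relation -> (target i, source j, weight)
    W_rel: Dict[str, Matrix],
    W_self: Matrix,
) -> Dict[int, Vector]:
    """One relational update: self-loop term plus normalized, weighted neighbor messages."""
    out = {}
    for i, h_i in h.items():
        total = matvec(W_self, h_i)
        for r, triples in edges.items():
            incoming = [(j, w) for (t, j, w) in triples if t == i]
            c = len(incoming)  # normalization factor taken as |N_i^r| (assumed choice)
            for j, w in incoming:
                message = matvec(W_rel[r], h[j])
                total = [a + (w / c) * b for a, b in zip(total, message)]
        out[i] = relu(total)
    return out


identity = [[1.0, 0.0], [0.0, 1.0]]
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
edges = {
    "uses_permission": [(0, 1, 1.0), (0, 2, 1.0)],  # hypothetical semantic relation
    "contacts_domain": [(0, 2, 0.5)],               # hypothetical behavioral relation
}
out = rgcn_layer(features, edges, {r: identity for r in edges}, identity)
print(out)
```

Nodes without incoming edges keep only their self-loop term, while node 0 accumulates one normalized contribution per relation type.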

To distinguish semantic relations from behavioral ones, message propagation is divided into two separate channels. The semantic channel aggregates messages only over semantic relations, capturing static contextual dependencies. The behavior channel aggregates messages over behavioral relations, modeling dynamic communication interactions. Let

h_{e, i}^{l + 1}

and

h_{n, i}^{l + 1}

denote the outputs of the two channels. Their contributions are fused through an attention mechanism:

h_{i}^{l + 1} = α_{i}^{l} h_{e, i}^{l + 1} + (1 - α_{i}^{l}) h_{n, i}^{l + 1}

(7)

where the attention coefficient

α_{i}^{l}

is computed as

α_{i}^{l} = \mathrm{softmax} (W [h_{e, i}^{l + 1}, h_{n, i}^{l + 1}])

and

W

is a learnable attention projection matrix applied to the concatenation

[\cdot, \cdot]

of the two channel outputs. This adaptive mechanism enables the model to dynamically balance semantic and behavioral cues for each node.
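
One way to read Equation (7) is sketched below. The text does not spell out how a softmax yields a single coefficient, so this sketch assumes that the projection produces one score per channel, that the softmax is taken over these two scores, and that the coefficient of the semantic channel is the first softmax output. The function names and toy inputs are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(h_sem: Sequence[float], h_beh: Sequence[float], W: Sequence[Sequence[float]]) -> Tuple[float, List[float]]:
    """W has two rows (one score per channel) and len(h_sem) + len(h_beh) columns."""
    concat = list(h_sem) + list(h_beh)
    scores = [sum(w * x for w, x in zip(row, concat)) for row in W]
    alpha = softmax(scores)[0]  # weight assigned to the semantic channel
    fused = [alpha * s + (1.0 - alpha) * b for s, b in zip(h_sem, h_beh)]
    return alpha, fused


# With an all-zero projection both channels receive equal weight.
W_zero = [[0.0] * 4, [0.0] * 4]
alpha, fused = fuse([1.0, 0.0], [0.0, 1.0], W_zero)
print(alpha, fused)
```

Because the two weights sum to one, the fused embedding is always a convex combination of the two channel outputs.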

To further enhance the discriminative power of message aggregation, each edge is assigned a learnable attribute-aware weight defined by

w_{i j}^{r} = σ (W_{a} a_{i j}^{r} + b_{a})

(8)

where

a_{i j}^{r}

is the attribute vector of edge

(i, j)

under relation

r

. This allows the model to emphasize informative edges while suppressing noise, improving overall structural modeling.
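
Equation (8) can be sketched as a small gating function. The sketch assumes that the activation is the logistic sigmoid, so that each weight lies between 0 and 1, and that the projection maps the edge attribute vector to a single scalar; the attribute values are hypothetical.

```python
import math
from typing import Sequence


def edge_weight(attributes: Sequence[float], W_a: Sequence[float], b_a: float) -> float:
    """Attribute-aware edge weight: sigmoid of a linear score of the edge attributes."""
    z = sum(w * a for w, a in zip(W_a, attributes)) + b_a
    return 1.0 / (1.0 + math.exp(-z))


w_neutral = edge_weight([0.0, 0.0], W_a=[1.0, -1.0], b_a=0.0)  # uninformative edge
w_strong = edge_weight([3.0, 0.0], W_a=[1.0, -1.0], b_a=0.0)   # strongly weighted edge
print(w_neutral, w_strong)
```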

(3) Semantic Alignment and Structure-Preserving Embedding: To enable zero-shot prediction, node embeddings must be aligned with category semantics in a shared embedding space. Let

s_{c}

denote the semantic embedding of category

c

, obtained by encoding its textual description using the pretrained language model, and

h_{i}

be the embedding of node

i

produced by the graph encoder. Because these representations originate from heterogeneous modalities, they reside in different vector spaces and cannot be directly compared. To address this, both embeddings are projected into a joint embedding space through two learnable linear transformations:

{\hat{h}}_{i} = W_{h} h_{i}, {\hat{s}}_{c} = W_{s} s_{c}

(9)

where

W_{h}

and

W_{s}

are trainable projection matrices. Classification is performed by computing similarity scores between each projected node representation

{\hat{h}}_{i}

and all projected category embeddings

{\hat{s}}_{c}

. Since unseen categories are encoded using the same semantic encoder and projected with

W_{s}

, the model can assign nodes to categories for which no training samples are available, enabling zero-shot recognition.
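
The zero-shot prediction step built on Equation (9) can be sketched as follows. Cosine similarity is assumed as the scoring function, and the category names, identity projections, and toy embeddings are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def matvec(W: Matrix, x: Sequence[float]) -> List[float]:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def predict(h_i: Sequence[float], category_embs: Dict[str, Sequence[float]],
            W_h: Matrix, W_s: Matrix) -> Tuple[str, Dict[str, float]]:
    """Project the node and every category into the joint space, then pick the best match."""
    h_hat = matvec(W_h, h_i)
    scores = {c: cosine(h_hat, matvec(W_s, s)) for c, s in category_embs.items()}
    return max(scores, key=scores.get), scores


identity = [[1.0, 0.0], [0.0, 1.0]]
# "unseen_b" has no training nodes; only its text embedding is available at test time.
categories = {"seen_a": [1.0, 0.0], "unseen_b": [0.0, 1.0]}
label, scores = predict([0.1, 0.9], categories, identity, identity)
print(label, scores)
```

Adding a new category only requires encoding its description and projecting it with the trained category projection; no node of that category needs to be seen during training.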

However, optimizing only for node-category similarity may distort the global semantic structure of the embedding space. A node embedding may align well with its immediate category description while deviating from the broader semantic topology defined by relationships among categories, reducing generalization to unseen classes. To preserve this global structure, we introduce a structure-preserving constraint.

Specifically, let

V_{c}

denote the set of nodes belonging to category

c

, and

h_{v}

be the embedding of node

v

. We compute a category embedding by averaging the embeddings of all nodes corresponding to that category:

{\bar{h}}_{c} = \frac{1}{| V_{c} |} \sum_{v \in V_{c}} h_{v}

(10)

The matrix of all category embeddings is then constructed as

H_{c} = [{\bar{h}}_{1}, {\bar{h}}_{2}, \dots, {\bar{h}}_{C}]

. Next, we compute a semantic similarity matrix

S \in R^{C \times C}

, where each element

S_{i j}

is the cosine similarity between the textual semantic embeddings of categories

i

and

j

. This matrix reflects the intrinsic semantic relationships among categories based solely on their descriptions.

To ensure that the learned embedding space preserves these semantic relationships, we define the structure-preservation loss:

L_{struct} = {‖ H_{c} H_{c}^{T} - S ‖}_{F}^{2}

(11)

where

{‖ \cdot ‖}_{F}

denotes the Frobenius norm. This loss encourages the pairwise similarities between learned category embeddings to align with the semantic similarity structure derived from category descriptions, preventing semantic drift and improving the model’s ability to distinguish between unseen categories.
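
Equations (10) and (11) can be sketched together. Two assumptions are made so that the dimensions and scales match: the category embeddings are stacked as rows, which makes the Gram matrix C × C, and each category embedding is scaled to unit length so that its inner products are comparable to cosine similarities. The category names and vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average the node embeddings of one category, as in Equation (10)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def gram(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    return [[sum(a * b for a, b in zip(u, v)) for v in rows] for u in rows]


def structure_loss(nodes_by_category: Dict[str, Sequence[Sequence[float]]],
                   semantic_embs: Dict[str, Sequence[float]]) -> float:
    """Squared Frobenius distance between the learned Gram matrix and S."""
    order = sorted(nodes_by_category)
    H_c = [normalize(mean_vector(nodes_by_category[c])) for c in order]  # one row per category
    S = gram([normalize(semantic_embs[c]) for c in order])               # cosine similarities
    G = gram(H_c)
    return sum((g - s) ** 2 for g_row, s_row in zip(G, S) for g, s in zip(g_row, s_row))


nodes = {"games": [[1.0, 0.0], [1.0, 0.0]], "tools": [[0.0, 1.0]]}
loss_consistent = structure_loss(nodes, {"games": [2.0, 0.0], "tools": [0.0, 3.0]})
loss_mismatch = structure_loss(nodes, {"games": [1.0, 0.0], "tools": [1.0, 0.0]})
print(loss_consistent, loss_mismatch)
```

The loss vanishes when the learned categories are as dissimilar as their descriptions, and it is positive when the descriptions are identical but the learned embeddings are orthogonal.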

(4) Unified Multi-Objective Optimization for Zero-Shot Learning: To jointly optimize classification accuracy, semantic alignment, cross-view consistency, and structural coherence, we define a unified multi-objective loss function. The first component is the classification loss, which encourages correct predictions for seen categories. For a node

v

with ground-truth category

Y_{v}

, the loss is defined as:

L_{cls} = - \log \frac{\exp (\mathrm{sim} (v, Y_{v}))}{\sum_{c \in C_{seen}} \exp (\mathrm{sim} (v, c))}

(12)

where

\mathrm{sim} (v, c)

denotes the similarity score between node

v

and category

c

, and

C_{seen}

is the set of categories observed during training.

To improve the separability between correct and incorrect categories, we introduce a semantic-margin alignment loss, which enforces a margin

m

between the similarity scores of the positive class and any negative class

c^{-} \neq Y_{v}

:

L_{align} = \max (0, m - \mathrm{sim} (v, Y_{v}) + \mathrm{sim} (v, c^{-}))

(13)

This encourages the model to push node embeddings closer to their correct semantic descriptions while maintaining a margin from incorrect ones, improving discriminability in the shared embedding space. In addition, the cross-view alignment loss

L_{align}^{KG}

maximizes the mutual information between semantic-view and behavior-view embeddings, promoting consistency across the two modalities. The structure-preservation loss

L_{struct}

constrains the global geometry of the embedding space to align with the semantic similarity matrix derived from category descriptions, preventing semantic drift and improving generalization to unseen categories. Combining all four components, the final training objective is:

L = L_{cls} + λ_{1} L_{align} + λ_{2} L_{align}^{KG} + λ_{3} L_{struct}

(14)

where

λ_{1}

,

λ_{2}

, and

λ_{3}

are coefficients controlling the relative contribution of each loss term. This unified objective ensures that the learned node embeddings remain semantically meaningful, behaviorally coherent, and structurally aligned, ultimately enabling robust zero-shot identification of mobile applications.
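
Equations (12) to (14) can be sketched for a single node as follows. The similarity values are hypothetical, and averaging the hinge term over all negative categories is an assumption, since the text gives only the per-negative form.

```python
import math
from typing import Dict


def classification_loss(sims: Dict[str, float], target: str) -> float:
    """Softmax cross-entropy over similarity scores to the seen categories, Equation (12)."""
    peak = max(sims.values())
    log_z = peak + math.log(sum(math.exp(s - peak) for s in sims.values()))
    return log_z - sims[target]


def margin_loss(sims: Dict[str, float], target: str, margin: float = 0.2) -> float:
    """Hinge term of Equation (13), averaged over the negative categories (assumed)."""
    negatives = [s for c, s in sims.items() if c != target]
    return sum(max(0.0, margin - sims[target] + s) for s in negatives) / len(negatives)


def total_loss(l_cls: float, l_align: float, l_align_kg: float, l_struct: float,
               lam1: float = 0.5, lam2: float = 0.5, lam3: float = 0.5) -> float:
    """Weighted sum of the four terms, Equation (14)."""
    return l_cls + lam1 * l_align + lam2 * l_align_kg + lam3 * l_struct


sims = {"social": 0.9, "games": 0.8, "tools": 0.1}  # hypothetical similarity scores
l_cls = classification_loss(sims, "social")
l_margin = margin_loss(sims, "social")
print(l_cls, l_margin, total_loss(l_cls, l_margin, 0.0, 0.0))
```

Only the negative category that falls inside the margin contributes to the hinge term; a negative category that is already well separated contributes zero.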

5. Experiment Results and Analysis

This section evaluates the effectiveness of the proposed KZGNN framework through a series of experiments on captured real-world mobile application traffic. We present the experimental setup, compare KZGNN with state-of-the-art baselines, and analyze its performance under various zero-shot identification scenarios.

5.1. Experimental Setup

5.1.1. Dataset

To comprehensively evaluate the effectiveness of KZGNN, we construct a real-world mobile application dataset comprising 160 applications. Data collection is conducted over a six-month period (January–June 2025) across heterogeneous network environments to ensure diversity and realism. The dataset covers five major categories of mobile applications: social, communication, games, tools, and productivity. For each category, 32 popular applications are selected from the top-ranked apps in the Google Play Store based on download volume. This selection strategy ensures a balanced category distribution and broad coverage of mainstream mobile applications.

Five Android smartphones are used to generate traffic, reducing device-specific bias and improving representativeness. Each application is executed under three widely used network conditions (Wi-Fi, 4G, and 5G) to capture variations in traffic behavior across different access technologies. For every application, we collect 60 traffic samples, with 20 samples per network condition. Each sample is generated by mimicking realistic user interactions to better reflect actual usage patterns. To improve transparency and reproducibility, Table 1 lists representative application examples from each category. These applications are selected to illustrate the diversity of usage patterns covered by the dataset, while the full dataset contains 32 applications per category, totaling 160 applications.

5.1.2. Baselines for Performance Evaluation

We compare KZGNN with nine state-of-the-art mobile application identification and encrypted traffic classification methods to provide a comprehensive evaluation of model performance.

DeNeTLang [42] segments flows into network units based on timing characteristics and converts each unit into symbolic representations by clustering statistical features such as packet sizes and inter-packet delays. These symbols are then used to construct a k-testable temporal language, where k-gram frequency distributions serve as traffic fingerprints. Finally, a lightweight MLP classifier operates on these fingerprints to perform mobile application identification.

AppScanner [14] constructs a 54-dimensional statistical feature vector from packet length sequences, including the mean, minimum, maximum, and standard deviation of upstream, downstream, and bidirectional traffic. These engineered statistical features serve as the core representation for its application identification.

FlowPrint [8] is a semi-supervised mobile application identification method. It identifies unfamiliar applications by discovering temporal correlations among destination-related features. These destination-level behavioral patterns allow FlowPrint to fingerprint both known and previously unseen applications.

ET-BERT [25] leverages large-scale unlabeled traffic to pre-train contextual byte-level representations using a BERT-based architecture. Through self-supervised learning, it captures deep semantic dependencies within packet payload bytes, and the resulting embeddings can be fine-tuned for downstream mobile application identification tasks.

TrafficFormer [43] also employs self-supervised pretraining on unlabeled encrypted traffic. It introduces masked burst modeling and multi-directional flow classification tasks to capture packet ordering and directional information during pretraining and further enhances robustness through randomized field augmentation during fine-tuning.

App-Net [20] adopts a multimodal architecture that combines payload-derived spatial features extracted by CNNs with packet length sequences modeled by LSTMs. Its hybrid design enables simultaneous learning of spatial and sequential patterns present in mobile traffic.

FG-Net [28] constructs a unified graph representation by integrating packet-level features with flow-level relational structures. It then applies graph neural networks to capture both intra-flow and inter-flow dependencies, enabling more expressive modeling of structural patterns within network traffic.

SmartDetector [44] builds semantic attribute matrices using packet size, direction, and inter-packet intervals. It employs contrastive learning with perturbation-driven augmentation to obtain invariant representations tailored for encrypted traffic analysis.

Attribute-ZSL [10] applies attribute-guided zero-shot learning to traffic classification. It builds a semantic attribute space for applications and aligns flow representations with these attributes through a joint embedding model. Flow patterns are encoded using temporal convolutions with channel attention, while a triplet-based alignment loss and a conditional generative mechanism jointly enhance semantic alignment and improve recognition of unseen application categories.

5.1.3. Performance Metrics

Following common practices in traffic classification, we evaluate model performance using four standard metrics: Precision (

Pre.

), Recall (

Rec.

), Accuracy (

Acc.

), and F1-score (

F1

). These metrics are defined as:

\begin{matrix} Pre. = \frac{TP}{TP + FP}, Rec. = \frac{TP}{TP + FN}, \\ Acc. = \frac{TP + TN}{TP + TN + FP + FN}, F1 = \frac{2 \times Pre. \times Rec.}{Pre. + Rec.} \end{matrix}

(15)

where

TP

,

FP

,

TN

, and

FN

denote true positives, false positives, true negatives, and false negatives, respectively.
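
The four metrics in Equation (15) can be computed from the confusion counts of one class as in the sketch below. The counts are hypothetical, and how per-class values are averaged across classes is not stated here, so the sketch covers a single class only.

```python
from typing import Dict


def metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Precision, recall, accuracy, and F1 from the confusion counts of a single class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"Pre.": precision, "Rec.": recall, "Acc.": accuracy, "F1": f1}


result = metrics(tp=8, fp=2, tn=9, fn=1)  # hypothetical counts
print(result)
```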

To ensure statistical reliability, all experiments are conducted using five different random seeds, and the mean scores are reported as the final results. The dataset is split into training and testing sets at an 8:2 ratio for each application category. All experiments are performed on an Ubuntu 20.04 workstation equipped with an Intel Core i9-12900K CPU, 64 GB RAM, and an NVIDIA GeForce RTX 3090 GPU. For KZGNN, hyperparameters are selected based on empirical tuning: we employ the Adam optimizer with a learning rate of 0.001, ReLU activation, two graph convolution layers with 64 hidden units each, a maximum of 100 training epochs, and a batch size of 32. The weighting coefficients (

λ_{1}

,

λ_{2}

, and

λ_{3}

) for the multi-objective loss are each set to 0.5. We set the temperature

τ = 0.1

and the semantic margin

m = 0.2

, following standard practice in contrastive learning and metric learning. For semantic encoding, we adopt Llama-3 as the pretrained language model to generate textual embeddings. For entity alignment, Levenshtein distance is employed as the string similarity metric to evaluate candidate entity pairs. For all baseline models, we adopt the recommended hyperparameters provided in their original implementations to ensure fair comparison.
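
The Levenshtein distance used for entity alignment can be sketched with the standard dynamic-programming recurrence. Converting the distance into a similarity by dividing by the longer string length is an assumption; the text does not state how the distance is normalized or thresholded.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Length-normalized similarity in [0, 1] (assumed normalization)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"), similarity("kitten", "sitting"))
```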

5.2. Unknown Application-Aware Classification Experiments

To support unknown application identification, the model must first be capable of distinguishing traffic from known applications while correctly rejecting samples originating from unseen ones. This experiment evaluates that ability under an open-set setting. All applications are grouped by category, with 80% selected as known categories whose application-specific labels are used during training. The remaining 20% are treated as unknown categories, and all corresponding traffic samples are assigned a unified unknown label. Both training and testing sets contain a mixture of known and unknown samples at an 8:2 ratio.

As shown in Figure 3, KZGNN achieves the best performance, reaching an average accuracy of 96.5%. Its advantage comes from the unified knowledge graph, which jointly models semantic attributes, relational structure, and edge-level signals, resulting in representations that effectively separate known applications while reliably rejecting unseen ones. Graph-based FG-Net performs well in this setting, but its lack of semantic embeddings and edge-attribute modeling limits its ability to discriminate unknown traffic, leading to a ∼3% performance gap compared with KZGNN. App-Net surpasses ET-BERT, reflecting the benefit of multimodal feature fusion when distinguishing known app patterns from unfamiliar traffic. FlowPrint outperforms AppScanner in this scenario because its use of richer flow-level features improves separation between known and unknown traffic profiles. TrafficFormer, although strong in modeling global sequential patterns, does not explicitly capture cross-flow semantic or relational cues, which are essential for identifying deviations introduced by unseen applications. SmartDetector performs comparatively well due to its semantic perturbation and contrastive learning strategy, which enhances robustness to distribution shifts. However, lacking explicit relational and knowledge-guided modeling, it still falls short of the representational capacity achieved by KZGNN. For the zero-shot scheme, Attribute-ZSL provides competitive performance and achieves a level comparable to TrafficFormer, indicating its capability to handle both known and unseen mobile applications under open-world conditions.

5.3. Unknown Application Type-Aware Classification Experiments

Building on the previous experiment, this task evaluates whether the model can infer the functional type of unseen applications. The goal is to determine whether traffic from an unknown application can be assigned to one of five categories (social, communication, games, tools, and productivity) based solely on knowledge learned from known categories. Applications are split at the category level: 80% of categories are used for training, while the remaining 20% appear only during testing. The training set contains only known categories, ensuring that the model learns generalizable type distinctions rather than any implicit clues from the unseen categories themselves.

As shown in Figure 4, KZGNN achieves the highest performance, with an average accuracy of 89.8%. Its advantage in this scenario stems from its unified knowledge graph, which explicitly models entity relationships, semantic attributes, and behavioral interactions. These graph-based structures enable the model to uncover latent semantic similarity patterns shared among applications of the same functional type, even when specific categories are unseen during training. Other methods achieve lower performance because they lack mechanisms for capturing such category-level semantic structure. Models relying solely on traffic patterns, with accuracies between 75.4% and 86.9%, struggle to correctly infer type-level semantics from unseen categories. TrafficFormer performs well on known categories due to its strong sequence modeling via multi-head attention but exhibits reduced accuracy in this task because it cannot leverage semantic priors to generalize beyond observed traffic distributions. SmartDetector benefits from contrastive augmentation and demonstrates higher robustness than other baselines, yet its representation space is limited by fixed-dimensional, single-flow modeling, making it less effective at transferring semantic relationships across categories. The overall performance of Attribute-ZSL remains close to that of TrafficFormer, with slightly higher accuracy, suggesting good robustness when handling unknown application types. Unlike TrafficFormer, Attribute-ZSL shows higher precision, reflecting a lower false-positive tendency and an emphasis on conservative, correctness-oriented traffic identification.

Overall, the results highlight that knowledge-enhanced semantic reasoning is crucial for inferring the functional types of unseen applications, and KZGNN’s structured semantic-behavioral modeling provides a clear advantage in this classification setting.

5.4. Unknown Application Label-Aware Classification Experiments

In this experiment, we further increase task difficulty by evaluating the ability of the model to identify, at a fine-grained level, specific classes of applications that have never appeared in the training set. Unlike the previous section, which focuses on coarse-grained application-type classification, this experiment assesses fine-grained category recognition under zero-shot conditions. Consistent with the unknown application type-aware classification experiments, the dataset is partitioned by application category, with 80% used as known categories for training and the remaining 20% serving as unknown categories during testing. The model must distinguish among unseen application labels by relying solely on semantic patterns and relational information learned from the training categories.

Figure 5 shows that KZGNN maintains superior performance with an average accuracy of 85.2%. Although this is approximately 4.6 percentage points lower than in the type-aware classification task due to the increased granularity of the task, the model still demonstrates strong generalization ability. Its advantage derives from the multi-level semantic, relational, and attribute modeling afforded by the knowledge graph, enabling finer discrimination across unseen application classes. FG-Net benefits from graph-based structures but lacks semantic embedding capabilities, resulting in lower performance compared to KZGNN. TrafficFormer performs well in closed-world settings but emphasizes temporal patterns over semantic differences, limiting its applicability for unseen application category discrimination. SmartDetector achieves an accuracy of 80.0%, showing resilience to distribution shifts through contrastive learning, but its inability to model cross-application semantic transfer constrains its generalization. It is worth noting that Attribute-ZSL exhibits substantially lower performance, achieving only around 75% accuracy and an F1-score approximately 10% below KZGNN. This indicates that Attribute-ZSL struggles under realistic and challenging zero-shot classification settings. The primary limitation lies in its semantic attribute space, which lacks the expressive capacity to support fine-grained alignment and thus fails to accurately discriminate specific application categories in the zero-shot scenario. Overall, KZGNN achieves a 5.2% improvement over the second-best baseline by leveraging structured semantic reasoning and dual-view relational patterns.

To further analyze the performance of KZGNN, Figure 6 reports the category-wise classification results under the unknown application label-aware setting. Overall, the model maintains stable performance across all five categories, while clear differences emerge due to intrinsic category characteristics. Notably, Games achieves the highest scores across all metrics, benefiting from distinctive semantic descriptions and traffic patterns that remain discriminative even under zero-shot conditions.

In contrast, Social and Communication exhibit relatively lower performance, reflecting their highly overlapping functionalities and similar network behaviors, which increase ambiguity among unseen labels. Tools and Productivity show moderate and closely aligned results, as these categories often share event-driven usage patterns and limited semantic separation. Despite these challenges, KZGNN consistently preserves strong generalization ability across categories, demonstrating its effectiveness in fine-grained zero-shot application classification.

Additionally, we conduct a paired Wilcoxon signed-rank test across five random seeds using F1-score as the evaluation metric to assess the statistical significance of KZGNN’s performance gains over the strongest baselines. Table 2 reports the mean

F1

improvements, the corresponding 95% confidence intervals (CI), and the Wilcoxon p-values when comparing KZGNN with FG-Net, SmartDetector, and TrafficFormer.

Across all comparisons, KZGNN consistently delivers substantial improvements, with mean

F1

gains ranging from 3.24% to 4.82%. All confidence intervals remain strictly positive, indicating stable superiority across different seeds. Moreover, the Wilcoxon

p

-value of 0.031 (

p

< 0.05) confirms that these improvements are statistically significant.
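
As a worked check on the test above, the sketch below enumerates the exact null distribution of the Wilcoxon signed-rank statistic for five paired differences. The per-seed gains are hypothetical placeholders, not values from Table 2, and the use of a one-sided exact test is an assumption. With five pairs that all favor one method, the exact one-sided p-value is 1/32 = 0.03125, which is consistent with the reported 0.031; the corresponding two-sided value would be 2/32 = 0.0625.

```python
from itertools import product
from typing import Sequence


def wilcoxon_one_sided_exact(differences: Sequence[float]) -> float:
    """Exact P(W+ >= observed) under the null; assumes no zero or tied absolute differences."""
    ranked = sorted(differences, key=abs)
    n = len(ranked)
    ranks = range(1, n + 1)
    observed = sum(rank for rank, d in zip(ranks, ranked) if d > 0)
    hits = 0
    for signs in product((0, 1), repeat=n):  # every sign pattern is equally likely under H0
        if sum(rank for rank, s in zip(ranks, signs) if s) >= observed:
            hits += 1
    return hits / 2 ** n


hypothetical_gains = [3.1, 4.0, 3.6, 4.4, 3.9]  # illustrative per-seed F1 gains
p_value = wilcoxon_one_sided_exact(hypothetical_gains)
print(p_value)
```

With only five seeds, this is the smallest p-value the exact one-sided test can produce, so the significance level reflects the consistency of the sign of the gains rather than their magnitude.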

5.5. Temporal Drift Experiments

As mobile applications are frequently updated and their network behaviors evolve over time, application identification inevitably suffers from temporal drift. Such updates can introduce substantial behavioral changes, resulting in distribution shifts that degrade the performance of classifiers trained on historical data. To evaluate the robustness of KZGNN under temporal drift, we select five representative applications from each category, yielding a total of 25 applications. These applications are re-executed under the same data collection scenario, while their network traffic and corresponding metadata are recaptured after multiple version updates. All models are trained on the original dataset and directly evaluated on the recollected traces without retraining, following the same experimental setting as the unknown application label-aware classification task. The classification results are reported in Table 3.

Overall, most baseline methods experience severe performance degradation, with accuracy drops ranging from 20% to 35%, particularly for approaches that rely solely on traffic patterns, such as DeNeTLang, AppScanner, App-Net, FG-Net, and SmartDetector. Methods that incorporate more abstract structural or semantic modeling, including FlowPrint, ET-BERT, and TrafficFormer, exhibit relatively better robustness but still suffer notable accuracy loss. Although Attribute-ZSL introduces a zero-shot mechanism, its reliance on traffic pattern-based semantic alignment limits its ability to cope with evolving application behavior. In contrast, KZGNN achieves the highest accuracy of 60%, outperforming the second-best method by 12%, demonstrating significantly stronger robustness to temporal drift. This advantage stems from KZGNN’s ability to jointly leverage updated application metadata and traffic behavior through a unified knowledge graph, enabling more stable and transferable representations under evolving network conditions.

5.6. Ablation Experiments

To assess the contributions of the knowledge graph representation and zero-shot learning mechanism used in KZGNN, we conduct ablation experiments considering three model variants: KZGNN w/o KG (the knowledge graph representation is removed), KZGNN w/o ZSL (the zero-shot learning mechanism is removed), and KZGNN w/o KG&ZSL (both the knowledge graph and zero-shot learning components are removed). Table 4 presents the ablation results under the unknown application label-aware classification experiments.

The results demonstrate that both components contribute substantially to model performance. Removing the knowledge graph leads to a 4.0% decrease in

F1

, confirming that external semantic information and relational structures are crucial for modeling fine-grained differences between applications. Removing the zero-shot learning mechanism results in a 2.3% decrease in

F1

, indicating that the shared embedding space is essential for generalizing to unseen categories. When both components are removed, performance drops to 77.2%, underscoring that the effectiveness of KZGNN arises from the combination of semantic knowledge modeling and zero-shot representation learning.

To further examine the sensitivity of KZGNN to different semantic encoders, we conduct an ablation study comparing five widely used text encoders for generating semantic embeddings: TF-IDF, BERT, RoBERTa, Sentence-BERT, and Llama-3. These embeddings are used directly without modifying other components of KZGNN. Table 5 summarizes the results.

The results show a clear trend. TF-IDF yields the weakest performance because its bag-of-words representation cannot capture semantic relations needed for zero-shot reasoning. BERT and RoBERTa improve substantially by leveraging contextual embeddings, achieving around 80% accuracy. Sentence-BERT performs slightly better due to its contrastive training objective, which enhances semantic similarity modeling. Llama-3 provides the highest overall performance among all compared encoders. Its broader pretraining corpus and stronger semantic abstraction ability yield more informative category embeddings, which directly benefit the zero-shot alignment process. Overall, this analysis indicates that KZGNN is compatible with various semantic encoders, but stronger pretrained language models provide clearer semantic structures, leading to more discriminative zero-shot identification.

Additionally, KZGNN is optimized using a multi-objective loss function that combines four components, where the hyperparameters

λ_{1}

,

λ_{2}

, and

λ_{3}

control the relative contributions of the auxiliary objectives. To evaluate the sensitivity of KZGNN to these weighting coefficients, we conduct a parameter sensitivity analysis by uniformly setting

λ_{1} = λ_{2} = λ_{3} = λ

, with

λ \in \{0, 0.1, 0.2, \dots, 0.9\}

. The corresponding F1-scores under different values of

λ

are reported in Figure 7.

As shown in Figure 7, the classification performance consistently improves as

λ

increases from 0, rising from approximately 75% to over 80%. This trend indicates that introducing auxiliary objectives provides effective regularization and facilitates more fine-grained representation learning. The performance peaks when

λ

is around 0.5, after which it degrades noticeably. This decline suggests that overly emphasizing auxiliary constraints can dominate the primary classification objective, thereby weakening discriminative capability and negatively impacting overall performance.

5.7. Computing Efficiency Analysis

To further evaluate the computational efficiency and deployment feasibility of KZGNN, we conduct a computing efficiency analysis by measuring both training time and inference time overheads. Considering the heterogeneity of compared methods, including statistical classifiers, deep learning models, graph-based approaches, and zero-shot learning frameworks, we adopt two practical metrics: Average Training Time Consumption (ATC) and Average Inference Time Consumption (AIC), which measure the average per-sample time required during training and inference, respectively. All methods are evaluated under the same hardware and software environment to ensure fair comparison, and the results are summarized in Table 6.
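
A per-sample timing measurement in the spirit of ATC and AIC can be sketched as follows. The exact measurement protocol is not given in the text, so the helper name, the number of repeats, and the stand-in workload are assumptions.

```python
import time
from typing import Callable, Sequence


def average_time_per_sample(step: Callable[[Sequence[float]], object],
                            samples: Sequence[Sequence[float]],
                            repeats: int = 3) -> float:
    """Average wall-clock seconds spent per sample over several passes."""
    if not samples:
        raise ValueError("at least one sample is required")
    start = time.perf_counter()
    for _ in range(repeats):
        for sample in samples:
            step(sample)
    return (time.perf_counter() - start) / (repeats * len(samples))


# Stand-in workload: a cheap function takes the place of a model's inference call.
dummy_samples = [[0.1] * 64 for _ in range(100)]
aic_seconds = average_time_per_sample(lambda x: sum(v * v for v in x), dummy_samples)
print(f"{aic_seconds:.2e} s per sample")
```

The same helper applies to training by passing a function that performs one training step per sample or by dividing the total training time by the number of training samples.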

The results reveal clear differences in computational characteristics across methods. Lightweight statistical models such as DeNeTLang and AppScanner have the lowest ATC and are among the lowest in AIC (only FlowPrint is faster at inference), reflecting their limited modeling complexity but also their weaker representational capacity. Large pretrained models, including ET-BERT and TrafficFormer, incur substantially higher training and inference costs (over 970 ms and over 43 ms per sample, respectively) due to their reliance on deep sequence modeling and attention mechanisms. Graph-based FG-Net is costlier than the lightweight statistical models in both training and inference, as relational message passing must be performed at test time. Attribute-ZSL introduces additional computation for semantic alignment, leading to moderate overhead. In contrast, KZGNN achieves a balanced trade-off between efficiency and performance: its training cost (207.7 ms) is higher than that of shallow models due to cross-view graph fusion and semantic alignment, while its inference overhead (28.5 ms) remains moderate and well below that of the large pretrained traffic-only models. This demonstrates that KZGNN's knowledge-enhanced graph modeling introduces manageable computational overhead while providing substantial gains in zero-shot identification accuracy, making it suitable for practical deployment in real-world network environments.
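
The relative costs discussed above can be recomputed directly from Table 6. The sketch below copies the ATC and AIC values from that table; the helper `relative_cost` and the derived quantities (ranking, ratios, implied throughput) are illustrative additions, not results reported by the authors.

```python
# Values copied from Table 6 (milliseconds per sample).
atc = {"DeNeTLang": 28.6, "AppScanner": 31.6, "FlowPrint": 47.7, "ET-BERT": 974.6,
       "App-Net": 54.8, "TrafficFormer": 983.7, "FG-Net": 55.6,
       "SmartDetector": 132.6, "Attribute-ZSL": 285.8, "KZGNN": 207.7}
aic = {"DeNeTLang": 3.5, "AppScanner": 5.1, "FlowPrint": 2.7, "ET-BERT": 43.5,
       "App-Net": 5.5, "TrafficFormer": 44.0, "FG-Net": 7.1,
       "SmartDetector": 19.5, "Attribute-ZSL": 24.8, "KZGNN": 28.5}

def relative_cost(table, method, reference):
    """Ratio of one method's per-sample time to a reference method's."""
    return table[method] / table[reference]

# Methods ordered from fastest to slowest inference.
ranking = sorted(aic, key=aic.get)
print(ranking)

# KZGNN's cost as a fraction of ET-BERT's, for inference and training.
print(round(relative_cost(aic, "KZGNN", "ET-BERT"), 3))
print(round(relative_cost(atc, "KZGNN", "ET-BERT"), 3))

# Implied sequential throughput of KZGNN at inference, in samples per second.
print(round(1000.0 / aic["KZGNN"], 1))
```

On these figures, KZGNN's per-sample inference time is roughly two thirds of ET-BERT's, and its per-sample training time is roughly one fifth.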

6. Conclusions

This work presents KZGNN, a zero-shot graph neural network designed to identify previously unseen mobile applications. By constructing a unified mobile application knowledge graph, KZGNN integrates dynamic network behavior patterns with static semantic attributes to enable expressive representation learning. A zero-shot learning model is further introduced to aggregate relational information and align node embeddings with category semantics, allowing the model to recognize unknown applications without requiring labeled samples during training. Experiments on a dataset of 160 mobile applications validate the effectiveness of the proposed framework. KZGNN consistently outperforms nine state-of-the-art traffic analysis methods, achieving 96.5% accuracy in distinguishing known from unknown categories and 85.2% accuracy in identifying unseen application categories, representing a 5.2% improvement over the second-best baseline.

Future work will extend in three directions. First, considering the rapid evolution of mobile applications and the temporal dynamics of their network behavior, we will investigate dynamic knowledge graph updates and incremental learning mechanisms to improve long-term adaptability. Second, we plan to study cross-domain generalization schemes, such as transfer learning and adversarial domain adaptation, across heterogeneous network protocols, device types, and execution environments to further enhance robustness in real-world deployments. Third, as metadata quality may vary across app markets or enterprise environments, we aim to explore strategies for handling incomplete or sparse metadata, including metadata augmentation and fallback representations, to broaden the applicability of our approach.

Author Contributions

Conceptualization, D.Z. and J.H.; methodology, D.Z.; software, D.Z., J.H. and M.T.; validation, D.Z., J.H., M.T. and L.G.; formal analysis, M.T. and L.G.; data curation, M.T. and L.G.; writing—original draft preparation, D.Z.; writing—review and editing, D.Z., J.H., M.T. and L.G.; visualization, M.T. and L.G.; supervision, J.H.; project administration, J.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Tauqeer, M.; Gohar, M.; Koh, S.; Alquhayz, H. Use of QUIC for mobile-oriented future Internet. Electronics 2024, 13, 431.
Thangavel, P.; Chandra, B. Two decades of M-commerce consumer research: A bibliometric analysis using R biblioshiny. Sustainability 2023, 15, 11835.
ITU. Measuring Digital Development: Facts and Figures. 2025. Available online: https://www.itu.int/itu-d/reports/statistics/facts-figures-2025/ (accessed on 11 December 2025).
Azab, A.; Khasawneh, M.; Alrabaee, S.; Choo, K.; Sarsour, M. Network traffic classification: Techniques, datasets, and challenges. Digit. Commun. Netw. 2024, 10, 676–692.
Lin, P.; Ye, K.; Hu, Y.; Lin, Y.; Xu, C. A novel multimodal deep learning framework for encrypted traffic classification. IEEE/ACM Trans. Net. 2023, 31, 1369–1384.
Shen, M.; Ye, K.; Liu, X.; Zhu, L.; Kang, J.; Yu, S.; Li, Q.; Xu, K. Machine learning-powered encrypted network traffic analysis: A comprehensive survey. IEEE Commun. Surv. Tutor. 2023, 25, 791–824.
Liu, Z.; Wei, Q.; Song, Q.; Duan, C. Fine-grained encrypted traffic classification using dual embedding and graph neural networks. Electronics 2025, 14, 778.
Van, T.; Bortolameotti, R.; Continella, A.; Ren, J.; Dubois, D.; Lindorfer, M.; Choffnes, D.; Van, M.; Peter, A. Flowprint: Semi-supervised mobile-app fingerprinting on encrypted network traffic. In Proceedings of the Network and Distributed System Security Symposium, San Diego, CA, USA, 23–26 February 2020; Volume 27.
Bovenzi, G.; Monda, D.; Montieri, A.; Persico, V.; Pescapé, A. Few shot learning approaches for classifying rare mobile-app encrypted traffic samples. In Proceedings of the IEEE Conference on Computer Communications Workshops, Hoboken, NJ, USA, 20 May 2023; pp. 1–6.
Hu, Y.; Cheng, G.; Chen, W.; Jiang, B. Attribute-based zero-shot learning for encrypted traffic classification. IEEE Trans. Netw. Serv. Manag. 2022, 19, 4583–4599.
Yang, C.; Gu, Z.; Bai, J.; Li, Z.; Xiong, G.; Gou, G.; Yao, S.; Chen, X. Few-shot encrypted traffic classification: A survey. In Proceedings of the Asia-Pacific Conference on Image Processing, Electronics and Computers, Dalian, China, 12–14 April 2024; pp. 646–652.
Dong, W.; Yu, J.; Lin, X.; Gou, G.; Xiong, G. Deep learning and pre-training technology for encrypted traffic classification: A comprehensive review. Neurocomputing 2025, 617, 128444.
Miskovic, S.; Lee, G.; Liao, Y.; Baldi, M. Appprint: Automatic fingerprinting of mobile applications in network traffic. In Proceedings of the International Conference on Passive and Active Network Measurement, New York, NY, USA, 19–20 March 2015; pp. 57–69.
Taylor, V.; Spolaor, R.; Conti, M.; Martinovic, I. Appscanner: Automatic fingerprinting of smartphone apps from encrypted network traffic. In Proceedings of the IEEE European Symposium on Security and Privacy, Saarbruecken, Germany, 21–24 March 2016; pp. 439–454.
Taylor, V.; Spolaor, R.; Conti, M.; Martinovic, I. Robust smartphone app identification via encrypted network traffic analysis. IEEE Trans. Inf. Forensics Secur. 2018, 13, 63–78.
Aceto, G.; Ciuonzo, D.; Montieri, A.; Pescapé, A. Multi-classification approaches for classifying mobile app traffic. J. Netw. Comput. Appl. 2018, 103, 131–145.
Zhai, L.; Qiao, Z.; Wang, Z.; Wei, D. Identify what you are doing: Smartphone apps fingerprinting on cellular network traffic. In Proceedings of the IEEE Symposium on Computers and Communications, Athens, Greece, 5–8 September 2021; pp. 1–7.
Rezaei, S.; Kroencke, B.; Liu, X. Large-scale mobile app identification using deep learning. IEEE Access 2019, 8, 348–362.
Aceto, G.; Ciuonzo, D.; Montieri, A.; Pescapè, A. MIMETIC: Mobile encrypted traffic classification using multimodal deep learning. Comput. Netw. 2019, 165, 106944.
Wang, X.; Chen, S.; Su, J. App-net: A hybrid neural network for encrypted mobile traffic classification. In Proceedings of the IEEE Conference on Computer Communications Workshops, Toronto, ON, Canada, 6–9 July 2020; pp. 424–429.
Wang, X.; Chen, S.; Su, J. Automatic mobile app identification from encrypted traffic with hybrid neural networks. IEEE Access 2020, 8, 182065–182077.
Wang, X.; Chen, S.; Su, J. Real network traffic collection and deep learning for mobile app identification. Wirel. Commun. Mob. Comput. 2020, 2020, 4707909.
Aceto, G.; Ciuonzo, D.; Montieri, A.; Pescapé, A. Toward effective mobile encrypted traffic classification through deep learning. Neurocomputing 2020, 409, 306–315.
Shapira, T.; Shavitt, Y. FlowPic: A generic representation for encrypted traffic classification and applications identification. IEEE Trans. Netw. Serv. Manag. 2021, 18, 1218–1232.
Lin, X.; Xiong, G.; Gou, G.; Li, Z.; Shi, J.; Yu, J. Et-bert: A contextualized datagram representation with pre-training transformers for encrypted traffic classification. In Proceedings of the ACM Web Conference, Lyon, France, 25–29 April 2022; pp. 633–642.
Shen, M.; Zhang, J.; Zhu, L.; Xu, K.; Du, X. Accurate decentralized application identification via encrypted traffic analysis using graph neural networks. IEEE Trans. Inf. Forensics Secur. 2021, 16, 2367–2380.
Pham, T.; Ho, T.; Truong, T.; Cao, T.D.; Truong, H.L. Mappgraph: Mobile-app classification on encrypted network traffic using deep graph convolution neural networks. In Proceedings of the 37th Annual Computer Security Applications Conference, Virtual, 6–10 December 2021; pp. 1025–1038.
Jiang, M.; Li, Z.; Fu, P.; Cai, W.; Cui, M.; Xiong, G.; Gou, G. Accurate mobile-app fingerprinting using flow-level relationship with graph neural networks. Comput. Netw. 2022, 217, 109309.
Huoh, T.; Luo, Y.; Li, P.; Zhang, T. Flow-based encrypted network traffic classification with graph neural networks. IEEE Trans. Netw. Serv. Manag. 2022, 20, 1224–1237.
Li, S.; Huang, Y.; Gao, T.; Yang, L.; Chen, Y.; Pan, Q.; Zang, T.; Chen, F. FusionTC: Encrypted app traffic classification using decision-level multimodal fusion learning of flow sequence. Wirel. Commun. Mob. Comput. 2023, 2023, 9118153.
Marzani, F.; Ghassemi, F.; Sabahi, Z.; Van, T.; Van, M. Mobile app fingerprinting through automata learning and machine learning. In Proceedings of the 2023 IFIP Networking Conference, Barcelona, Spain, 12–15 June 2023; pp. 1–9.
Xu, H.; Li, S.; Cheng, Z.; Qin, R.; Xie, J.; Sun, P. Trafficgcn: Mobile application encrypted traffic classification based on gcn. In Proceedings of the IEEE Global Communications Conference, Rio de Janeiro, Brazil, 4–8 December 2022; pp. 891–896.
Jiang, M.; Cui, M.; Liu, C.; Gou, G.; Xiong, G.; Li, Z. Zero-relabelling mobile-app identification over drifted encrypted network traffic. Comput. Netw. 2023, 228, 109728.
Bovenzi, G.; Monda, D.; Montieri, A.; Persico, V.; Pescapé, A. META MIMETIC: Few-shot classification of mobile-app encrypted traffic via multimodal meta-learning. In Proceedings of the 35th International Teletraffic Congress, Turin, Italy, 3–5 October 2023; pp. 1–9.
Chen, Y.; Tong, Y.; Hwee, G.; Cao, Q.; Razul, S.; Lin, Z. Encrypted mobile traffic classification with a few-shot incremental learning approach. In Proceedings of the IEEE 18th Conference on Industrial Electronics and Applications, Ningbo, China, 18–22 August 2023; pp. 40–45.
Zhao, S.; Chen, S.; Sun, Y.; Cai, Z.; Su, J. Identifying known and unknown mobile application traffic using a multilevel classifier. Secur. Commun. Netw. 2019, 2019, 9595081.
Li, J.; Zhou, H.; Wu, S.; Luo, X.; Wang, T.; Zhan, X.; Ma, X. FOAP: Fine-grained open-world android app fingerprinting. In Proceedings of the 31st USENIX Security Symposium, Boston, MA, USA, 10–12 August 2022; pp. 1579–1596.
Li, J.; Wu, S.; Zhou, H.; Luo, X.; Wang, T.; Liu, Y.; Ma, X. Packet-level open-world app fingerprinting on wireless traffic. In Proceedings of the Network and Distributed System Security Symposium, San Diego, CA, USA, 24–28 April 2022.
Wang, Y.; Xu, H.; Guo, Z.; Qin, Z.; Ren, K. SnWF: Website fingerprinting attack by ensembling the snapshot of deep learning. IEEE Trans. Inf. Forensics Secur. 2022, 17, 1214–1226.
Wang, X.; Wang, Y.; Lai, Y.; Hao, Z.; Liu, A. Reliable open-set network traffic classification. IEEE Trans. Inf. Forensics Secur. 2025, 20, 2313–2328.
Alamo, J.; Guaman, D.; Balmori, B.; Diez, A. Privacy assessment in android apps: A systematic mapping study. Electronics 2021, 10, 1999.
Sabahi, Z.; Ghassemi, F. An encrypted traffic classifier via combination of deep learning and automata learning. Soft Comput. 2024, 28, 13443–13460.
Zhou, G.; Guo, X.; Liu, Z.; Li, T.; Li, Q.; Xu, K. Trafficformer: An efficient pre-trained model for traffic data. In Proceedings of the IEEE Symposium on Security and Privacy, San Francisco, CA, USA, 12–15 May 2025; pp. 1844–1860.
Shen, M.; Wu, J.; Ye, K.; Xu, K.; Xiong, G.; Zhu, L. Robust detection of malicious encrypted traffic via contrastive learning. IEEE Trans. Inf. Forensics Secur. 2025, 20, 4228–4242.

Figure 1. The threat model of network traffic classification.

Figure 2. The overview of KZGNN.

Figure 3. Classification results under the unknown application-aware setting.

Figure 4. Classification results under the unknown application type-aware scenario.

Figure 5. Classification results under the unknown application label-aware scenario.

Figure 6. Classification performance among five application categories: (a) Precision; (b) Recall; (c) Accuracy; (d) F1-score.

Figure 7. Parameter sensitivity analysis with different values of $\lambda$ in the multi-objective loss.

Table 1. Representative mobile application examples for each category in the dataset.

Category	Representative Mobile Application Names (Available Online: https://play.google.com/store/games, Accessed on 12 December 2025)
Social	Facebook, TikTok, X (Twitter), Reddit, Instagram
Communication	WhatsApp, Telegram, Discord, Skype, Zoom
Games	Fortnite, EA SPORTS FC Mobile, PUBG Mobile, LifeAfter, Age of Empires Mobile
Tools	Google Chrome, Google Maps, Microsoft Authenticator, Google Drive, ES File Explorer
Productivity	Microsoft Outlook, Notion, Trello, Slack, Google Docs

Table 2. Wilcoxon statistical analysis on F1-score.

Comparison	Mean ΔF1	95% CI (%)	Wilcoxon p-Value
KZGNN vs. FG-Net	+3.50%	[2.02, 4.98]	0.031
KZGNN vs. SmartDetector	+3.24%	[2.46, 4.02]	0.031
KZGNN vs. TrafficFormer	+4.82%	[2.35, 7.29]	0.031
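
As background for reading the identical p-values in Table 2, the sketch below computes an exact Wilcoxon signed-rank p-value by enumerating all sign patterns. The source does not state the number of paired runs or whether the test was one-sided or two-sided, so the sample sizes and the difference values here are hypothetical. The sketch only shows that 0.03125 (0.031 after rounding) is the smallest p-value an exact test can return for five pairs (one-sided) or six pairs (two-sided), which happens when every paired difference favors the same method.

```python
from itertools import product

def wilcoxon_exact_p(differences, two_sided=True):
    """Exact Wilcoxon signed-rank p-value by enumerating all sign patterns.

    Assumes non-zero differences with distinct absolute values (no ties).
    """
    n = len(differences)
    order = sorted(range(n), key=lambda i: abs(differences[i]))
    ranks = [0] * n
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    # Observed statistic W+: sum of ranks of the positive differences.
    observed = sum(r for r, d in zip(ranks, differences) if d > 0)
    # Under the null hypothesis each rank is positive with probability 1/2.
    count_ge = 0
    count_le = 0
    for signs in product((0, 1), repeat=n):
        w_plus = sum(r for r, s in zip(ranks, signs) if s)
        count_ge += w_plus >= observed
        count_le += w_plus <= observed
    total = 2 ** n
    if two_sided:
        return min(1.0, 2 * min(count_ge, count_le) / total)
    return count_ge / total

# Hypothetical paired differences, all favoring the same method.
diffs_5 = [1.0, 2.0, 3.0, 4.0, 5.0]
diffs_6 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
p_one_sided_5 = wilcoxon_exact_p(diffs_5, two_sided=False)
p_two_sided_6 = wilcoxon_exact_p(diffs_6, two_sided=True)
print(p_one_sided_5, p_two_sided_6)
```

Under either assumption, a reported value of 0.031 would indicate that the compared method was ahead in every paired run, and the test could not have produced a smaller p-value at that sample size.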

Table 3. Temporal drift analysis among KZGNN and nine baselines using recollected traffic from 25 applications.

Method	DeNeTLang	AppScanner	FlowPrint	ET-BERT	App-Net	TrafficFormer	FG-Net	SmartDetector	Attribute-ZSL	KZGNN
Accuracy (%)	24.0	32.0	48.0	48.0	20.0	48.0	28.0	28.0	32.0	68.0

Table 4. Ablation study results on specific unknown application identification experiment.

Variant	Pre. (%)	Rec. (%)	Acc. (%)	F1 (%)
KZGNN w/o KG/ZSL	77.5 ± 0.4	77.0 ± 0.4	78.5 ± 0.3	77.2 ± 0.3
KZGNN w/o KG	80.0 ± 0.3	79.5 ± 0.3	81.0 ± 0.2	79.7 ± 0.2
KZGNN w/o ZSL	81.7 ± 0.3	81.2 ± 0.3	82.8 ± 0.2	81.4 ± 0.1
KZGNN	84.0 ± 0.2	83.5 ± 0.2	85.2 ± 0.1	83.7 ± 0.1

Table 5. Ablation study on pretrained language model selection.

Variant	Pre. (%)	Rec. (%)	Acc. (%)	F1 (%)
KZGNN w/TF-IDF	71.3 ± 0.3	67.1 ± 0.4	68.6 ± 0.4	64.6 ± 0.4
KZGNN w/BERT	79.3 ± 0.1	81.0 ± 0.0	80.9 ± 0.1	80.2 ± 0.0
KZGNN w/RoBERTa	76.3 ± 0.2	81.8 ± 0.1	80.7 ± 0.1	78.8 ± 0.1
KZGNN w/Sentence-BERT	80.1 ± 0.1	82.5 ± 0.1	84.7 ± 0.1	80.2 ± 0.1
KZGNN	84.0 ± 0.2	83.5 ± 0.2	85.2 ± 0.1	83.7 ± 0.1

Table 6. Computing efficiency analysis among KZGNN and nine baselines.

Method	DeNeTLang	AppScanner	FlowPrint	ET-BERT	App-Net	TrafficFormer	FG-Net	SmartDetector	Attribute-ZSL	KZGNN
ATC (ms)	28.6	31.6	47.7	974.6	54.8	983.7	55.6	132.6	285.8	207.7
AIC (ms)	3.5	5.1	2.7	43.5	5.5	44.0	7.1	19.5	24.8	28.5

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
