Abstract
With the growth of network scale and the increasing sophistication of cyberattacks, traditional learning-based traffic analysis methods struggle to maintain generalization. While Large Language Model (LLM)-based approaches offer improved generalization, they suffer from low training and inference efficiency on consumer-grade GPU platforms, which are typical in resource-constrained deployment scenarios. As a result, existing LLM-based methods often rely on small-parameter models, which limits their effectiveness. To overcome these limitations, we propose a large-parameter LLM-based approach to network traffic analysis that enhances both generalization and performance. We further introduce two key techniques that enable practical deployment and improve efficiency on consumer-grade GPUs: (a) a traffic-to-text mapping strategy that allows LLMs to process raw network traffic, coupled with a LoRA-based fine-tuning mechanism that improves adaptability across downstream tasks while reducing training overhead; and (b) a sparsity-aware inference acceleration mechanism that employs a hot–cold neuron allocation strategy to alleviate hardware bottlenecks and predicts inactive neurons to skip redundant computations. Experimental results on a consumer-grade NVIDIA RTX A6000 GPU show that our method outperforms existing LLM-based approaches by 6–8% in accuracy across various network traffic analysis tasks, benefiting from the adoption of large-parameter models. Furthermore, our approach achieves up to a 4.07× improvement in inference efficiency over llama.cpp, demonstrating both the effectiveness and practicality of the proposed design for real-world network traffic analysis applications.