Balancing Accuracy and Efficiency in Vehicular Network Firmware Vulnerability Detection: A Fuzzy Matching Framework with Standardized Data Serialization
Abstract
1. Introduction
- We design a fuzzy matching algorithm for code similarity analysis, based on statistical inference, that identifies security vulnerabilities efficiently under the limited computational resources of vehicular networks, where traditional methods are difficult to apply.
- We develop a semantic similarity analysis framework that extends from the instruction level to the procedural level. It maps binary file similarity to execution semantic similarity by combining a dynamic programming algorithm for the longest common subsequence with path search and neighborhood search techniques, addressing the insufficient precision of existing methods in semantic information extraction.
- We construct a serialization format specification for data transmission between detection models. The specification uniformly defines instruction representation formats for multiple processor architectures, including ARM, x86, and MIPS. In addition, we develop a dedicated instruction conversion preprocessor to resolve cross-platform data processing consistency issues and improve the detection model’s operational efficiency.
- We combine model detection experiments with NS-3-based vehicular network simulation experiments, integrating the IEEE 802.11p protocol stack with a cascaded trust center architecture. This establishes a complete security verification system, from certificate chain management to V2V communication encryption, and fills a gap in existing research regarding systematic empirical verification.
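
To make the longest common subsequence (LCS) step in the second contribution concrete, the following is a minimal sketch of LCS-based similarity between two opcode sequences. The function names, the sample opcode lists, and the normalization `2 * LCS / (len(a) + len(b))` are assumptions chosen for illustration; they are not taken from the paper's implementation.

```python
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, using two-row dynamic programming."""
    previous = [0] * (len(b) + 1)
    for item_a in a:
        current = [0]
        for j, item_b in enumerate(b, start=1):
            if item_a == item_b:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def lcs_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Normalize the LCS length to [0, 1] (the normalization is an assumption)."""
    if not a and not b:
        return 1.0
    return 2.0 * lcs_length(a, b) / (len(a) + len(b))


# Hypothetical opcode sequences of two functions.
func_a = ["push", "mov", "sub", "call", "add", "pop", "ret"]
func_b = ["push", "mov", "call", "xor", "pop", "ret"]

print(lcs_length(func_a, func_b))      # 5 (push, mov, call, pop, ret)
print(lcs_similarity(func_a, func_b))  # 10 / 13
```

The two-row table keeps memory proportional to the shorter dimension of the comparison, which may matter on resource-constrained devices.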
2. Background and Literature Review
3. Proposed Method
3.1. Fuzzy Matching Analysis for Code Similarity
3.1.1. Function Identification
3.1.2. Function Similarity with Fuzzy Matching
3.1.3. Semantic Similarity Calculation
3.2. Training Information Formatted Using ProtoBuf

    syntax = "proto3";

    message ModelInput {
      repeated float instruction_features = 1 [packed=true];
      int32 sequence_length = 2;
      int32 feature_dimension = 3;
    }

    message ModelOutput {
      repeated float embedded_representation = 1 [packed=true];
      repeated float attention_scores = 2 [packed=true];
      float loss = 3;
    }

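As an illustration of what the `ModelInput` definition above implies on the wire, the sketch below hand-encodes a message following the Protocol Buffers encoding rules: a packed `repeated float` is one length-delimited field of little-endian 32-bit values, and `int32` fields are varints. In practice the classes generated by `protoc` would do this work; the helper names `encode_varint` and `encode_model_input` are hypothetical, and the sketch handles only non-negative integers.

```python
import struct
from typing import Sequence


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_model_input(features: Sequence[float],
                       sequence_length: int,
                       feature_dimension: int) -> bytes:
    """Serialize the three ModelInput fields; proto3 omits default (empty/zero) values."""
    out = bytearray()
    if features:
        payload = struct.pack(f"<{len(features)}f", *features)
        out += b"\x0a" + encode_varint(len(payload)) + payload  # field 1, length-delimited
    if sequence_length:
        out += b"\x10" + encode_varint(sequence_length)         # field 2, varint
    if feature_dimension:
        out += b"\x18" + encode_varint(feature_dimension)       # field 3, varint
    return bytes(out)


encoded = encode_model_input([1.0, 2.0], sequence_length=1, feature_dimension=2)
print(encoded.hex())  # 0a080000803f0000004010011802
```

Two features, one sequence step, and the feature dimension fit in 14 bytes, which shows why a packed binary layout is attractive for bandwidth-limited links.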
3.2.1. Introduction to ProtoBuf
3.2.2. ProtoBuf Instruction Representation Framework

    syntax = "proto3";

    package instruction;

    // Standardized representation of an instruction
    message Instruction {
      // Architecture of the instruction, e.g., "x86" or "ARM"
      string architecture = 1;
      // List of operands
      repeated string operands = 2;
      // Opcode
      string opcode = 3;
      // Positions of the instruction within the basic block
      repeated int32 positions = 4;
    }

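The following is a minimal sketch of how a conversion preprocessor might fill the fields of the `Instruction` message from one line of disassembly. The `Instruction` dataclass is a plain-Python mirror of the message rather than generated ProtoBuf code, and `split_operands`, `parse_instruction`, the `;` comment convention, and the use of a single index in `positions` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Instruction:
    """Plain-Python mirror of the Instruction message (not generated code)."""
    architecture: str
    operands: List[str] = field(default_factory=list)
    opcode: str = ""
    positions: List[int] = field(default_factory=list)


def split_operands(text: str) -> List[str]:
    """Split on commas that are not nested inside brackets, e.g. '[r1, #4]'."""
    operands, current, depth = [], [], 0
    for ch in text:
        if ch in "[{(":
            depth += 1
        elif ch in "]})":
            depth -= 1
        if ch == "," and depth == 0:
            operands.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    operands.append("".join(current).strip())
    return [op for op in operands if op]


def parse_instruction(line: str, architecture: str, position: int) -> Instruction:
    """Normalize one disassembly line into the shared representation."""
    text = line.split(";", 1)[0].strip()  # drop a trailing ';' comment
    parts = text.split(None, 1)
    if not parts:
        raise ValueError("empty instruction line")
    rest = parts[1] if len(parts) > 1 else ""
    return Instruction(architecture, split_operands(rest), parts[0].lower(), [position])


x86 = parse_instruction("MOV eax, [ebx+4] ; load field", "x86", 0)
arm = parse_instruction("ldr r0, [r1, #4]", "ARM", 3)
print(x86.opcode, x86.operands)  # mov ['eax', '[ebx+4]']
print(arm.opcode, arm.operands)  # ldr ['r0', '[r1, #4]']
```

Both architectures end up in the same record shape, which is the property the shared message format is meant to provide to downstream models.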
3.2.3. Encoder
3.2.4. Decoder
3.2.5. Training and Prediction
3.3. ProtoBuf Structure-Based Pseudonym Certificate Verification Method for Vehicular Networks
3.3.1. Initialization Phase
3.3.2. Pseudonym Certificate Generation Phase
3.3.3. Vehicle-to-Vehicle Secure Communication Phase
3.4. Certificate Authentication Communication Data Structure Based on ProtoBuf
3.4.1. Key Negotiation Process Design
3.4.2. Implementation of Certificates Based on PB Structure
4. Experimental Results and Analysis
4.1. Experiment Setup
4.2. Performance Evaluation of Model on Downstream Tasks
4.2.1. Cross-Architecture Instruction Embedding for Firmware Vulnerability Detection Using ProtoBuf Bridging Technology
4.2.2. Instruction Search Evaluation
4.2.3. Similarity Analysis of Instruction Embedding Vectors
- (1) Despite Gemini’s strong performance in the original literature, it generalizes poorly to a novel data environment.
- (2) Manually designed embedding methods (including Instruction2Vec and one-hot vectors) perform poorly, indicating that manual features may depend excessively on specific application scenarios.
- (3) FirmPB maintains strong performance even on test sets that differ drastically from the training data, surpassing the other solutions. This indicates that FirmPB can significantly improve the generalization of downstream tasks.
- (4) All three pretraining tasks contribute to FirmPB’s final effectiveness. Notably, the complete FirmPB has a distinct advantage over its simplified version FirmPB-W/o-G, whose performance drops to a level similar to Gemini’s (as shown in Figure 5B). This indicates that FirmPB achieves better instruction embeddings than traditional methods through its assembly instruction relationship generation algorithm.
4.3. Communication Cost of Vehicle-Side and RSU Trusted Communication Scheme
4.4. Security Evaluation of Trusted Communication Scheme
- (1) Eavesdropping was tested in the simulation by intercepting communication between two vehicle nodes to determine whether the security scheme withstands this threat. The NS-3 simulations showed that, because communications between vehicles are encrypted regionally and each message is encrypted with the recipient’s public key before sending, only the holder of the corresponding private key can decrypt it; thus, even if intercepted, the encrypted information cannot be read directly. Moreover, key negotiation in this scheme uses the blockchain to verify public keys instead of certificate validation, which avoids centralized data tampering issues and skips the certificate verification steps.
- (2) Impersonation was tested by falsifying the identity of a vehicle node during communications to determine whether the scheme prevents this threat. In this scheme, vehicles first obtain certified public keys corresponding to their real identities from the TA in CS. The blockchain-verified public keys are then used for communications. Prior to sending, the sender encrypts a random number (the agreed symmetric key) with its private key. When an attacker impersonates a vehicle or RSU during ongoing communications, the receiver performs decryption (signature verification) using the stored public key and detects the forgery by comparison against the blockchain. In addition, falsified messages cannot be decrypted using the receiver’s public key.
- (3) Tampering was tested by intercepting and altering messages. In this scheme, regional encryption, signed messages, and authenticated entities prevent this threat. In the simulations, a malicious third party that intercepts encrypted messages cannot obtain the plaintext to tamper with, because it lacks the regional symmetric key.
- (4) Repudiation was tested by having a sender deny messages it had sent. In this scheme, vehicle messages are digitally signed with the vehicle’s private key, and pseudonym certificates are logged with the administration entity, so a lookup table can be queried to prove the sender’s identity and prevent repudiation.
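
The sign-then-verify step that underlies the impersonation and repudiation checks above can be sketched as follows. This is a toy illustration only: it uses textbook RSA with a well-known tiny key (n = 61 × 53) and no padding, which is insecure and is not the scheme's actual cryptography. The names `digest`, `sign`, and `verify`, and the sample session key, are hypothetical.

```python
import hashlib

# Textbook RSA key pair (n = 61 * 53); far too small for real use.
N, E, D = 3233, 17, 2753


def digest(message: bytes) -> int:
    """Hash the message and reduce it into the range of the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes, private_exponent: int) -> int:
    """Sender side: transform the digest with the private key."""
    return pow(digest(message), private_exponent, N)


def verify(message: bytes, signature: int, public_exponent: int) -> bool:
    """Receiver side: recover the digest with the stored public key and compare."""
    return pow(signature, public_exponent, N) == digest(message)


session_key = b"random-session-key"
signature = sign(session_key, D)

print(verify(session_key, signature, E))            # True: genuine sender
print(verify(session_key, (signature + 1) % N, E))  # False: altered signature
```

A receiver that looks up the public exponent from a trusted record, such as the blockchain-verified key described above, rejects any signature that was not produced with the matching private key.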
4.5. Scalability Comparison of Firmware Analysis Methods
4.6. Discussion on Balancing Accuracy and Efficiency
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Zhao, J.; Wang, R. Fedmix: A sybil attack detection system considering cross-layer information fusion and privacy protection. In Proceedings of the 2022 19th Annual IEEE International Conference on Sensing, Communication, and Networking (SECON), Virtual Conference, 20–23 September 2022; pp. 199–207.
- Chen, R.; Younas, W.; Zhao, J. Firm-vehicle: Trusted communication enabled instruction embedding model for resource-constrained VANET environments. In Proceedings of the 20th International Conference, ICIC 2024, Tianjin, China, 5–8 August 2024; pp. 391–402.
- Ferguson, J. Reverse Engineering Code with IDA Pro; Syngress: Rockland, MA, USA, 2008.
- Feng, X.; Zhu, X.; Han, Q.-L.; Zhou, W.; Wen, S.; Xiang, Y. Detecting vulnerability on IoT device firmware: A survey. IEEE/CAA J. Autom. Sin. 2022, 10, 25–41.
- Chen, H.; Liu, J.; Wang, J.; Xun, Y. Towards secure intra-vehicle communications in 5G advanced and beyond: Vulnerabilities, attacks and countermeasures. Veh. Commun. 2023, 39, 100548.
- Kim, G.; Hong, S.; Franz, M.; Song, D. Improving cross-platform binary analysis using representation learning via graph alignment. In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis, Virtual Conference, 18–22 July 2022; pp. 151–163.
- Li, X.; Qu, Y.; Yin, H. PalmTree: Learning an assembly language model for instruction embedding. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, Virtual Conference, 15–19 November 2021; pp. 3236–3251.
- Zhang, X.; Sun, W.; Pang, J.; Liu, F.; Ma, Z. Similarity metric method for binary basic blocks of cross-instruction set architecture. In Proceedings of the 2020 Workshop on Binary Analysis Research, San Diego, CA, USA, 23 February 2020; Volume 10.
- Park, J.; Lee, S.; Hong, J.; Ryu, S. Static analysis of JNI programs via binary decompilation. IEEE Trans. Softw. Eng. 2023, 49, 3089–3105.
- Song, Q.; Zhang, Y.; Wang, B.; Chen, Y. Inter-bin: Interaction-based cross-architecture IoT binary similarity comparison. IEEE Internet Things J. 2022, 9, 20018–20033.
- Riley, G.F.; Henderson, T.R. The ns-3 network simulator. In Modeling and Tools for Network Simulation; Springer: Berlin/Heidelberg, Germany, 2010; pp. 15–34.
- Xu, X.; Liu, C.; Feng, Q.; Yin, H.; Song, L.; Song, D. Neural network-based graph embedding for cross-platform binary code similarity detection. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, Dallas, TX, USA, 30 October–3 November 2017; pp. 363–376.
- Tang, R.; Lu, Y.; Liu, L.; Mou, L.; Vechtomova, O.; Lin, J. Distilling task-specific knowledge from BERT into simple neural networks. arXiv 2019, arXiv:1903.12136.
- Lin, J.; Liu, Z.; Wang, H.; Han, S. AMC: AutoML for model compression and acceleration on mobile devices. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 784–800.
- Park, S.; Choi, W. Regulated subspace projection based local model update compression for communication-efficient federated learning. IEEE J. Sel. Areas Commun. 2023, 41, 964–976.
- Devlin, J.; Chang, M.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
- Duan, Y.; Li, X.; Wang, J.; Yin, H. DeepBinDiff: Learning program-wide code representations for binary diffing. In Proceedings of the Network and Distributed System Security Symposium, San Diego, CA, USA, 23–26 February 2020.
- Ding, S.H.; Fung, B.C.; Charland, P. Asm2Vec: Boosting static representation robustness for binary clone search against code obfuscation and compiler optimization. In Proceedings of the 2019 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 20–22 May 2019; pp. 472–489.
- Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.S.; Dean, J. Distributed representations of words and phrases and their compositionality. In Proceedings of the 27th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 5–10 December 2013; Volume 26.
- Chua, Z.L.; Shen, S.; Saxena, P.; Liang, Z. Neural nets can learn function type signatures from binaries. In Proceedings of the 26th USENIX Security Symposium (USENIX Security 17), Vancouver, BC, Canada, 16–18 August 2017; pp. 99–116.
- Zuo, F.; Li, X.; Young, P.; Luo, L.; Zeng, Q.; Zhang, Z. Neural machine translation inspired binary code similarity comparison beyond function pairs. arXiv 2018, arXiv:1808.04706.
- Chen, S.; Hu, J.; Shi, Y.; Peng, Y.; Fang, J.; Zhao, R.; Zhao, L. Vehicle-to-everything (V2X) services supported by LTE-based systems and 5G. IEEE Commun. Stand. Mag. 2017, 1, 70–76.
- Sun, H.; Sun, M.; Weng, J.; Liu, Z. Analysis of ID sequences similarity using DTW in intrusion detection for CAN bus. IEEE Trans. Veh. Technol. 2022, 71, 10426–10441.
- Molina-Masegosa, R.; Gozalvez, J.; Sepulcre, M. Configuration of the C-V2X mode 4 sidelink PC5 interface for vehicular communication. In Proceedings of the 2018 14th International Conference on Mobile Ad-Hoc and Sensor Networks (MSN), Shenyang, China, 6–8 December 2018; pp. 43–48.
- Tan, H.; Zheng, W.; Vijayakumar, P.; Sakurai, K.; Kumar, N. An efficient vehicle-assisted aggregate authentication scheme for infrastructure-less vehicular networks. IEEE Trans. Intell. Transp. Syst. 2022, 24, 15590–15600.
- Yang, Y.; Wei, L.; Wu, J.; Long, C.; Li, B. A blockchain-based multidomain authentication scheme for conditional privacy preserving in vehicular ad-hoc network. IEEE Internet Things J. 2021, 9, 8078–8090.
- Adil, M.; Ali, J.; Attique, M.; Jadoon, M.M.; Abbas, S.; Alotaibi, S.R.; Menon, V.G.; Farouk, A. Three byte-based mutual authentication scheme for autonomous internet of vehicles. IEEE Trans. Intell. Transp. Syst. 2021, 23, 9358–9369.
- Zhou, Y.; Cao, L.; Qiao, Z.; Xia, Z.; Yang, B.; Zhang, M.; Zhang, W. An efficient identity authentication scheme with dynamic anonymity for VANETs. IEEE Internet Things J. 2023, 10, 10052–10065.
- Blyth, D.; Alcaraz, J.; Binet, S.; Chekanov, S.V. ProIO: An event-based I/O stream format for protobuf messages. Comput. Phys. Commun. 2019, 241, 98–112.
- IEEE Std 1609.0-2019 (Revision of IEEE Std 1609.0-2013); IEEE Guide for Wireless Access in Vehicular Environments (WAVE) Architecture. IEEE: Piscataway, NJ, USA, 2019; pp. 1–106.
- Guo, W.; Mu, D.; Xing, X.; Du, M.; Song, D. DeepVSA: Facilitating value-set analysis with deep learning for postmortem program analysis. In Proceedings of the 28th USENIX Security Symposium (USENIX Security 19), Santa Clara, CA, USA, 14–16 August 2019; pp. 1787–1804.
- Lattner, C.; Adve, V. LLVM: A compilation framework for lifelong program analysis and transformation. In Proceedings of the International Symposium on Code Generation and Optimization, 2004. CGO 2004, San Jose, CA, USA, 20–24 March 2004; pp. 75–88.
- Lee, Y.J.; Choi, S.-H.; Kim, C.; Lim, S.-H.; Park, K.-W. Learning binary code with deep learning to detect software weakness. In Proceedings of the KSII the 9th International Conference on Internet (ICONI) 2017 Symposium, Vientiane, Laos, 17–20 December 2017.
- 3GPP TR 22.886 V16.2.0. Study on Enhancement of 3GPP Support for 5G V2X Services. 2018. Available online: https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=3108 (accessed on 5 July 2025).
- Chang, B.; Zhao, B.; Zhang, Q.; Liu, P.; Tian, Y.; Beyah, R.; Ji, S. FirmRCA: Towards Post-Fuzzing Analysis on ARM Embedded Firmware with Efficient Event-based Fault Localization. In Proceedings of the 2025 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 12–14 May 2025; pp. 3783–3800.
- Xu, J.; Mu, D.; Xing, X.; Liu, P.; Chen, P.; Mao, B. Postmortem program analysis with hardware-enhanced post-crash artifacts. In Proceedings of the 26th USENIX Security Symposium (USENIX Security 17), Vancouver, BC, Canada, 16–18 August 2017; pp. 17–32.
- Mu, D.; Du, Y.; Xu, J.; Xu, J.; Xing, X.; Mao, B.; Liu, P. POMP++: Facilitating postmortem program diagnosis with value-set analysis. IEEE Trans. Softw. Eng. 2019, 47, 1929–1942.
- Intel. Collecting Intel® Processor Trace (Intel PT) in Intel® System Debugger. Available online: https://www.intel.com/content/www/us/en/developer/videos/collecting-processor-trace-in-intel-system-debugger.html?wapkw=intel%20pt (accessed on 5 July 2025).

Notations or Abbreviations | Description |
---|---|
XML | Extensible Markup Language |
ProtoBuf/PB | Google Protocol Buffers |
IoV | Internet of Vehicles |
CAN | Controller Area Network |
V2V | Vehicle-to-Vehicle |
ECU | Electronic Control Unit |
OBU | On-Board Unit |
LCS | Longest Common Subsequence |
BFS | Breadth-First Search |
CFG | Control Flow Graph |
FMENS | Fuzzy Matching Enhanced Neighborhood Search |
CRL | Certificate Revocation List |
CA | Certificate Authority |
MTA | Main Trust Authority |
STA | Trust Authorization Center |
ERI | Electronic Registration Identification |
M | The predetermined number of pseudonym certificates |
OOV | Out-of-Vocabulary |

Run | Initialization Scheme (s) | Generate Pk/Sk (s) | Communication by Pk (s) | Instruction Embedding (s) | Total Delay (s) |
---|---|---|---|---|---|
1 | 0.000301 | 0.006618 | 0.000425 | 0.041301 | 0.048645 |
2 | 0.000271 | 0.003636 | 0.000586 | 0.052277 | 0.05677 |
3 | 0.000266 | 0.004524 | 0.000625 | 0.0460287 | 0.0514437 |
4 | 0.000628 | 0.004308 | 0.00062 | 0.0410363 | 0.0465923 |
5 | 0.000291 | 0.005184 | 0.000441 | 0.0210282 | 0.0269442 |
6 | 0.000301 | 0.005001 | 0.000402 | 0.0390221 | 0.0447261 |
7 | 0.000297 | 0.004869 | 0.000221 | 0.0380332 | 0.0434202 |
8 | 0.000311 | 0.004422 | 0.000441 | 0.040233 | 0.045407 |
9 | 0.000456 | 0.004367 | 0.000543 | 0.042299 | 0.047665 |
10 | 0.000324 | 0.003635 | 0.00045 | 0.033361 | 0.03777 |
avg. | 0.0003446 | 0.0046564 | 0.0004754 | 0.03946195 | 0.04493835 |
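
As a quick consistency check on the delay table above, the short script below recomputes the column averages and the mean total delay from the ten runs, and reports what fraction of the total is spent on instruction embedding. The variable names are illustrative; the numbers are copied from the table.

```python
from statistics import fmean

# Per-run delays in seconds, copied from the table above:
# (initialization, Pk/Sk generation, communication by Pk, instruction embedding)
runs = [
    (0.000301, 0.006618, 0.000425, 0.041301),
    (0.000271, 0.003636, 0.000586, 0.052277),
    (0.000266, 0.004524, 0.000625, 0.0460287),
    (0.000628, 0.004308, 0.00062, 0.0410363),
    (0.000291, 0.005184, 0.000441, 0.0210282),
    (0.000301, 0.005001, 0.000402, 0.0390221),
    (0.000297, 0.004869, 0.000221, 0.0380332),
    (0.000311, 0.004422, 0.000441, 0.040233),
    (0.000456, 0.004367, 0.000543, 0.042299),
    (0.000324, 0.003635, 0.00045, 0.033361),
]

# Average each column across the ten runs.
column_means = [fmean(column) for column in zip(*runs)]

# Total delay per run is the sum of its four phases.
run_totals = [sum(run) for run in runs]
mean_total = fmean(run_totals)

# Share of the mean total delay spent on instruction embedding.
embedding_share = column_means[3] / mean_total

print([round(mean, 8) for mean in column_means])
print(round(mean_total, 8))
print(f"{embedding_share:.1%}")
```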

Scenario | Effective Range (m) | Absolute Speed (km/h) | Relative Velocity (km/h) | Maximum Latency (ms) | Receiving Reliability |
---|---|---|---|---|---|
Suburb | 200 | 50 | 100 | 100 | 90% |
Highway 1 | 320 | 160 | 280 | 100 | 90% |
Highway 2 | 320 | 280 | 280 | 100 | 80% |
NLOS/City | 100 | 50 | 100 | 100 | 90% |
Urban intersection | 50 | 50 | 100 | 100 | 95% |
Campus/business district | 50 | 30 | 30 | 100 | 90% |
Emergency collision | 20 | 80 | 160 | 20 | 95% |

Model | 800 | 1600 | 3200 | 6400 | 12,800 |
---|---|---|---|---|---|
FirmPB | 35.44 s | 76.26 s | 150.79 s | 300.34 s | 630.26 s |
FirmRCA [35] | 43.78 s | 122.70 s | 348.12 s | 2.1 h | 4.7 h |
POMP [36] | 53.78 s | 138.45 s | 331.62 s | 1.8 h | 5.7 h |
POMP++ [37] | 63.65 s | 141.32 s | 1.1 h | 3.1 h | 5.7 h |
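
To read the scalability table above on a common scale, the sketch below converts every cell to seconds and computes each method's running time relative to FirmPB at each input size. The column headers are treated simply as input sizes, since their unit is not stated in the table; the helper `to_seconds` and the variable names are hypothetical.

```python
def to_seconds(cell: str) -> float:
    """Convert a table cell such as '35.44 s' or '2.1 h' to seconds."""
    value, unit = cell.split()
    return float(value) * (3600.0 if unit == "h" else 1.0)


sizes = [800, 1600, 3200, 6400, 12800]

# Running times copied from the table above.
timings = {
    "FirmPB": ["35.44 s", "76.26 s", "150.79 s", "300.34 s", "630.26 s"],
    "FirmRCA": ["43.78 s", "122.70 s", "348.12 s", "2.1 h", "4.7 h"],
    "POMP": ["53.78 s", "138.45 s", "331.62 s", "1.8 h", "5.7 h"],
    "POMP++": ["63.65 s", "141.32 s", "1.1 h", "3.1 h", "5.7 h"],
}

seconds = {model: [to_seconds(cell) for cell in cells]
           for model, cells in timings.items()}

# Ratio of each method's time to FirmPB's time at the same input size.
slowdown = {model: [t / base for t, base in zip(times, seconds["FirmPB"])]
            for model, times in seconds.items()}

# How much FirmPB's own time grows when the input grows 16-fold (800 -> 12800).
firmpb_growth = seconds["FirmPB"][-1] / seconds["FirmPB"][0]

for model, ratios in slowdown.items():
    print(model, [round(ratio, 1) for ratio in ratios])
print(round(firmpb_growth, 2))
```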
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Fang, X.; He, K.; Wu, Y.; Chen, R.; Zhao, J. Balancing Accuracy and Efficiency in Vehicular Network Firmware Vulnerability Detection: A Fuzzy Matching Framework with Standardized Data Serialization. Informatics 2025, 12, 67. https://doi.org/10.3390/informatics12030067