Intelligent Question-Answering System for New Energy Vehicles Integrating Deep Semantic Parsing and Knowledge Graphs
Abstract
1. Introduction
- (1) Robust Knowledge Extraction via Deep Semantic Parsing: To address terminology ambiguity and complex causal dependencies in noisy texts, this study introduces a fusion of a weighted semantic alignment strategy and a deep sequence labeling network (utilizing the BERT-BiLSTM-CRF architecture). This approach extracts core diagnostic logic from over 150,000 real-world maintenance records, establishing a high-precision factual foundation for the domain-specific KG.
- (2) Graph Topological Optimization and Intent Mapping: To overcome path proliferation, the Labeled Property Graph (LPG) specification is applied to encapsulate over 50 technical parameters across 2157 vehicle variants as intrinsic entity attributes. This structural design reduces the average degree of the entire KG to 1.75, effectively mitigating topological redundancy. Furthermore, an intent classification module (TextCNN) is utilized to automatically transform unstructured user queries into structured graph commands, bridging natural language with multi-hop retrieval across up to five semantic levels.
- (3) Empirical Evaluation and Architectural Deployment: The proposed methodological framework was rigorously evaluated using a finely annotated benchmark dataset of 3662 entries. Testing results demonstrate that the architecture achieves 98.0% accuracy in joint entity-relation extraction and 93.0% in intent classification, significantly outperforming baseline models. The resulting industrial-grade vertical KG, comprising 8274 nodes and 14,488 edges, offers a scalable and verifiable paradigm for data structuralization and automated diagnostic QA in complex engineering domains.
2. Research on the Construction of Knowledge Graphs for New Energy Vehicles
2.1. Theoretical and Logical Framework for Developing Vertical Knowledge Graphs in the New Energy Vehicle Domain
2.1.1. Fundamental Architecture and Intrinsic Value of Knowledge Graphs
2.1.2. Ontology Construction in the Field of New Energy Vehicles
2.1.3. Bidirectional Fusion-Based Graph Construction
2.2. Multi-Source Data Collection and Preprocessing
2.3. Entity Fusion and Knowledge Extraction
2.3.1. Entity Alignment: Disambiguation of Polysemous Entities Based on Weighted Fusion Strategy
2.3.2. Knowledge Extraction: BIO Annotation Is Implemented in Collaboration with BERT-BiLSTM-CRF
2.3.3. BERT-BiLSTM-CRF Model: Entity Recognition in the NEV Context
- (1) BERT Module for Polysemy and Contextual Encoding: NEV corpora abound with non-standard abbreviations and polysemous terms (e.g., “BMS” denoting either the high-voltage Battery Management System or the low-voltage Body Management System), which static word vectors cannot distinguish. Using BERT’s multi-head attention [21,22], the module computes global contextual weights (e.g., capturing co-occurrences of “BMS” with “power battery” versus “window”) for dynamic semantic disambiguation.
- (2) BiLSTM Module for Colloquial Texts and Sequence Features: Real-world after-sales records are typically unstructured and noisy, and frequently exhibit causal inversions (e.g., “starter cranks normally, yet the engine fails to start and the fault light illuminates”). To show how the model captures such long-range semantic dependencies, Figure 4 details the internal architecture of the BiLSTM memory cell. Through its gating mechanisms (forget, input, and output gates), the bidirectional network mitigates vanishing gradients [23], enabling the extraction of complex dependencies such as “Component (engine)–Symptom (fails to start)–Manifestation (light on)” from highly colloquial corpora.
- (3) CRF Module for Boundary Constraints: Components and fault states frequently form compound expressions (e.g., “left front shock absorber oil leak”), causing BiLSTM boundary misclassifications. The CRF output layer enforces global sequence constraints by learning a label transition probability matrix (e.g., “I-Component” may only follow “B-Component” or “I-Component”). This corrects misaligned predictions (e.g., properly segmenting “[Component] rear bumper” from “[State] broken”), reducing boundary error rates in non-standard texts [22].
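The boundary constraint described in item (3) can be illustrated with a small decoder for BIO tags. This is a minimal sketch, not the authors' implementation: a trained CRF learns soft transition scores, whereas the hypothetical `is_valid_transition` below hard-codes the single BIO rule that an inside tag must continue an entity of the same type. The tag names follow the examples in the text; the function names are assumptions.

```python
def is_valid_transition(prev_tag, tag):
    """An I-X tag is legal only directly after B-X or I-X of the same type."""
    if tag.startswith("I-"):
        entity_type = tag[2:]
        return prev_tag in ("B-" + entity_type, "I-" + entity_type)
    return True  # "O" and any "B-X" may follow any tag


def extract_spans(tokens, tags):
    """Collect (entity_type, text) spans from a BIO-tagged token sequence."""
    spans = []
    current_type, current_tokens = None, []
    prev_tag = "O"
    for token, tag in zip(tokens, tags):
        if not is_valid_transition(prev_tag, tag):
            raise ValueError(f"illegal transition {prev_tag} -> {tag}")
        if tag.startswith("B-"):
            # A new entity starts; close the one in progress, if any.
            if current_type is not None:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            current_tokens.append(token)
        else:  # "O" ends any entity in progress
            if current_type is not None:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
        prev_tag = tag
    if current_type is not None:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


tokens = ["rear", "bumper", "broken"]
tags = ["B-Component", "I-Component", "B-State"]
print(extract_spans(tokens, tags))
# [('Component', 'rear bumper'), ('State', 'broken')]
```

A sequence such as `["O", "I-Component"]` is rejected, which is the kind of prediction a learned transition matrix makes very unlikely.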
2.4. Knowledge Storage and Visualization Infrastructure
2.4.1. Neo4j Graph Database: Advanced Storage Solution for Multi-Source Knowledge Integration and Multi-Hop Reasoning
2.4.2. Implementation of Knowledge Storage Utilizing Neo4j
2.4.3. Storage Results and Structural Characteristics
3. Analytical Architecture and Engineering Implementation of the Intelligent QA System
3.1. Architectural Rationale and Core Methodological Choices
3.1.1. Intent Parsing: TextCNN over Sequential Models
3.1.2. Explainable Fault Tracing and Topological Verification
3.2. Question Parsing and Querying Framework Utilizing TextCNN
3.2.1. Character-Level Intent Parsing for Noisy Short Texts
3.2.2. Deterministic Semantic-to-Topological Mapping
3.3. Architectural Implementation and Multi-Hop Reasoning Mechanism
3.3.1. Four-Layer Decoupled Architecture Design
- (1) Base Infrastructure Layer: Provides foundational computing and storage via standard frameworks (e.g., Python, Neo4j) to sustain high-concurrency operations.
- (2) Knowledge Structuralization Layer: Applies entity alignment and relation extraction to multi-source NEV data to build the domain KG.
- (3) Intent-Driven Reasoning Layer: Goes beyond traditional keyword matching by classifying intents via TextCNN and mapping them into strict Cypher commands, enabling deterministic multi-hop graph retrieval [30].
- (4) Topological Interpretability Layer: Built on Flask and D3.js, this layer transforms backend triplet data into dynamic topologies, aiming to overcome the “black-box” nature of AI diagnostics through visual verification.
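The mapping from a classified intent to a Cypher command in layer (3) might look like the following. This is a sketch under stated assumptions, not the system's actual templates: the node labels and relationship types (`CarModel`, `Unit`, `Failure`, `HAS_UNIT`, `UNIT_FAILURE`) are taken from the storage statistics table, while the intent names, the `name` property, and the edge directions are hypothetical.

```python
import re

# Hypothetical intent names; labels and relationship types come from the KG
# statistics table, the `name` property and edge directions are assumed.
CYPHER_TEMPLATES = {
    "model_units": (
        "MATCH (m:CarModel {name: $model})-[:HAS_UNIT]->(u:Unit) "
        "RETURN u.name AS unit"
    ),
    "unit_failures": (
        "MATCH (u:Unit {name: $unit})-[:UNIT_FAILURE]->(f:Failure) "
        "RETURN f.name AS failure"
    ),
    # Two-hop retrieval: model -> unit -> failure.
    "model_failures": (
        "MATCH (m:CarModel {name: $model})-[:HAS_UNIT]->(u:Unit)"
        "-[:UNIT_FAILURE]->(f:Failure) "
        "RETURN u.name AS unit, f.name AS failure"
    ),
}


def build_query(intent, slots):
    """Return (cypher, parameters) for a classified intent and extracted slots."""
    template = CYPHER_TEMPLATES.get(intent)
    if template is None:
        raise KeyError(f"no template for intent {intent!r}")
    required = set(re.findall(r"\$(\w+)", template))
    missing = required - set(slots)
    if missing:
        raise ValueError(f"missing slots: {sorted(missing)}")
    # Pass only the parameters the template uses; values are never
    # interpolated into the query string itself.
    return template, {name: slots[name] for name in required}


cypher, params = build_query("model_failures", {"model": "Tank 300"})
print(cypher)
print(params)
```

Because each intent resolves to one fixed template and user text only enters as a query parameter, the same question always produces the same graph traversal, which is what makes the retrieval deterministic.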
3.3.2. Methodological Implementations: Multi-Hop Reasoning and Interpretability
- (1)
- Execution of Multi-Hop Graph Retrieval and Path Pruning
- (2)
- Result Encapsulation and Topological Interpretability
4. Experimental Verification and Result Analysis
4.1. Experimental Environment and Dataset Configuration
4.1.1. Hardware and Software Infrastructure
4.1.2. Dataset Statistics and Reproducibility Support
4.1.3. Hyperparameterization
4.2. Evaluation Metrics
4.3. Performance Evaluation of the Knowledge Extraction Pipeline
4.4. Robustness Validation of the Intent Recognition Module (TextCNN)
4.5. End-to-End Pipeline Testing and Evaluation
4.5.1. Quantitative Performance Assessment (Black-Box Testing)
4.5.2. Error Analysis of Multi-Hop Retrieval
4.5.3. System Stability and Integration Testing
5. Conclusions and Future Perspectives
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Takiso, T.A.; Yu, J. Research progress on the optimization of thermal management systems for lithium-ion batteries in new energy vehicles. J. Energy Storage 2025, 134, 118144.
2. Mohammadzadeh, N.; Zegordi, S.H.; Nikbakhsh, E.; Kashan, A.H. Optimal subsidy and pricing in the electric vehicle ecosystem: A case study on energy pricing policies. Sustain. Futures 2025, 10, 101298.
3. Bhatti, G.; Mohan, H.; Singh, R.R. Towards the future of smart electric vehicles: Digital twin technology. Renew. Sustain. Energy Rev. 2021, 141, 110801.
4. Li, S.X.; Liu, Y.Q.; Wang, J.Y.; Zhang, L. China’s new energy vehicle industry development policy: Based on the market performance. China Popul. Resour. Environ. 2016, 26, 158–166.
5. Li, T.; Ma, L.; Liu, Z.; Yi, C.; Liang, K. Dual Carbon Goal-Based Quadrilateral Evolutionary Game: Study on the New Energy Vehicle Industry in China. Int. J. Environ. Res. Public Health 2023, 20, 3217.
6. Yang, Y.J.; Xu, B.; Hu, J.W. An accurate and efficient domain knowledge graph construction method. J. Softw. 2018, 29, 2931–2947.
7. Meng, F.Q.; Yang, S.S.; Wang, J.D. Creating knowledge graph of electric power equipment faults based on BERT-BiLSTM-CRF model. J. Electr. Eng. Technol. 2022, 17, 2507–2516.
8. Qi, Y.; Mai, G.C.; Zhu, R.; Zhang, M. EVKG: An interlinked and interoperable electric vehicle knowledge graph for smart transportation system. Trans. GIS 2023, 27, 613–630.
9. Xie, C.T.; Deng, L.; Tang, Z.T.; He, J. Fusion and construction strategy of knowledge graphs from multi-source data. In Proceedings of the 2024 4th International Conference on Mobile Networks and Wireless Communications (ICMNWC), Tumkuru, India, 4–5 December 2024; pp. 1–6.
10. Su, C.; Hou, P.; Liu, F.; Yi, X. A review of knowledge graph-based research methods for fault diagnosis of special vehicles. In Proceedings of the 2024 IEEE International Conference, Hangzhou, China, 11–14 October 2024.
11. Ojima, Y.; Sakaji, H.; Nakamura, T.; Sakata, H.; Seki, K.; Teshigawara, Y.; Yamashita, M.; Aoyama, K. Knowledge management for automobile failure analysis using graph RAG. In Proceedings of the 2024 IEEE International Conference on Big Data (BigData), Washington, DC, USA, 15–18 December 2024; pp. 6624–6630.
12. Ma, Z.G.; Ni, R.Y.; Yu, K.H. Recent advances, key techniques and future challenges of knowledge graph. Chin. J. Eng. 2020, 42, 1254–1266.
13. Huang, H.Q.; Yu, J.; Liao, X.; Xi, Y.J. Review on knowledge graphs. Comput. Syst. Appl. 2019, 28, 1–12. (In Chinese). Available online: http://www.c-s-a.org.cn/1003-3254/6915.html (accessed on 21 April 2026).
14. Perquku, A.; Minkovska, D.; Stoyanova, L. Modeling and processing big data of power transmission grid substation using Neo4j. Procedia Comput. Sci. 2017, 113, 9–16.
15. Liu, Q.; Li, Y.; Duan, H. Knowledge Graph Construction Technology Overview. J. Comput. Res. Dev. 2016, 53, 582–600.
16. Wahyuningsih, T.; Henderi, H.; Winarno, W. Text mining an automatic short answer grading (ASAG): Comparison of three methods of cosine similarity, jaccard similarity and Dice’s coefficient. J. Appl. Data Sci. 2021, 2, 45–54.
17. Leewis, S. Improving Operational Decision-Making Through Decision Mining. Ph.D. Thesis, HU University of Applied Sciences Utrecht, Utrecht, The Netherlands, 2025.
18. Reimers, N.; Gurevych, I. Optimal hyperparameters for deep LSTM-networks for sequence labeling tasks. arXiv 2017, arXiv:1707.06799.
19. Jurafsky, D.; Martin, J.H. RNNs and LSTMs. In Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition with Language Models, 3rd ed.; Stanford University: Stanford, CA, USA, 2024; Chapter 8.
20. Shen, H.J.; Tian, C.J.; Chen, X.; Ou, J.X.; Hu, X.B.; Han, M. A study on a domain BERT-based named entity recognition method for faulty text. Data Inf. Comput. Sci. 2025, 67, 88–97.
21. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2019), Minneapolis, MN, USA, 2–7 June 2019; Volume 1, pp. 4171–4186.
22. Chen, S.Y.; Niu, L.Y.; Li, J.N. Structured Element Extraction from Official Documents Based on BERT-CRF and Knowledge Graph-Enhanced Retrieval. Mathematics 2025, 13, 2779.
23. Xu, G.X.; Meng, Y.T.; Qiu, X.Y.; Yu, Z.H.; Wu, X. Sentiment analysis of comment texts based on BiLSTM. IEEE Access 2019, 7, 51522–51532.
24. Neo4j Team. Neo4j Graph Database & Analytics: Graph Database Management System. Available online: https://neo4j.com/ (accessed on 26 December 2025).
25. Yan, Y. ERNIE-TextCNN: Research on classification methods of Chinese news headlines in different situations. Sci. Rep. 2025, 15, 29071.
26. Jiang, X.; Song, C.; Xu, Y.; Li, Y.; Peng, Y. Research on sentiment classification for netizens based on the BERT-BiLSTM-TextCNN model. PeerJ Comput. Sci. 2022, 8, e1005.
27. Zhang, S.; Liu, K.; Xu, Y. TransCNN: A novel architecture combining transformer and TextCNN for detecting N4-acetylcytidine sites in human mRNA. Anal. Biochem. 2025, 703, 115882.
28. Ono, K.; Demchak, B.; Ideker, T. Cytoscape tools for the web age: D3.js and Cytoscape.js exporters. F1000Research 2014, 3, 143.
29. Tran, Q.B.H.; Waheed, A.A.; Chung, S.T. Robust Text-to-Cypher Using Combination of BERT, GraphSAGE, and Transformer (CoBGT) Model. Appl. Sci. 2024, 14, 7881.
30. Sun, X.; Liu, Z.; Huo, X. Six-Granularity Based Chinese Short Text Classification. IEEE Access 2023, 11, 35841–35852.
31. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
32. Lipton, Z.C.; Elkan, C.; Naryanaswamy, B. Optimal thresholding of classifiers to maximize F1 measure. In Machine Learning and Knowledge Discovery in Databases; Springer: Berlin/Heidelberg, Germany, 2014; pp. 225–239.

| Alignment Model | Core Mechanism & Weight Distribution | Accuracy | Failure Analysis |
|---|---|---|---|
| Baseline | Relies solely on the Dice coefficient (SDice) | 74.0% | Highly susceptible to misclassifying “engine” and “generator” as synonymous entities (orthographically similar but semantically distinct). |
| Variant | Relies solely on cosine similarity (Scos) | 83.5% | Low recall rate for extreme non-standard colloquial abbreviations such as “rear bumper/rear bar”. |
| Ours | S = 0.7 × SDice + 0.3 × Scos | 92.0% | Effectively circumvents orthographic interference and colloquial variations, significantly reducing alignment bias. |
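
The weighted fusion in the last row, S = 0.7 × SDice + 0.3 × Scos, can be sketched as follows. This is an illustration under assumptions, not the authors' code: the Dice coefficient is computed here over character bigrams, the cosine term takes precomputed embedding vectors whose source is left open, and all function names are hypothetical.

```python
import math
from collections import Counter


def char_bigrams(text):
    """Multiset of character bigrams of a lowercased, space-stripped string."""
    s = text.lower().replace(" ", "")
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def dice_similarity(a, b):
    """Dice coefficient over character bigrams: 2|A ∩ B| / (|A| + |B|)."""
    bigrams_a, bigrams_b = char_bigrams(a), char_bigrams(b)
    total = sum(bigrams_a.values()) + sum(bigrams_b.values())
    if total == 0:
        return 0.0
    overlap = sum((bigrams_a & bigrams_b).values())
    return 2.0 * overlap / total


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def fused_score(a, b, vec_a, vec_b, w_dice=0.7, w_cos=0.3):
    """Weighted fusion of surface-form and embedding similarity."""
    return w_dice * dice_similarity(a, b) + w_cos * cosine_similarity(vec_a, vec_b)


# Toy vectors standing in for real embeddings (hypothetical values).
score = fused_score("rear bumper", "rear bar", [0.9, 0.1, 0.3], [0.8, 0.2, 0.4])
print(round(score, 3))
```

The cosine term lets two surface forms with little character overlap still align when their embeddings are close, while the larger Dice weight keeps the score anchored to the written form.
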

(a) Entity types

| Entity Type | Description | Instances |
|---|---|---|
| Component Unit | Various units, parts, and equipment in the manufacturing field | “Fuel pump”, “Separator” |
| Performance Characterization | Characteristics or performance descriptions of components | “Pressure”, “Rotational speed”, “Temperature” |
| Fault State | Descriptions of fault states of systems or components | “Oil leakage”, “Fracture”, “Stuck” |
| Detection Tool | Specialized instruments for detecting certain faults | “Zero sequence transformer”, “Protector”, “Leakage current tester” |

(b) Entity relationship types

| Subject Type | Object Type | Relation Type | Subject Instance | Object Instance |
|---|---|---|---|---|
| Component Unit | Fault State | Component Failure | Engine Cover | Shaking |
| Performance Characteristic | Fault State | Performance Failure | Liquid Level | Lowering |
| Detection Tool | Performance Characteristic | Detection Tool | Detection Tool | Electric Current |
| Component Unit | Component Unit | Composition | Circuit Breaker | Converter Transformer |

| Entity Label | Count | Proportion (%) | Technical Significance & Data Source Proof |
|---|---|---|---|
| CarModel | 5609 | 67.8% | Core hub of the KG, connecting 5499 HAS_UNIT relationships. |
| CarVariant | 2157 | 26.1% | Accurately proven by 2157 VARIANT_OF relationships, encapsulating 50+ parameters. |
| Manufacturer | 422 | 5.1% | 2157 PRODUCES edges converge into 422 manufacturer nodes. |
| Unit | 21 | 0.25% | The core supports 5499 component associations and 176 fault inferences. |
| Failure | 14 | 0.17% | Carries the normalized results of 176 UNIT_FAILURE relationships. |
| CarClass | 27 | 0.33% | Corresponds to the classification endpoints of 2157 BELONGS_TO_CLASS relationships. |
| EnergyType | 11 | 0.13% | Corresponds to the powertrain constraints of 2157 USES_ENERGY relationships. |
| Feature | 9 | 0.11% | Corresponds to the representation layer of 82 FEATURE_FAILURE relationships. |
| DetectionTool | 4 | 0.05% | Corresponds to the terminal recommendations of 12 DETECTS edges. |
| Total | 8274 | 100% | Node counts sum to 8274; the graph contains 14,488 relationships in total. |
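
The proportions in this table, and the average degree of 1.75 quoted in the Introduction, can be re-derived from the node counts. The snippet below is only an arithmetic check on the reported figures; it assumes the reported "average degree" is the ratio of edges to nodes, which is the ratio that reproduces 1.75.

```python
# Node counts as reported in the table above.
NODE_COUNTS = {
    "CarModel": 5609,
    "CarVariant": 2157,
    "Manufacturer": 422,
    "Unit": 21,
    "Failure": 14,
    "CarClass": 27,
    "EnergyType": 11,
    "Feature": 9,
    "DetectionTool": 4,
}
TOTAL_EDGES = 14488  # total relationships reported for the KG


def node_proportions(counts):
    """Percentage share of each label among all nodes."""
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


def edges_per_node(counts, total_edges):
    """Ratio of relationships to nodes (assumed meaning of 'average degree')."""
    return total_edges / sum(counts.values())


print(sum(NODE_COUNTS.values()))                         # 8274
print(round(node_proportions(NODE_COUNTS)["CarModel"], 1))  # 67.8
print(round(edges_per_node(NODE_COUNTS, TOTAL_EDGES), 2))   # 1.75
```
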
| Hyperparameters | Value | Target Module & Configuration Description |
|---|---|---|
| Software Framework | TensorFlow 2.6.0 | Primary deep learning engine. |
| Graph Database | Neo4j 3.5.5 | Knowledge graph storage and retrieval. |
| Transformer Layers | 12 | Extraction of foundational global semantic representations |
| Kernel Sizes | (2, 3, 4) | Extraction of multi-scale intent features |
| Filters | 128 | Number of channels per convolutional kernel dimension |
| Dropout Rate | 0.5 | Prevents overfitting in the intent classification layer |
| Learning Rate | 3 × 10⁻⁵ | Prevents gradient oscillation in BERT pre-trained weights |
| BERT Fine-tuning BS | 32 | Optimized for VRAM constraints during the GPU-based fine-tuning phase. |
| TextCNN Batch Size | 256 | High-concurrency intent parsing during the CPU-based inference phase. |
| Max Epochs | 240 | Configured for full convergence; validated by performance curves in Figure 12, Figure 13 and Figure 14. |
| Fusion Coefficients | 0.7/0.3 | Optimized weights for morphological (SDice) and semantic (Scos) alignment. |
| Number | Category | Sample | Correct Results | Accuracy Rate |
|---|---|---|---|---|
| 1 | Introduction | Introduction of BYD Qin Plus | 49 | 98% |
| 2 | Specifications | Motor model of XPeng P7 | 50 | 100% |
| 3 | Policies | What are the preferential policies for plug-in hybrids? | 45 | 90% |
| 4 | Faults | Common faults of range extenders | 38 | 76% |
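
The per-category rates above can be combined into an overall accuracy. The table does not state how many test queries each category contains; the listed percentages are consistent with 50 queries per category, and that figure is assumed in this sketch.

```python
# Correct answers per category, from the table above.
CORRECT = {
    "Introduction": 49,
    "Specifications": 50,
    "Policies": 45,
    "Faults": 38,
}
SAMPLES_PER_CATEGORY = 50  # assumption: not stated in the table


def category_accuracy(correct, n_samples):
    """Accuracy of each category as a fraction of its test queries."""
    return {name: hits / n_samples for name, hits in correct.items()}


def overall_accuracy(correct, n_samples):
    """Micro-averaged accuracy over all categories."""
    return sum(correct.values()) / (n_samples * len(correct))


for name, acc in category_accuracy(CORRECT, SAMPLES_PER_CATEGORY).items():
    print(f"{name}: {acc:.0%}")
print(f"Overall: {overall_accuracy(CORRECT, SAMPLES_PER_CATEGORY):.0%}")  # Overall: 91%
```

Under that assumption the fault category, at 76%, accounts for 12 of the 18 incorrect answers.
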
| Case ID | Real User Query | Error Category | Actual Performance & Failure Point | Root Cause Analysis |
|---|---|---|---|---|
| 1 | “Can I replace the gearbox of my new Tank 300 due to oil leakage and abnormal noise?” | Missing Complex Coordinate Intent | Only retrieved and returned the “oil leakage” node. | The single-hop query template cannot decompose and execute the dual “AND” logic. |
| 2 | “Why does the car ‘roar but will not move’ when I step on the gas?” | Slang/Jargon Entity Deviation | Failed to link to the target node (power slipping). | Lacked a synonym vector mapping mechanism to align industry slang with standard terminology. |
| 3 | “The front chassis keeps making a clicking noise.” | Spatial Granularity Mismatch | Unable to drill down and lock onto specific micro-components. | Macro-spatial pronouns (“chassis”) cannot be directly mapped downwards to specific micro-components (e.g., shock absorbers) at the bottom layer of the graph. |
| 4 | “How is the quality of the Hyundai Elantra? Is it worth buying?” | Intent Out-of-Domain (OOD) | Erroneously switched to the historical fault list of this model. | The graph focuses on after-sales diagnosis; the model lacks a rejection mechanism for out-of-domain intents like “purchasing guide”. |
| 5 | “What causes the clutch not to disengage, and how to handle it?” | Multi-hop Reasoning Breakage | Only returned the cause, missing the underlying diagnostic tools. | In local sparse subgraphs, explicit relationship edges between performance anomalies and diagnostic tools are missing (KG Sparsity). |
| 6 | “The 4S shop found nothing, but it costs hundreds every time.” | Implicit Complaint (Entity-less) | Entity extraction failed, triggering an execution error. | Emotional venting expressions lack the core technical entities required to trigger a graph query. |
| 7 | “I filed a complaint on 17 June. How long is the processing cycle?” | Missing Non-diagnostic Business Rules | Unable to parse the graph query path. | The underlying graph has not integrated dynamic customer service business rule data (e.g., work order routing). |
| 8 | “Are there any recommendations for NEV shock absorbers that do not leak oil?” | Negation Semantic Misjudgment | Returned the shock absorbers associated with “oil leakage”, the opposite of what was asked. | Traditional entity extraction ignores the negation constraint, triggering reverse feature matching. |
| 9 | “The tire pressure is only 1.8 now. Will it trigger an alarm?” | Missing Continuous Numerical Reasoning | Failed to trigger the low-pressure alarm threshold in the graph. | Graph nodes are mostly discrete text, lacking the capability to calculate boundaries for continuous numerical values (e.g., IF value < 2.2). |
| 10 | “That round part is broken, will it cause oil leakage?” | Coreference Resolution Failure | Unable to lock onto a specific part entity. | Pure-text graph queries cannot combine visual shapes (“round”) for commonsense reasoning and pronoun restoration. |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Wu, Y.; Li, P.; Geng, T.; Wang, Y.; Zhang, H.; Li, S. Intelligent Question-Answering System for New Energy Vehicles Integrating Deep Semantic Parsing and Knowledge Graphs. Informatics 2026, 13, 66. https://doi.org/10.3390/informatics13050066

