1. Introduction
Under the guidance of the 14th Five-Year Plan, China’s home appliance industry has achieved innovation-driven development, with remarkable progress in intelligent manufacturing and green development. The product structure has been continuously optimized, and the pace of brand and intelligent transformation has significantly accelerated. However, despite China’s position as the global manufacturing hub for home appliances, traditional home appliance enterprises and appliance suppliers still face challenges such as low collaboration efficiency, information delay, and slow response times. The lack of backend demand assessment further constrains the overall improvement of industry efficiency. In response to the trend of servitization in manufacturing, home appliance enterprises are gradually shifting their focus from production to after-sales services. Fault repair, as a core component of the customer experience, has increasingly become a focal point for stakeholders. Rapid and accurate fault diagnosis not only shortens repair cycles but also significantly enhances customer satisfaction, injecting strong momentum into enterprise transformation.
Traditional fault diagnosis of home appliances primarily depends on the experience and intuition of maintenance technicians. Faults are typically identified through methods such as observation, sound analysis, and measurement, while a segmented isolation approach is employed for troubleshooting complex systems. Users, in contrast, rely on manuals and flowcharts provided by manufacturers for self-diagnosis [
1]. However, these conventional methods face significant limitations due to the complexity and diversity of fault types, as well as delays inherent in the diagnostic process. These challenges are particularly evident in the context of complex home appliance systems. Knowledge graphs, as a graphical knowledge representation approach based on relational databases, offer a structured system that integrates multi-source information, including appliance types, fault causes, and solutions. This technology can assist professional maintenance personnel in quickly diagnosing fault problems, thereby improving diagnostic efficiency and accuracy. Consequently, the construction of knowledge graphs for home appliance faults has emerged as a critical priority for advancing the industry.
The construction of knowledge graphs for home appliance faults presents significant challenges, including difficulties in acquiring domain-specific data and the complexity of knowledge extraction. Due to the scarcity of publicly available datasets in this field, small-scale domain datasets with manual annotations are often the only viable option. Analysis of the data in this domain reveals that samples are primarily sourced from maintenance manuals and forums. However, these sources often suffer from non-standard terminology and colloquial expressions, leading to frequent occurrences of synonymy. Additionally, the data contains a substantial number of discontinuous entities and nested entities. As illustrated in
Figure 1, the term “drainpipe” is a primary entity, but it is associated with two distinct phenomena, “rupture” and “crack”, which exemplify discontinuous entities often found in descriptions of fault phenomena. Nested entities are particularly prominent in the relationship between components and fault phenomena, such as the nested structure of “indoor unit” and “indoor unit water leakage”. Non-standard terminology exacerbates the issue of synonyms with different forms, creating knowledge redundancy. For instance, terms like “temperature controller”, “thermostat”, and “temperature control component” all refer to the same concept but make it challenging to define clear entity boundaries.
In existing research, knowledge graph construction methods predominantly rely on deep learning for knowledge extraction tasks [
2]. Deep learning, particularly in named entity recognition (NER) tasks, leverages pre-trained models for transfer learning. This approach allows for rapid adaptation to new domains using annotated datasets, offering advantages such as low training costs and strong domain adaptability. Large language models (LLMs), on the other hand, excel in relationship extraction tasks due to their superior contextual understanding and generalization capabilities [
3]. In this study, a dataset for the home appliance fault domain was constructed using manually annotated maintenance records as the raw data. The efficiency of deep learning was utilized for NER to perform entity extraction. Subsequently, the semantic processing strengths of LLMs were applied to address the challenges of poor association capability, low extraction accuracy, and weak generalization in entity association and relationship extraction tasks. A fine-tuning strategy was designed for LLM-based association and extraction, which effectively enhanced model performance. This approach also improved the interpretability and transparency of the model’s decision-making process. The constructed knowledge graph for the home appliance fault domain was visualized using Neo4j (Version: Community Edition 3.5.28). By uncovering relationships between entities in the domain, the knowledge graph provides valuable support for both users and technicians, enabling faster and more informed decision-making.
In summary, the main contributions of this study are as follows:
- (1)
Ontology design and dataset construction for the home appliance fault domain: Using maintenance records and user manual data as examples, this study adopts a top-down approach to design the ontology for the home appliance fault knowledge graph. A manually annotated domain-specific dataset was constructed to facilitate the acquisition of information at the data layer of the knowledge graph.
- (2)
Proposing a dual-strategy automated knowledge extraction method: A novel approach is introduced to extract knowledge from heterogeneous and multi-source home appliance fault datasets. For the NER task, multi-head attention mechanisms and a Chinese pre-trained model are employed to address challenges such as the ambiguity of entity boundaries. For ontology association and relationship extraction tasks, a progressive LLM-based knowledge extraction method is developed, which integrates knowledge fusion to generate the final set of triplets. The combination of these two strategies enhances the efficiency of knowledge extraction, providing a practical pathway for the integration and standardization of knowledge in the home appliance domain.
- (3)
Transforming textual fault repair data into a structured knowledge graph: Through ontology construction and automated knowledge extraction, this study converts unstructured fault repair case texts into a clear and comprehensible structured knowledge graph. The constructed graph helps maintenance personnel diagnose faults quickly, thereby improving the efficiency of fault diagnosis.
2. Background
The concept of knowledge graphs was first introduced by Google in 2012 in a blog post titled “Introducing the Knowledge Graph: Things, Not Strings”, with the goal of enhancing the understanding and information organization capabilities of search engines [4]. At its core, a knowledge graph is a structured semantic knowledge base that describes concepts in the physical world and their interrelations through symbolic representations [
5]. A knowledge graph is composed of multiple “entity-relationship-entity” triplets, along with entity attributes and values. Due to its structured data, depth in semantic understanding, and inherent inferential capabilities, knowledge graphs have found widespread applications across various domains, including healthcare, automotive, and water resources [
6,
7,
8], as well as in intelligent question answering [
9] and decision support systems [
10]. In the field of fault diagnosis, existing research has demonstrated the potential of knowledge graphs in data correlation analysis. Tang et al. [
11] constructed a knowledge graph for aircraft fault diagnosis by implementing a Start Position and Label-knowledge Enhanced Representation (SP-LEAR) entity recognition layer combined with a BERT-Convolution-Pooling relation extraction layer. Nevertheless, the effectiveness of short sentence entity recognition remains limited when dealing with entities that have vague boundaries. Liu et al. [
12] embedded knowledge graphs into large language models to construct a knowledge-enhanced joint model. This model incorporated subgraph embedding learning and supplemented professional domain knowledge to facilitate fault diagnosis in the aviation assembly process. The training data only contains 200 fault cases, and although the accuracy is high, small samples may mask the performance degradation of the model in larger or more complex scenarios. Peng et al. [
13] proposed a multimodal knowledge graph (MKG) construction approach that incorporates time series vibration signals, spectral data, and descriptive text from datasets to fully capture the features and interrelationships of bearing faults. Furthermore, they designed a fault diagnosis method utilizing a Relation Cascade Graph Attention Network (RC-GAT)-based MKG completion model, achieving efficient and accurate diagnosis of bearing faults. However, its application is confined solely to fault diagnosis within the bearing industry. Therefore, it is domain-dependent.
Traditional knowledge graph construction methods mostly rely on rule-based methods and deep learning algorithms, but they remain limited when dealing with fuzzy entity boundaries, small datasets, and large-scale, diverse domain-specific data. Rule-based approaches typically require deep involvement from domain experts. Since they depend on the prior definition of domain knowledge, these methods struggle to adapt to dynamic scenarios and to previously unseen entities and relationships. Furthermore, while traditional deep learning methods have made progress in tasks such as named entity recognition and relationship extraction, the models’ generalization capabilities remain limited. This is particularly evident when dealing with complex domain knowledge, where there is often a trade-off between accuracy and efficiency. For instance, Norabid et al. [
14] developed a triple extractor by formulating domain-agnostic entity-relation extraction rules based on dependency relations and part-of-speech (POS) information. They also proposed a multimodal knowledge graph (MKG) to extract knowledge from the unstructured text surrounding web images. Ge et al. [
15] integrated remote sensing, geographic information, and expert knowledge by utilizing a common semantic ontology and a unified spatio-temporal framework. They combined these with multi-aspect relevant data from remote sensing technology for disaster analysis. Consequently, they constructed a knowledge graph and represented the disaster prediction model in the form of knowledge formulations. This approach facilitates the integration of multi-source spatio-temporal data and enhances the effectiveness of disaster prediction. While these methods can construct knowledge graphs in specific domains to some extent, they are limited by their strong data dependency and poor generalization capabilities [
16]. These methods require substantial data annotation, and high-quality annotated data is not easily accessible in many domains [
17].
With the development of deep learning technologies, pre-trained language models (such as BERT, Turing NLG, etc.) have demonstrated their powerful capabilities in natural language processing (NLP) tasks [
18,
19,
20]. These pre-trained models capture general language representations from massive text corpora through self-supervised learning, providing a transferable basis of semantic understanding for downstream tasks. This also helps to mitigate the scarcity of domain-specific data when constructing knowledge graphs. In recent years, LLMs such as ChatGPT-4o, ChatGLM4, and Qwen-7B-Chat, through large-scale pre-training and fine-tuning, have not only achieved significant results in general NLP tasks but have also shown potential in domain-specific applications. This provides a new approach to knowledge graph construction [
21]. For instance, Li et al. [
22] developed an intelligent compliance checking method for construction schemes by integrating knowledge graphs and LLM. They built a multi-dimensional, multi-granular knowledge graph to support domain-specific knowledge for the LLM and introduced a parsing module using text classification and entity extraction. This facilitated effective domain knowledge integration and enhanced the application of LLMs and knowledge graphs in construction industry text compliance checks. Liu et al. [
10] embedded knowledge graphs into large language models to create a knowledge-enhanced joint model, utilizing the graph structure of large-scale data in knowledge graphs. By further optimizing the LLM, the computational burden was alleviated, leading to successful fault diagnosis in the aerospace equipment field. LLMs, with their advantages of not requiring extensive labeled data and strong generalization capabilities, can overcome the limitations of traditional knowledge extraction methods. However, their performance in handling entities with complex boundaries is suboptimal, and the design of prompts significantly influences the quality of model output.
Therefore, to address the problems of fuzzy entity boundaries, entity nesting, and small datasets in household appliance fault data, this paper designs a combined model. The model combines the high domain adaptability of deep learning in named entity recognition with the strong semantic understanding of LLMs. It uses a top-down method to construct the household appliance fault knowledge graph and a progressive prompting method to address the challenges faced by deep learning, such as the need for high-quality data samples, the weak ability to handle complex relationships, and limited contextual understanding. This method offers an approach to knowledge graph construction for fault diagnosis in the household appliance domain. It also lays a foundation for intelligent question answering and intelligent diagnosis applications based on the household appliance fault knowledge graph, thus promoting the service-oriented transformation of the household appliance manufacturing industry.
3. Methodology
The construction methods of knowledge graphs are divided into “bottom-up approaches” suitable for general knowledge graphs with wide coverage and “top-down approaches” for domain-specific knowledge graphs that describe concepts and relationships within a particular domain [
23]. As home appliance faults represent a typical domain-specific knowledge graph, this paper adopts a top-down construction approach and utilizes a semi-automated ontology construction method [
24] to build the home appliance fault domain ontology. The construction of the home appliance fault knowledge graph involves three main layers: the data layer, extraction layer, and the construction layer. The overall framework is shown in
Figure 2.
The first step is the construction of the data layer of the knowledge graph. The data layer is primarily used to obtain maintenance records and repair methods from various home appliance fault forums and repair manuals. Afterward, data collection, data cleaning, and other preprocessing tasks are performed. Expert knowledge is incorporated to construct the ontology of the home appliance fault domain, refining the definitions of entities, attributes, and relationships.
The second step is the extraction layer of the knowledge graph. The work of the extraction layer mainly includes NER, ontology linking, and relationship extraction. First, a deep learning approach is used to compare and select the final model from pre-trained models. The NER task is then implemented on the selected model. After extracting the entities, a progressive prompting strategy is designed to link entities with subjects and match relationships between entities using LLMs. The extracted information is saved in the form of triples for subsequent visualization.
The third step involves the visualization of the knowledge graph based on the results of the extraction layer. The construction layer is mainly responsible for displaying the triple-format data obtained from the extraction layer in a visual manner, forming the final home appliance fault knowledge graph. Before visualization, similarity calculations are performed on the extracted triples to merge semantically similar concepts, avoiding redundancy that could affect the quality of the graph. After knowledge fusion and processing, the knowledge graph is visualized using Neo4j.
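As an illustration of how extracted triples could be loaded into Neo4j, the following minimal Python sketch turns one triple into a parameterized Cypher `MERGE` statement. The node label `Entity`, the property `name`, and the relation name `CAUSED_BY` are hypothetical; the schema and loading procedure actually used in this study are not specified here.

```python
import re

def triple_to_cypher(head, relation, tail):
    """Build a parameterized Cypher statement for one (head, relation, tail) triple.

    Relationship types cannot be passed as Cypher parameters, so the relation
    name is validated and interpolated; entity names are passed as parameters.
    """
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", relation):
        raise ValueError(f"unsafe relation name: {relation!r}")
    query = (
        "MERGE (h:Entity {name: $head}) "
        "MERGE (t:Entity {name: $tail}) "
        f"MERGE (h)-[:{relation}]->(t)"
    )
    return query, {"head": head, "tail": tail}

# Hypothetical triple built from entities mentioned in the introduction.
query, params = triple_to_cypher("indoor unit water leakage", "CAUSED_BY", "drainpipe rupture")
print(query)
print(params)
```

Using `MERGE` rather than `CREATE` keeps repeated triples from producing duplicate nodes, which complements the similarity-based fusion performed before visualization.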
3.1. Data Layer Construction
3.1.1. Ontology Construction
The ontology of a knowledge graph is a core concept within the graph. It defines the entities, attributes, and relationships in the graph, as well as the semantic relationships between them. It serves as a standardized representation of the knowledge structure within a specific domain [
25]. In this study, Protégé was used as the ontology construction tool, and the domain knowledge of home appliance faults was utilized to represent the knowledge of home appliance repair cases. During the construction process, a top-down ontology construction approach was adopted under the guidance of professional repair personnel’s logs and experience. The repair case texts were manually analyzed and organized according to the “entity-relationship-entity attribute” or “entity-relationship-entity” three-element structure. The ontology consists of five elements: appliance category, fault cause, components, fault phenomenon, and solution method, along with the relationships between them. After the ontology was constructed, professional repair personnel were invited to evaluate and revise the framework to finalize the ontology structure.
Table 1 presents the “entity-relationship-entity” semantic relationship table of the home appliance fault knowledge graph, facilitating a clear presentation of the semantic relationships between entities.
3.1.2. Data Preprocessing
In the field of home appliance fault repair, there is currently a lack of publicly available professional datasets. Most home appliance repair cases are scattered across various repair forums and appliance manuals. To gather sufficient initial data for model training, this study uses web scraping to collect textual corpora from repair forums and manuals. The scraped text contains several problems, such as web pages that store data as images and text that is unrelated to home appliances. Therefore, data cleaning is necessary. The cleaning process includes removing unrelated data, handling missing values, deleting line breaks and spaces, and other operations to correct anomalies in the original text. Following this, duplicate data is identified and removed through a redundancy check, and the remaining corpora are normalized into a standardized format for subsequent manual annotation and other processing steps.
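The cleaning and redundancy check described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: the function name `clean_corpus` is hypothetical, whitespace is collapsed to a single space (for Chinese text one might delete it entirely), and only exact duplicates are removed.

```python
import re

def clean_corpus(records):
    """Normalize whitespace, drop empty entries, and remove exact duplicates."""
    seen = set()
    cleaned = []
    for text in records:
        # Collapse line breaks, tabs, and repeated spaces into one space.
        normalized = re.sub(r"\s+", " ", text).strip()
        if not normalized:
            continue  # entry with missing content
        if normalized in seen:
            continue  # redundancy check: exact duplicate
        seen.add(normalized)
        cleaned.append(normalized)
    return cleaned

raw = ["water heater\n cannot ignite", "water heater  cannot ignite", "   ", "drainpipe crack"]
print(clean_corpus(raw))
```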
3.1.3. Corpus Annotation
To enable the model to better understand and process the data across different scenarios and tasks, the preprocessed data undergoes manual corpus annotation. During the annotation process, entities in the text are labeled according to the constructed ontology. This study adopts the “BIO” tagging scheme for dataset annotation. In this scheme, “B-” represents the beginning of an entity, “I-” indicates the internal part of an entity, and “O” denotes non-entity characters. For example, given the fault text “The water heater cannot ignite or the ignition is unstable, which may be caused by a damaged or dirty ignition electrode preventing normal ignition. Cleaning or replacing the ignition electrode can solve the issue,” the annotation example is shown in
Table 2.
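To make the BIO scheme concrete, the following minimal sketch converts entity spans into a tag sequence. It is illustrative only: the study annotates Chinese text at the character level, whereas this example uses English word tokens for readability, and the label names `APPLIANCE` and `PHENOMENON` are hypothetical stand-ins for the ontology categories.

```python
def bio_tags(tokens, spans):
    """Convert (start, end, label) spans into BIO tags; `end` is exclusive."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the entity
    return tags

tokens = ["The", "water", "heater", "cannot", "ignite"]
spans = [(1, 3, "APPLIANCE"), (3, 5, "PHENOMENON")]
for token, tag in zip(tokens, bio_tags(tokens, spans)):
    print(token, tag)
```

Note that plain BIO tags assign one label per token, so nested and discontinuous entities of the kind discussed in the introduction cannot be encoded directly without additional conventions.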
3.2. Deep Learning-Based Named Entity Recognition
Named Entity Recognition is the process of identifying entities with specific meanings from text and classifying them into predefined categories. It is the first step in constructing a home appliance fault knowledge graph. For the NER task, this paper builds the RoBERTa-zh-BiLSTM-Attn-CRF model, which consists of the RoBERTa-zh pretraining layer, BiLSTM-Attn layer, and the CRF layer. The input corpus text first undergoes encoding through the RoBERTa-zh pretraining layer, then passes through a bidirectional Long Short-Term Memory network (BiLSTM) to enhance the contextual representation of the embedded word vectors, combined with multi-head attention to capture complex dependencies. Finally, it enters the CRF layer for optimizing the prediction results of the model’s sequence labels. The model architecture is shown in
Figure 3.
3.2.1. RoBERTa-zh Pre-Training Layer
RoBERTa is a pre-trained model designed for natural language processing tasks, and RoBERTa-zh is its Chinese version. Its framework is an improvement on the BERT model. Compared to BERT, RoBERTa no longer uses the Next Sentence Prediction (NSP) task as a pretraining objective but instead relies on the Masked Language Model (MLM) task. During training, RoBERTa uses larger batch sizes, longer sequences, more pretraining steps, and a larger dataset (the original English model draws on sources such as CommonCrawl and OpenWebText), making it a “robust version” of BERT.
By using RoBERTa-zh, high-quality word embedding vectors containing rich contextual information can be generated, which helps to improve entity recognition performance in home appliance fault domain texts. An example of how RoBERTa-zh processes the original text is shown in
Figure 4. First, it splits the input sentence into individual characters, where the set is defined as $X = \{x_1, x_2, \ldots, x_n\}$, $i = 1, 2, \ldots, n$, with each $x_i$ representing the $i$-th character in the sentence. Then, a classification token “[CLS]” is added at the beginning of the sentence, and a separator token “[SEP]” is added at the end of the sentence, transforming the original sentence set into an input vector set $E = \{E_{\text{[CLS]}}, E_1, \ldots, E_n, E_{\text{[SEP]}}\}$. The input vector set $E$ is then processed by multiple layers of Transformer encoders to produce the output vector set $T = \{T_{\text{[CLS]}}, T_1, \ldots, T_n, T_{\text{[SEP]}}\}$.
3.2.2. Bidirectional Long Short-Term Memory Network
After the original text is processed by the pre-training layer, it is transformed into a vector with semantic representations, which is then passed into the Bidirectional Long Short-Term Memory (BiLSTM) network. The BiLSTM consists of a forward LSTM unit and a backward LSTM unit. Each LSTM unit is composed of an input gate, output gate, forget gate, and memory cell. The hidden state from the previous time step and the input vector at the current time step are first input into the forget gate to determine the information to be forgotten, as shown in the following formula:
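$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)$$
In the above formula, the dimensions of $h_{t-1}$ are $d_h \times 1$, the dimensions of $x_t$ are $d_x \times 1$, $W_f$ represents the weight parameter with dimensions $d_h \times (d_h + d_x)$, and $b_f$ is the bias parameter with dimensions $d_h \times 1$. $\sigma$ represents the sigmoid activation function, whose formula is as follows:
$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
Next, the memory gate (input gate) is calculated to select the information to be remembered. The formula is as follows:
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right)$$
$$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right)$$
In the formula, $\tilde{C}_t$ represents the candidate memory cell, which is used to store information and overcome the vanishing or exploding gradient problems faced by traditional RNNs when processing long-term sequence data. The activation function $\tanh$ used here is
$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$
After the memory gate, the current cell state needs to be calculated. The formula is as follows:
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
In the formula, $C_t$ represents the cell state at the current time step and $\odot$ denotes element-wise multiplication. Then, the output gate $o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right)$ and the updated hidden state at the current time step, $h_t = o_t \odot \tanh(C_t)$, are computed, ultimately resulting in a hidden state sequence with the same length as the sentence.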
The BiLSTM combines gated mechanisms with memory cells to facilitate the transfer of information, enabling it to effectively capture bidirectional contextual information. This results in a richer context representation, while also addressing the gradient vanishing or explosion problems encountered in traditional RNN models with long sequences. Additionally, the BiLSTM generates a hidden state representation for each input position that integrates comprehensive contextual information.
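The gate computations described above can be traced in a small, dependency-free Python sketch. It follows the standard LSTM formulation (forget gate, input gate, candidate cell, output gate) and runs one pass in each direction before concatenating the hidden states. The parameter names, the random initialization, and the toy dimensions are assumptions for illustration; the study's actual implementation and hyperparameters are not given in this section.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(W, b, z):
    """Compute W z + b for a list-of-rows matrix W."""
    return [sum(w * v for w, v in zip(row, z)) + bias for row, bias in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, p):
    z = h_prev + x_t  # list concatenation: [h_{t-1}, x_t]
    f = [sigmoid(v) for v in affine(p["W_f"], p["b_f"], z)]    # forget gate
    i = [sigmoid(v) for v in affine(p["W_i"], p["b_i"], z)]    # input (memory) gate
    g = [math.tanh(v) for v in affine(p["W_c"], p["b_c"], z)]  # candidate cell
    o = [sigmoid(v) for v in affine(p["W_o"], p["b_o"], z)]    # output gate
    c_t = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c_prev, i, g)]
    h_t = [ov * math.tanh(cv) for ov, cv in zip(o, c_t)]
    return h_t, c_t

def init_params(d_x, d_h, seed):
    """Small random weights and zero biases (illustrative initialization)."""
    rng = random.Random(seed)
    p = {}
    for gate in "fico":
        p[f"W_{gate}"] = [[rng.uniform(-0.1, 0.1) for _ in range(d_h + d_x)]
                          for _ in range(d_h)]
        p[f"b_{gate}"] = [0.0] * d_h
    return p

def run_lstm(xs, p, d_h):
    h, c = [0.0] * d_h, [0.0] * d_h
    states = []
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, p)
        states.append(h)
    return states

def bilstm(xs, p_fwd, p_bwd, d_h):
    fwd = run_lstm(xs, p_fwd, d_h)
    bwd = run_lstm(xs[::-1], p_bwd, d_h)[::-1]  # re-align to original order
    return [f + b for f, b in zip(fwd, bwd)]    # concatenate per position

xs = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # three positions, d_x = 2
H = bilstm(xs, init_params(2, 3, seed=0), init_params(2, 3, seed=1), d_h=3)
print(len(H), len(H[0]))
```

Each position receives a vector of size $2 d_h$, the forward and backward states side by side, which is what the subsequent attention layer consumes.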
3.2.3. Multi-Head Attention Mechanism
To better handle texts with long-range dependencies, such as appliance fault repair records, this study integrates the multi-head attention mechanism into the model to enhance its understanding of the relationships between different parts of the input text. Multi-head attention, introduced with the Transformer model, uses multiple parallel, independent attention heads to capture attention distributions of the input sequence across different subspaces. This approach enables a more comprehensive capture of the potential semantic associations embedded within the sequence. In multi-head attention, each input word is transformed into three vectors: the Query ($Q$), Key ($K$), and Value ($V$) vectors. The similarity between the Query and all Key vectors is then computed, yielding an attention score matrix that represents the degree of attention each word pays to other words. Typically, a softmax function is used to normalize the attention scores into a probability distribution, and the Value vectors are weighted and summed to obtain the output representation of each word:
$$Q_i = X W_i^{Q}, \quad K_i = X W_i^{K}, \quad V_i = X W_i^{V}$$
$$\mathrm{head}_i = \mathrm{softmax}\left(\frac{Q_i K_i^{\top}}{\sqrt{d_k}}\right) V_i$$
$$Z = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^{O}$$
In the formula, $X$ represents the word embedding matrix of the input sequence, and $W_i^{Q}$, $W_i^{K}$, and $W_i^{V}$ represent the transformation matrices for the $i$-th attention head. $d_k$ is the dimension of the key vectors, used for scaling the dot product to prevent gradient vanishing or explosion. $W^{O}$ is a linear transformation matrix used to map the concatenated vectors back to the required output dimension, while $h$ denotes the number of attention heads. The final output vector is represented by $Z$.
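The scaled dot-product attention described above can be illustrated with a small pure-Python sketch. It is a minimal instance of standard multi-head attention, not the study's implementation: the matrix sizes, the random weights, and the helper names (`matmul`, `attention_head`, `multi_head_attention`) are assumptions chosen for readability.

```python
import math
import random

def matmul(A, B):
    """Multiply list-of-rows matrices A (n x m) and B (m x p)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(row):
    m = max(row)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_head(X, W_q, W_k, W_v):
    Q, K, V = matmul(X, W_q), matmul(X, W_k), matmul(X, W_v)
    d_k = len(K[0])
    K_T = [list(col) for col in zip(*K)]
    scores = matmul(Q, K_T)  # similarity of every query with every key
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, V), weights

def multi_head_attention(X, heads, W_o):
    outputs = [attention_head(X, *head)[0] for head in heads]
    # Concatenate the per-head outputs position by position.
    concat = [sum((out[t] for out in outputs), []) for t in range(len(X))]
    return matmul(concat, W_o)

rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

X = rand_matrix(3, 4)                                                   # 3 positions, model size 4
heads = [tuple(rand_matrix(4, 2) for _ in range(3)) for _ in range(2)]  # 2 heads, d_k = 2
W_o = rand_matrix(4, 4)
Z = multi_head_attention(X, heads, W_o)
print(len(Z), len(Z[0]))
```

Every row of the attention weight matrix sums to 1, so each output position is a convex combination of the value vectors before the final projection.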
3.2.4. Conditional Random Fields
Finally, a Conditional Random Field (CRF) layer is applied to consider the transition probabilities between labels, optimizing the label prediction results for the entire text sequence. This ensures that the prediction is globally optimal, avoiding the issues that arise from local optima. Due to the strong dependencies between labels, the CRF layer can capture the specific order and constraints between labels, selecting the most suitable label sequence during prediction. After the feature vector sequence $H = (h_1, h_2, \ldots, h_n)$ is processed by multi-head attention, where $n$ represents the sentence length and $h_t$ represents the hidden state vector at time step $t$, the CRF finds the most likely sequence $y^{*}$ given $H$, as shown in the following formula:
$$p(y \mid H) = \frac{\exp\left(s(H, y)\right)}{Z(H)}, \qquad Z(H) = \sum_{y'} \exp\left(s(H, y')\right)$$
$$s(H, y) = \sum_{t=1}^{n} \left(A_{y_{t-1}, y_t} + P_{t, y_t}\right), \qquad P_{t, y_t} = h_t^{\top} W_{y_t}$$
$$y^{*} = \arg\max_{y} \, p(y \mid H)$$
In the formula, $s(H, y)$ denotes the scoring function used to evaluate the score of a label sequence. $Z(H)$ represents the normalization factor that ensures the scores form a valid probability distribution. $A_{y_{t-1}, y_t}$ indicates the transition score from label $y_{t-1}$ to label $y_t$, with $y_0$ denoting the start label. $P_{t, y_t}$ is the inner product of the hidden state $h_t$ and the corresponding row vector $W_{y_t}$, representing the score of label $y_t$ given the hidden state $h_t$. Each element $A_{i,j}$ in the transition matrix $A$ represents the transition score from label $i$ to label $j$.
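To show why sequence-level decoding matters, the following sketch implements Viterbi decoding, the standard way to find the highest-scoring label sequence under a linear-chain CRF. The emission and transition scores are made-up toy values over the labels O, B, and I, and start and end transitions are omitted for brevity; none of these numbers come from the study.

```python
def viterbi(emissions, transitions):
    """Return the best label path and its score.

    emissions[t][j]: score of label j at position t.
    transitions[i][j]: score of moving from label i to label j.
    """
    n_labels = len(transitions)
    score = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_labels):
            best_i = max(range(n_labels), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    last = max(range(n_labels), key=lambda j: score[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])  # follow backpointers to the start
    path.reverse()
    return path, max(score)

labels = ["O", "B", "I"]
emissions = [[1.0, 0.2, 0.0],   # position 0: O looks best locally
             [0.1, 0.8, 1.0],   # position 1: I looks best locally
             [0.0, 0.1, 1.5]]   # position 2: I looks best locally
transitions = [[0.0, 0.0, -1e4],   # from O: O -> I is effectively forbidden
               [0.0, -1.0, 0.5],   # from B
               [0.0, 0.0, 0.3]]    # from I

greedy = [max(range(3), key=lambda j: row[j]) for row in emissions]
path, best = viterbi(emissions, transitions)
print("greedy:", [labels[j] for j in greedy])
print("viterbi:", [labels[j] for j in path])
```

Position-wise argmax yields O, I, I, which violates the BIO constraint that I must follow B or I, while the decoded path O, B, I respects it; this is the global-versus-local distinction the CRF layer is meant to enforce.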
3.3. Entity Linking and Relation Extraction Based on LLMs
Traditional deep learning approaches to entity linking and relation extraction have several limitations. On the one hand, training models for these tasks requires extensive annotated data, yet mature, large-scale fault datasets are lacking in the home appliance fault domain. Furthermore, such datasets must be meticulously designed and annotated for specific tasks, leading to high costs. On the other hand, the generalization ability of models is constrained by the diversity, quantity, and quality of training data. This limitation results in degraded performance when models encounter new domains or unseen data. Additionally, when dealing with different types of entities and relations, it may be necessary to retrain models or adjust parameters, thereby limiting their versatility and flexibility.
In contrast, methods based on LLMs offer significant advantages in relation extraction tasks. These advantages stem from the fact that LLMs are pre-trained through unsupervised or weakly supervised learning approaches, which greatly reduces their dependence on annotated data. LLMs can learn rich linguistic knowledge and patterns from large volumes of unannotated text. Furthermore, because LLMs are pre-trained on massive and diverse multi-source datasets, they exhibit strong generalization capabilities, making them well-suited for tasks across various domains and scenarios. Even with limited annotated data, LLMs can achieve outstanding performance in relation extraction tasks. Additionally, by leveraging fine-tuning or prompt-based techniques, LLMs can quickly adapt to new task requirements without requiring structural modifications to the model.
For the specialized domain of household appliance fault data, this study employs the ChatGLM4-plus model to perform entity linking and relation extraction tasks. First, the annotated dataset is used to establish a local knowledge base. Then, the recognized entity list, entity categories, and relationship types are provided as input data. Based on experimental requirements, different prompts are designed to guide the model in accomplishing entity linking and relation extraction tasks effectively.
In terms of prompt design, this study adopts a progressive strategy to handle tasks incrementally through three steps: initializing the entity list, ontology linking, and relation extraction. Prior to issuing task-specific prompts, the process begins with initializing the entity list, which involves identifying entities through named entity recognition. Next, ontology linking is performed. The ontology linking phase starts with a task-specific prompt to ensure that the model has a clear understanding of the task at hand. Subsequently, specific input requirements are introduced, including domain restrictions and illustrative examples. Providing examples helps the model better learn and comprehend the requirements, thereby accelerating its understanding and improving its generalization ability. Finally, output standards are defined to ensure that the results are structured in formats such as JSON or HTML, facilitating further processing. After ontology linking, an additional prompt is issued based on the processed data to complete the relation extraction task. The detailed steps are illustrated in
Figure 5.
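The three-step progressive strategy can be sketched as two chained prompts followed by validation of the structured output. Everything specific in this sketch is hypothetical: the prompt wording, the relation names `has_cause` and `solved_by`, and the `fake_llm` stand-in, which returns canned JSON so that the example runs without any model access. It does not reproduce the prompts used in the study or the ChatGLM4-plus API.

```python
import json

ENTITY_TYPES = ["appliance category", "fault cause", "component",
                "fault phenomenon", "solution method"]

def build_linking_prompt(entities, example):
    """Step 2: task statement, domain restriction, one example, output format."""
    return (
        "Task: assign each entity to exactly one ontology category.\n"
        "Domain: home appliance fault repair.\n"
        f"Categories: {', '.join(ENTITY_TYPES)}\n"
        f"Example: {json.dumps(example, ensure_ascii=False)}\n"
        f"Entities: {json.dumps(entities, ensure_ascii=False)}\n"
        'Output: a JSON list of {"entity": ..., "type": ...} objects only.'
    )

def build_relation_prompt(linked, relation_types):
    """Step 3: relation extraction over the typed entities from step 2."""
    return (
        "Task: extract relations between the typed entities below.\n"
        f"Relation types: {', '.join(relation_types)}\n"
        f"Typed entities: {json.dumps(linked, ensure_ascii=False)}\n"
        'Output: a JSON list of {"head": ..., "relation": ..., "tail": ...} objects only.'
    )

def progressive_extract(entities, relation_types, example, call_llm):
    """Step 1 is the entity list from NER; steps 2 and 3 are chained prompts."""
    linked = json.loads(call_llm(build_linking_prompt(entities, example)))
    raw = json.loads(call_llm(build_relation_prompt(linked, relation_types)))
    allowed = set(relation_types)
    # Keep only triples whose relation is defined in the ontology.
    return [(t["head"], t["relation"], t["tail"]) for t in raw if t["relation"] in allowed]

def fake_llm(prompt):
    """Hypothetical stand-in for an LLM call; returns canned JSON."""
    if "Relation types" in prompt:
        return json.dumps([
            {"head": "cannot ignite", "relation": "has_cause", "tail": "damaged ignition electrode"},
            {"head": "cannot ignite", "relation": "unknown_relation", "tail": "water heater"},
        ])
    return json.dumps([
        {"entity": "cannot ignite", "type": "fault phenomenon"},
        {"entity": "damaged ignition electrode", "type": "fault cause"},
    ])

triples = progressive_extract(
    entities=["cannot ignite", "damaged ignition electrode"],
    relation_types=["has_cause", "solved_by"],
    example={"entity": "drainpipe", "type": "component"},
    call_llm=fake_llm,
)
print(triples)
```

Requesting JSON and filtering against the ontology's relation set keeps malformed or out-of-schema outputs from reaching the fusion and visualization stages.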
3.4. Knowledge Fusion
To address the issues of entity nesting or semantic redundancy commonly observed in entities extracted from unstructured appliance fault repair cases, knowledge fusion methods are employed to merge redundant fault knowledge. This approach ensures the construction of a higher-quality knowledge graph. Knowledge fusion refers to the integration of knowledge from diverse sources and forms, encompassing identical entities across different knowledge bases, multiple distinct knowledge graphs, and multi-source heterogeneous external knowledge. Its primary objective is to identify and reconcile equivalent instances, categories, and attributes within knowledge graphs, thereby enabling the updating and refinement of existing knowledge graphs. Key tasks in this process include coreference resolution and entity disambiguation.
Coreference resolution involves merging entities in the text that refer to the same real-world object, addressing issues such as synonyms and nested entities. For instance, in the entities extracted from appliance-related texts, terms like “heating tube” and “heat pipe” both refer to the same entity, “electric heating tube”, despite using different names. This inconsistency arises because appliance fault datasets are often derived from human-written or spoken descriptions, lacking standardization and professional terminology. Conducting coreference resolution experiments helps to standardize the knowledge graph and enhance its consistency.
Entity disambiguation refers to resolving the ambiguity and vagueness of entity names in a text, ensuring that each entity name accurately corresponds to its intended object. This process primarily addresses the issue of homonyms—entities with the same name but different meanings. However, since the source data in this study is entirely derived from the appliance fault domain, such issues are largely absent.
This study represents entities as word vectors using the Chinese Word Vectors 0.2.0 released by Tencent AI Lab. Compared to other models, this word vector model offers broader coverage and higher accuracy. Entities are merged by calculating the cosine similarity between their vectors and applying a threshold: entity pairs whose similarity exceeds the threshold are fused. Cosine similarity is the cosine of the angle between two non-zero vectors and measures their directional similarity in a multidimensional space. In natural language processing, particularly in applications involving word vectors, it is frequently used to evaluate the semantic closeness of two words. The formula is as follows:
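$$\cos(A, B) = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$$
where $A$ and $B$ represent the entities, and $A_i$ and $B_i$ denote their respective values in the $i$-th dimension.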
When the directions of the two word vectors are perfectly aligned (i.e., the angle between them is 0 degrees), the cosine similarity equals 1.
When the directions are completely opposite (i.e., the angle is 180 degrees), the cosine similarity equals −1.
When the two word vectors are orthogonal (i.e., the angle is 90 degrees), the cosine similarity equals 0.
When the directions of the two word vectors are partially similar but not completely aligned (i.e., the angle between them is between 0 and 90 degrees), the cosine similarity value falls between 0 and 1, indicating a certain degree of similarity between them.
The computed similarity results for entities are shown in
Table 3.
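The threshold-based fusion described above can be sketched as follows. This is an illustration under stated assumptions: the three-dimensional vectors are invented toy values rather than Tencent embeddings, the threshold of 0.9 is arbitrary, and the greedy grouping in `fuse_entities` is one possible merging rule, not necessarily the one used in the study.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def fuse_entities(vectors, threshold):
    """Greedy grouping: an entity joins the first group whose representative
    (its first member) has similarity above the threshold; otherwise it
    starts a new group."""
    groups = []
    for name, vec in vectors.items():
        for group in groups:
            if cosine_similarity(vectors[group[0]], vec) > threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Toy vectors, invented for illustration only.
toy_vectors = {
    "electric heating tube": [0.9, 0.1, 0.2],
    "heating tube": [0.8, 0.2, 0.2],
    "drain pump": [0.1, 0.9, 0.3],
}
groups = fuse_entities(toy_vectors, threshold=0.9)
# "heating tube" is grouped with "electric heating tube"; "drain pump" stays separate.
```

In practice the threshold would be tuned on a sample of manually checked entity pairs, since a value that is too low merges distinct components and one that is too high leaves synonyms unmerged.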
4. Experiments
4.1. Data Preparation
The data in this study was primarily collected through a Python web scraper from home appliance repair forums, including popular platforms such as “Home Appliance Maintenance Information Network” and “Home Appliance Maintenance Technology Forum”. The initial dataset comprised 5496 entries. To ensure the authenticity and reliability of the data, the collection process followed these steps:
We targeted forums and communities where professional technicians and users frequently share repair cases. These platforms were chosen for their high activity levels and the technical depth of discussions.
Using Python libraries like BeautifulSoup and Scrapy, we extracted textual repair cases, including fault descriptions, causes, and solutions.
Automated scripts removed duplicate entries, irrelevant content such as advertisements, non-repair-related posts, and incomplete cases such as missing fault descriptions or solutions. Special characters, HTML tags, and emojis were also filtered out.
To ensure case authenticity, two domain experts independently reviewed a random subset of 1000 entries. They verified the technical accuracy of fault descriptions and solutions, flagging ambiguous or unrealistic cases such as “the refrigerator exploded due to overcharging”. Discrepancies were resolved through discussion, and low-quality entries were discarded. This step resulted in the removal of 12% of the initial dataset.
After cleaning and verification, the dataset consisted of 3156 high-quality repair cases. Each case was manually annotated, categorizing entities into five types: appliance category, components, fault cause, fault phenomenon, and solution method. The dataset was split into training, testing, and validation sets in an 8:1:1 ratio for subsequent model experiments.
Table 4 provides examples of some entity categories.
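The automated cleaning and the 8:1:1 split can be sketched as follows. This is a simplified illustration, not the scripts used in the study: the regular expression, the function names, and the fixed random seed are assumptions, and advertisement or emoji filtering is omitted.

```python
import random
import re

TAG_RE = re.compile(r"<[^>]+>")  # naive HTML tag pattern, adequate for a sketch


def clean_text(raw):
    """Strip HTML tags and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", TAG_RE.sub(" ", raw)).strip()


def deduplicate(cases):
    """Drop empty strings and exact duplicates while preserving order."""
    seen, unique = set(), []
    for case in cases:
        if case and case not in seen:
            seen.add(case)
            unique.append(case)
    return unique


def split_8_1_1(cases, seed=42):
    """Shuffle and split into training, testing, and validation sets (8:1:1)."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10
    n_test = len(shuffled) // 10
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_test],
            shuffled[n_train + n_test:])


raw_posts = ["<p>Fridge  not cooling</p>", "Fridge not cooling", "<br>", "Fan noisy"]
cases = deduplicate([clean_text(p) for p in raw_posts])
train, test, val = split_8_1_1(list(range(3156)))
```

With integer division the remainder falls into the last split, so 3156 cases yield 2524, 315, and 317 items respectively.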
4.2. Model Evaluation
To validate the reliability of the model, this study selects Precision ($P$), Recall ($R$), and F1-score ($F1$) as evaluation metrics. The formulas for the three evaluation metrics are as follows:
$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F1 = \frac{2 \times P \times R}{P + R}$$
In the formulas, $TP$ (True Positive) represents the number of positive samples correctly identified as positive, $FP$ (False Positive) represents the number of negative samples incorrectly identified as positive, and $FN$ (False Negative) represents the number of positive samples that were not recognized.
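As a worked illustration of these metrics, the sketch below computes them at the entity level from a gold set and a predicted set. The entity pairs are invented for the example, and exact matching of (text, type) pairs is an assumption about how a prediction is counted as correct.

```python
def precision_recall_f1(gold, predicted):
    """Entity-level P, R and F1 from sets of (entity text, entity type) pairs."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)   # correctly predicted entities
    fp = len(predicted - gold)   # predicted but not in the gold set
    fn = len(gold - predicted)   # gold entities that were missed
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


# Invented example: 2 correct predictions, 1 spurious, 2 missed.
gold = {
    ("compressor", "component"),
    ("refrigerant leakage", "fault cause"),
    ("not cooling", "fault phenomenon"),
    ("replace capacitor", "solution method"),
}
predicted = {
    ("compressor", "component"),
    ("refrigerant leakage", "fault cause"),
    ("fan", "component"),
}
p, r, f1 = precision_recall_f1(gold, predicted)
# p = 2/3, r = 1/2, f1 = 4/7
```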
4.3. Experimental Environment and Parameters
The experiments in this study were conducted on a GPU with 12 GB of memory, using Python 3.9.0 and CUDA 12.5. The parameters for the named entity recognition experiment, including those of the RoBERTa-zh-BiLSTM-Attn-CRF model and the fine-tuning settings for ChatGLM-4-flash, are listed in Table 5.
4.4. Experimental Results and Analysis
In this experiment, BERT-BiLSTM, BERT-BiLSTM-CRF, RoBERTa-zh-IDCNN, and RoBERTa-zh-IDCNN-CRF are used as comparison models. Table 6 shows the recognition results of each model in the entity recognition task.
From Table 6, we can observe that the proposed RoBERTa-zh-BiLSTM-Attn-CRF model achieves a precision of 96.64% and an F1-score of 93.23% on the dataset. Compared to the basic BERT-BiLSTM model, significant improvements are seen after incorporating RoBERTa-zh pretraining, multi-head attention, and the CRF layer. First, RoBERTa-zh pretraining handles the linguistic phenomena and grammatical structures specific to Chinese better than BERT does. In addition, the dynamic masking strategy used in RoBERTa-zh improves the model’s contextual understanding relative to BERT’s static masking. Compared to the RoBERTa-zh-BiLSTM-CRF model, adding multi-head attention improves all three evaluation metrics. This is because the BiLSTM can suffer from vanishing gradients when modeling long-range dependencies in sentences, whereas multi-head attention connects distant positions in the sequence directly. Furthermore, because the attention heads are computed in parallel, multi-head attention speeds up training.
For ontology linking and relation extraction, this paper selects the BERT-BiLSTM-CRF model as the baseline. Comparative experiments are also conducted with the large language models ChatGLM3, ChatGPT-3.5-turbo, ChatGPT-4o, ChatGLM-4-flash, and ChatGLM-4-plus. Table 7 presents the recognition results of each model in the relation extraction task.
From Table 7, we can see that, compared to ChatGLM3 and ChatGPT-3.5-turbo, both ChatGLM-4 models show improvements in F1-score, precision, and recall. Specifically, ChatGLM-4-plus improved on ChatGLM3 by 31.54%, 41.04%, and 18.01% in F1-score, precision, and recall, respectively. Analysis of the relation extraction results shows that the output of ChatGLM3 contains a large amount of redundant data. For example, in one extraction the entities “outside unit”, “compressor malfunction”, “vibration”, and “failure to start” were mentioned. The original sentence did not state any relationship between “compressor malfunction” and “failure to start”, yet the result returned by ChatGLM3 associated these two entities. This stems from ChatGLM3’s tendency to form triples by combining the extracted entities, which produces numerous redundant and unrealistic associations in the returned data and, consequently, poor performance across the three evaluation metrics. This phenomenon reflects the issue of fidelity hallucination in the ChatGLM3 model when processing appliance fault data. The other models did not show this limitation.
Additionally, the large language models perform relatively poorly when extracting “appliance types”, for example with entity pairs such as “range hood” and “exhaust fan”, or “washing machine” and “Haier washing machine”. This is primarily because many appliance names in appliance fault repair cases are recorded in colloquial terms or include brand names. When faced with these non-standardized or brand-nested appliance types, the large models tend to extract all possible entities, which yields many entities in the “appliance type” category that have different names but the same meaning. As a result, extraction of “appliance types” performs worse than the other tasks.
Overall, both the BERT-BiLSTM-CRF and ChatGLM-4-plus models demonstrate strong performance in relation extraction, with F1-scores of 81.34% and 86.33%, respectively. Their precision and recall are also high. ChatGLM-4-plus has more parameters and has absorbed more knowledge during training, which gives it significantly better language understanding, long-text processing, and reasoning capabilities than ChatGLM-4-flash and ChatGLM3 and makes it more suitable for tasks requiring high precision and complexity. In Chinese language tasks, it clearly outperforms ChatGPT-3.5-turbo, indicating the advantages of Chinese-developed large language models in processing Chinese text. Ultimately, the ChatGLM-4-plus model was selected for relation extraction on the dataset of 3156 appliance fault repair cases to construct the knowledge graph.
4.5. Computational Complexity and Hardware Resources
The computational complexity and hardware resource requirements of the proposed model are critical factors in evaluating its practicality for real-world applications. The trade-off between computational cost and performance makes it important to balance model complexity against deployment constraints, especially in resource-limited environments. The RoBERTa-zh-BiLSTM-Attn-CRF model involves multiple layers of computation: encoding with the pre-trained RoBERTa-zh, bidirectional LSTM processing, multi-head attention, and CRF-based sequence optimization. The RoBERTa-zh layer is the most computationally intensive component due to its Transformer architecture and large parameter size. Training this model on a GPU with 12 GB of memory required approximately 8 h for 10 epochs, with each epoch processing the dataset of 3156 entries in around 48 min. Memory usage peaked at 10.5 GB during training, primarily due to the storage of intermediate attention weights and gradient updates.
For the LLM-based relation extraction tasks, the ChatGLM4-plus model showed higher computational demands than traditional deep learning models. Each inference step for relation extraction took an average of 3.2 s on the same GPU, with memory usage reaching 9 GB. This is attributed to the model’s large-scale parameters and the need for extensive context processing during progressive prompting. Despite these requirements, the model’s ability to generalize from limited annotated data offsets the computational cost by reducing the need for extensive manual labeling.
To further evaluate the efficiency of each model, we compared the average running time of different models in entity recognition and relationship extraction tasks in a single batch (batch size = 16) under the same hardware environment (12 GB GPU). The results are shown in
Table 8:
Table 8 shows that RoBERTa-zh pre-training and the attention mechanism increase the single-batch processing time by 62%, from 0.42 s to 0.68 s, in exchange for a 3.87% increase in F1-score (Table 6). Although ChatGLM4-plus has the longest inference time, about 3.15 s per batch, its F1-score is 4.99 percentage points higher than that of the traditional model (86.33% versus 81.34%). Model performance does not grow linearly with inference time: ChatGLM4-plus is only 22% slower than ChatGPT-4o, yet its accuracy is 18.3% higher (Table 7).
To optimize resource utilization, future work could explore techniques such as model quantization, gradient checkpointing, or distributed training to reduce memory overhead and accelerate inference. Additionally, deploying the model on specialized hardware or cloud-based platforms could further enhance scalability for applications.
4.6. Data Dependency Analysis
To evaluate the model’s dependency on training data scale and its generalization capability, we conducted additional experiments by training the RoBERTa-zh-BiLSTM-Attn-CRF model on subsets of the original dataset (30%, 50%, and 80% of the training data). We measured the F1-score ($F1$), precision ($P$), and recall ($R$) on the fixed test set. The results are shown in Figure 6.
The experimental results demonstrate the model’s robustness and generalization capability under varying data conditions. Even when trained with only 50% of the original dataset, the model maintains an F1-score of 90.86%, only 2.37 percentage points below full-data training, which underscores its effectiveness in learning from limited annotated samples. Figure 6 also shows that, as the training data is reduced, precision degrades more gradually than recall. This indicates that in low-resource scenarios the model tends to preserve the correctness of the entities it predicts while missing more of them. These findings have practical implications: the model remains viable for deployment in annotation-sparse environments, with a recommended minimum of 50% of the training data to keep the F1-score above 90%. Furthermore, the model is resilient to label noise: with 10% of the training annotations randomly flipped, the F1-score drops by only 2.18 percentage points, to 91.05%. This evaluation confirms the model’s robustness to both data scarcity and annotation imperfections.
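The two perturbations used in this analysis, training on a fraction of the data and randomly flipping a share of the annotations, can be sketched as follows. The function names, the seed, and the uniform choice of the replacement label are assumptions; the study does not specify how the subsets or the flipped labels were drawn.

```python
import random


def subsample(train_set, fraction, seed=0):
    """Return a random subset containing the given fraction of the training set."""
    rng = random.Random(seed)
    k = round(len(train_set) * fraction)
    return rng.sample(train_set, k)


def flip_labels(labels, label_set, noise_rate, seed=0):
    """Replace a randomly chosen share of labels with a different label."""
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = round(len(noisy) * noise_rate)
    for i in rng.sample(range(len(noisy)), n_flip):
        alternatives = [lab for lab in label_set if lab != noisy[i]]
        noisy[i] = rng.choice(alternatives)
    return noisy


# Placeholder training items and labels, for illustration only.
train_items = list(range(2524))
subsets = {f: subsample(train_items, f) for f in (0.3, 0.5, 0.8)}

label_set = ["component", "fault cause", "fault phenomenon"]
clean_labels = ["component"] * 100
noisy_labels = flip_labels(clean_labels, label_set, noise_rate=0.10)
n_changed = sum(a != b for a, b in zip(clean_labels, noisy_labels))
```

The test set is left untouched in both cases, so differences in the measured metrics reflect only the change in training conditions.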
4.7. Knowledge Graph Visualization
After the extraction-layer experiments, a set of triples containing multiple entities and relationships was obtained. The triples were then cleaned and passed through coreference resolution. The resulting entities are shown in Table 9.
To present the relationships in each triple clearly, this study uses Neo4j for visualization. Unlike traditional relational databases, Neo4j stores and displays nodes and the relationships between them directly as a graph. It also provides graphical user interface tools and the Cypher query language, which can be used from a command-line shell, to facilitate database operations.
Figure 7 below shows a partial visualization of the results.
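Loading the cleaned triples into Neo4j can be sketched by generating one Cypher statement per triple. This is an illustration only: the `Entity` label, the `name` property, and the function name `triple_to_cypher` are assumptions rather than the schema used in the study, and the statements are built as strings here instead of being sent to a database.

```python
def triple_to_cypher(head, relation, tail):
    """Return a Cypher MERGE statement and its parameters for one triple.

    Entity names are passed as query parameters. The relationship type is
    validated and then interpolated into the statement text.
    """
    rel_type = relation.strip().upper().replace(" ", "_")
    if not rel_type or not rel_type[0].isalpha() \
            or not rel_type.replace("_", "").isalnum():
        raise ValueError(f"unsafe relationship type: {relation!r}")
    query = (
        "MERGE (h:Entity {name: $head}) "
        "MERGE (t:Entity {name: $tail}) "
        f"MERGE (h)-[:{rel_type}]->(t)"
    )
    return query, {"head": head, "tail": tail}


# Example triple built from entities mentioned in the text.
query, params = triple_to_cypher(
    "gas water heater", "has fault", "intermittent hot and cold water"
)
```

Using `MERGE` rather than `CREATE` means that loading the same entity from several triples yields a single node, which matches the goal of a fused, non-redundant graph.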
Mining the constructed appliance failure knowledge graph shows that the five most frequently malfunctioning appliance types are washing machines, air conditioners, refrigerators, gas water heaters, and televisions. The most common failure symptoms associated with these appliance types are, respectively, the washing machine not spinning during the spin-dry cycle, weak cooling in air conditioners, poor cooling performance in refrigerators, intermittent hot and cold water in gas water heaters, and no sound in televisions. Based on these commonly occurring appliance types and failure symptoms, the respective appliance manufacturers can take targeted measures. For instance, manufacturers of gas water heaters could improve the gas valve and thermostat components to enhance product quality and customer satisfaction.
4.8. Application of Knowledge Graph
The home appliance fault knowledge graph integrates unstructured fault descriptions into a structured network containing fault phenomena, causes, and solutions. Leveraging its powerful semantic association network, it not only supports graph visualization but also understands natural language queries. It returns related entities and relationships to help technicians quickly diagnose faults.
Take the refrigerator fault description “The refrigeration room is not cooling and the compressor frequently starts and stops” as an example. When this natural language query is input, the system first performs entity recognition, extracting the entity mentions “The refrigeration room is not cooling” and “the compressor frequently starts and stops”. Next, entity linking maps these to the standardized terms “Refrigeration function failure” and “Abnormal cycle of compressor”, along with other candidate relationships. Intent classification then identifies the relationship “fault cause”. The semantically parsed information is converted into a Cypher query that retrieves the matching subgraph from the knowledge graph, and the results are presented together with a graph visualization. In this case, the fault causes “Refrigerant leakage” and “Sensor malfunction” are identified as common reasons for both “Refrigeration function failure” and “Abnormal cycle of compressor”. This information is displayed in an intuitive and easy-to-understand visual format. The application process and visualization effects are illustrated in Figure 8.
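The final step of this pipeline, turning the linked entities and the identified intent into a graph query, can be sketched as follows. The node labels `FaultPhenomenon` and `FaultCause`, the relationship type `FAULT_CAUSE`, and the function names are assumptions, not the schema of the published system. The small in-memory graph mirrors the example above, with two extra causes invented so that the intersection is not trivial.

```python
def build_common_cause_query(phenomena):
    """Cypher query returning the causes linked to every listed phenomenon."""
    query = (
        "MATCH (p:FaultPhenomenon)-[:FAULT_CAUSE]->(c:FaultCause) "
        "WHERE p.name IN $phenomena "
        "WITH c, count(DISTINCT p) AS hits "
        "WHERE hits = $n "
        "RETURN c.name AS cause"
    )
    return query, {"phenomena": list(phenomena), "n": len(set(phenomena))}


def common_causes(graph, phenomena):
    """In-memory equivalent: intersect the cause sets of all phenomena."""
    cause_sets = [graph.get(p, set()) for p in phenomena]
    return set.intersection(*cause_sets) if cause_sets else set()


# Toy graph; "Door seal aging" and "Start capacitor damage" are invented.
toy_graph = {
    "Refrigeration function failure": {
        "Refrigerant leakage", "Sensor malfunction", "Door seal aging"},
    "Abnormal cycle of compressor": {
        "Refrigerant leakage", "Sensor malfunction", "Start capacitor damage"},
}
linked = ["Refrigeration function failure", "Abnormal cycle of compressor"]

query, params = build_common_cause_query(linked)
causes = common_causes(toy_graph, linked)
```

Counting distinct matched phenomena per cause and comparing the count with the number of query phenomena is what restricts the result to causes shared by all of them.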
5. Conclusions and Discussion
This paper proposes a dual-strategy progressive knowledge extraction method for constructing a home appliance fault knowledge graph, effectively addressing challenges such as complex text formats and severe entity nesting in fault repair cases. By developing the RoBERTa-zh-BiLSTM-Attn-CRF model, we leverage the maturity and efficiency of machine learning in NER tasks to accomplish entity extraction. Meanwhile, we utilize the high semantic awareness and strong generalization capabilities of LLMs to achieve knowledge extraction through progressive prompt design, and employ Neo4j for visualization, realizing the automated construction of a home appliance fault knowledge graph.
Experimental results demonstrate that the proposed RoBERTa-zh-BiLSTM-Attn-CRF model achieves high precision and recall in NER tasks, successfully identifying entities from unstructured fault repair cases while effectively resolving issues such as ambiguous entity boundaries and nested entities. Additionally, LLMs exhibit promising performance in relation extraction tasks, with the ChatGLM4-plus model outperforming traditional relation extraction models. This approach reduces reliance on extensive data annotation and high-quality labeled data, thereby lowering annotation costs and improving generalization in relation extraction. These findings provide a novel solution for constructing knowledge graphs in the home appliance fault domain.
Compared to existing methods, the proposed dual-strategy model demonstrates significant advantages. Unlike traditional rule-based approaches, our method eliminates dependence on manually defined rules. Furthermore, compared to conventional end-to-end deep learning methods, the introduction of LLMs reduces the demand for large-scale annotated data. Specifically, our approach improves the F1-score in relation extraction by 4.99 percentage points over the traditional Bert-BiLSTM-CRF model, representing a notable practical advancement.
However, this study has certain limitations. First, the model’s accuracy may decline when processing highly colloquial or non-standard texts, particularly in extracting entities such as “solution.” Second, the RoBERTa-zh-BiLSTM-Attn-CRF model and ChatGLM4-plus require substantial computational resources, which may hinder deployment in resource-constrained environments. Additionally, despite its strong performance, ChatGLM4-plus occasionally generates redundant or incorrect relations due to the hallucination phenomenon common in LLMs.
Future research will focus on the following directions: (1) exploring subgraph embedding or multimodal knowledge graph algorithms to improve entity disambiguation and relation extraction accuracy; (2) investigating data augmentation techniques to expand domain-specific datasets and enhance model robustness; and (3) applying the proposed model to construct a large-scale home appliance fault knowledge graph to support downstream applications such as intelligent Q&A systems.