Construction of Knowledge Graph for Marine Diesel Engine Faults Based on Deep Learning Methods

Tian, Xiaohe; Gan, Huibing; Liu, Yanlin

doi:10.3390/jmse13040693


by Xiaohe Tian, Huibing Gan * and Yanlin Liu

Marine Engineering College, Dalian Maritime University, Dalian 116026, China

* Author to whom correspondence should be addressed.

J. Mar. Sci. Eng. 2025, 13(4), 693; https://doi.org/10.3390/jmse13040693

Submission received: 5 March 2025 / Revised: 26 March 2025 / Accepted: 27 March 2025 / Published: 29 March 2025

(This article belongs to the Section Ocean Engineering)


Abstract

Marine diesel engines are the core equipment of ship power systems, and the accurate, real-time diagnosis of their faults directly affects navigation safety and operational efficiency. Existing methods (e.g., expert systems and traditional machine learning) struggle to cope with complex failure modes and dynamic operating environments because they rely on hand-crafted features and generalize poorly. In this paper, we propose a BiLSTM-CRF-based method for constructing a knowledge graph of marine diesel engine faults, which integrates multi-source heterogeneous data through deep learning and knowledge graph technology and mines the deep semantic associations among fault phenomena, causes, and solutions. The research framework covers data acquisition, ontology modeling, and knowledge extraction and storage; the BiLSTM-CRF model fuses bidirectional contextual features with label transition probabilities to achieve high-precision entity recognition and relationship extraction. Finally, a scalable knowledge graph is constructed in Neo4j. Experiments show that the model significantly outperforms baseline methods such as HMM, CRF, and BiLSTM, and the graph visualization clearly presents the fault causality network, which supports knowledge reasoning and decision optimization. For example, “high exhaust temperature” can be linked to potential causes such as “turbine failure” and “poor cooling”, together with the corresponding recommended measures. This method not only improves fault diagnosis accuracy and efficiency but also provides a new approach to intelligent ship health management.

Keywords:

marine diesel engine; fault diagnosis; knowledge graph; BiLSTM-CRF model; deep learning

1. Introduction

Marine diesel engines are the heart of a ship. They efficiently convert the chemical energy of fuel into mechanical energy, propelling the ship and providing a reliable power supply. Diesel engine performance directly affects the ship’s speed, range, and reliability [1,2]. On ocean voyages, stable operation of the diesel engine helps ensure that the ship arrives at its destination on time, and thus that cargo is transported efficiently. In addition, the diesel engine drives the ship’s power generation equipment, which supplies the domestic, communication, navigation, and other onboard systems; it thereby maintains the normal operation of the ship and meets the crew’s living needs, making it indispensable core equipment [3].

During the long-term operation of marine diesel engines, factors such as mechanical wear, fuel quality problems, and improper operation inevitably lead to various engine failures [4,5]. These failures not only cause ships to stop, delay voyages, and increase maintenance costs, but may also cause safety accidents resulting in casualties and huge economic losses [6].

Current methods for resolving marine diesel engine faults mainly include traditional diagnostic techniques based on expert experience, data-driven machine learning methods, and emerging deep learning techniques [7,8]. Traditional methods rely on expert experience and ship alarm systems, but as ships become increasingly intelligent and automated, these methods struggle to meet the requirements of condition assessment for modern diesel engine systems. In recent years, data-driven fault diagnosis techniques based on Support Vector Machines (SVM), Back Propagation Neural Networks (BPNN), k-Nearest Neighbors (KNN), and Random Forests (RF) have gradually emerged. These methods mine latent fault information from large-scale equipment condition data [9]. However, they have limitations when dealing with complex systems, such as insufficient mining of nonlinear data features and poor classification performance on imbalanced datasets. Deep learning techniques such as graph convolutional networks (GCN) have also been applied to marine diesel engine fault diagnosis, but they demand large amounts of data and computational resources, and their generalization ability remains insufficient in practical applications [10]. Although these methods have improved diagnostic efficiency to a certain extent, they still suffer from insufficient accuracy, poor real-time performance, and heavy dependence on specialized knowledge when faced with the complex failure modes and dynamic operating environments of marine diesel engines. There is therefore an urgent need for a more efficient and intelligent fault diagnosis method that can integrate multi-source information to meet the practical needs of marine diesel engine fault diagnosis.

A knowledge graph leverages graph structures as a semantic network to represent and organize knowledge [11]. It constructs a body of knowledge through entities (e.g., people, places, events, etc.), relationships (e.g., “belongs to”, “is located in”, “is associated with”, etc.), and attributes (e.g., characteristics of the entity) to form a complex semantic network structure. The core function of a knowledge graph is to integrate fragmented information into structured knowledge, so as to realize the efficient storage of information, as well as associated queries and reasoning processes [12]. In the process of diagnosing faults in marine diesel engines, knowledge graphs can correlate and integrate information such as fault phenomena, maintenance experience, sensor data, and the physical structure of the diesel engine, so as to realize accurate positioning and reasoning about the cause of the fault. Not only can these graphs make up for the shortcomings of traditional methods that rely too much on specialized knowledge, but they also provide a more efficient and intelligent solution for the fault diagnosis of marine diesel engines through intelligent reasoning and correlation analysis.

Shu et al. proposed a knowledge graph-based method for analyzing ship collision accidents. Using 241 collision investigation reports issued by the China Maritime Safety Administration (CMSA) from 2018 to 2021, they constructed a Ship Collision Accident Knowledge Graph (SCAKG) and demonstrated, through case retrieval, the method’s potential for accident cause analysis and for accelerating judicial processes [13]. Gan et al. developed the BERT-MCNN model to construct a deep learning-based knowledge graph for extracting and managing knowledge of marine pollution regulations. The required information was extracted from Chinese and International Maritime Organization (IMO) laws and regulations on marine pollution prevention to form a knowledge graph. The model achieved accuracies of 92.4% and 92.7% in the multi-relation extraction and named entity recognition tasks, respectively, and can effectively support the decision-making of port state control (PSC) officers during on-site inspections [14]. Meng et al. introduced an approach for constructing a knowledge graph of power equipment faults based on the BERT-BiLSTM-CRF model. The method recognizes and extracts equipment entities from electric power technical literature, identifies relationships among entities through dependency syntax analysis, and finally stores the knowledge as triples in a Neo4j database. The model outperforms traditional methods in the precision of Chinese entity recognition and relationship extraction and constructs the power equipment fault knowledge graph more effectively [15]. Xie et al. explored the development and application of knowledge graphs for aircraft fault diagnosis. They introduced a fault knowledge extraction method combining deep learning and heuristic rules to build a model-specific fault knowledge graph from structured and unstructured data. They also developed a Q&A system based on the fault knowledge graph, which gives precise answers to maintenance engineers’ questions and improves the traceability of the responses [16]. Xiao et al. proposed a knowledge graph-based semantic web approach for identifying counterfeit ship licenses. By constructing a ship knowledge graph, key features of ship violations, such as monitoring ships, expired certificates, and multiple trajectories, can be extracted and combined with inference techniques to identify violating ships. This method not only improves the utilization of ship data but also strengthens decision-making in ship safety management, providing a new approach to intelligent maritime traffic management [17].

Research on knowledge graph technology for marine diesel engine fault diagnosis remains relatively scarce. This study proposes a BiLSTM-CRF-based knowledge graph construction method for marine diesel engine faults, which integrates multi-source heterogeneous marine diesel engine data and uses knowledge graph technology to mine the deep semantic associations among fault phenomena, their potential causes, and the corresponding solutions. The aim is to improve the accuracy and real-time performance of fault diagnosis and to further advance the research and practice of knowledge graph technology in this field.

The rest of the paper is organized as follows. Section 2 describes the deep learning-based approach to constructing the knowledge graph. Section 3 presents the resulting knowledge graph and compares the proposed model with other models, and Section 4 summarizes the conclusions.

2. Methods

2.1. Research Framework

The technical framework for constructing a knowledge graph of marine diesel engine failures encompasses data acquisition, knowledge modeling, knowledge extraction, and knowledge storage, as illustrated in Figure 1. Data acquisition, the foundation of knowledge graph construction, involves collecting marine diesel engine failure data from multiple sources, including technical documents, maintenance manuals, historical failure records, etc., to ensure their comprehensiveness and accuracy, thus providing raw materials for subsequent knowledge modeling.

Knowledge modeling, also known as ontology model building, transforms unstructured data into a structured form. It involves defining entity types, their attributes, and the relationships between them [18]. In this process, the triple serves as the basic unit of the knowledge graph, consisting of a head entity, a relation, and a tail entity, e.g., (diesel engine, fault location, crankshaft). This structure converts unstructured data into a queryable graph and determines the quality and usefulness of the graph. Knowledge extraction identifies and extracts valuable information from unstructured data using natural language processing techniques such as entity recognition and relationship extraction, so that fault-related knowledge points are accurately extracted from the text [19]. Knowledge storage saves the extracted and modeled knowledge in a graph database for easy retrieval and analysis; choosing the right storage technology is crucial to the scalability and query efficiency of the knowledge graph.
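To make the triple structure concrete, the following minimal Python sketch stores triples in memory and indexes them by head entity so that outgoing relations can be looked up directly. It is an illustration only, not the storage layer used in this study (which is Neo4j); the class and function names, and the second triple, are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Triple(NamedTuple):
    """One knowledge-graph fact: (head entity, relation, tail entity)."""
    head: str
    relation: str
    tail: str


def build_index(triples: List[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Group triples by head entity so that outgoing edges can be looked up directly."""
    index: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for t in triples:
        index[t.head].append((t.relation, t.tail))
    return dict(index)


def query(index: Dict[str, List[Tuple[str, str]]], head: str, relation: str) -> List[str]:
    """Return every tail entity linked to `head` by `relation`."""
    return [tail for rel, tail in index.get(head, []) if rel == relation]


triples = [
    Triple("diesel engine", "fault location", "crankshaft"),
    # Hypothetical second fact, for illustration only.
    Triple("diesel engine", "fault location", "fuel injector"),
]
index = build_index(triples)
print(query(index, "diesel engine", "fault location"))  # ['crankshaft', 'fuel injector']
```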

2.2. Data Acquisition

Marine diesel engine failure data are the source of knowledge. The main data sources include technical documents from patent websites, online failure cases, the literature on marine diesel engine failures, and laboratory failure data. A total of 2000 marine diesel engine failure records were used in this study, collected as follows. For patent websites, we used web crawlers to systematically collect technical documents related to marine diesel engine failures, which were then filtered by relevance and publication date to ensure that the information was current and relevant. For online fault cases, we accessed dedicated marine engineering forums and databases where experienced engineers share their troubleshooting experience. We also used academic search engines to retrieve peer-reviewed papers from the literature, focusing on experimental results and case studies that provide insight into failure mechanisms.

Of these sources, the laboratory diesel engine sensor time-series data are structured, whereas the document database and the fault cases obtained from web pages are unstructured and account for the majority of the data.

The fault data are therefore predominantly unstructured. Unstructured data do not conform to a fixed format or schema and cannot be organized and stored with a predefined data model as structured data can. Because such data are usually free-form, they are difficult for traditional database management systems to process and must be preprocessed before knowledge extraction.

2.3. Knowledge Modeling

Extracted conceptual information is categorized to identify the entity types and relational structures of the knowledge graph. On this basis, an ontology is constructed to define the entities, attributes, and relationships that establish the structure of the knowledge graph. In a knowledge graph, entities represent specific objects or abstract concepts; they form the core structure and are interconnected through their attributes and relationships [20]. In the knowledge graph of marine diesel engine faults, entities include fault phenomena, such as destructive faults like bearing shell burnout and cylinder seizure; fault locations, such as broken gears and low pumping capacity of water pumps; fault causes, e.g., abrasive faults due to component wear; and the corresponding resolution operation for each fault. Attributes are descriptive information associated with an entity that provides additional details about its characteristics or state, making the description of the entity more specific. A relationship is a semantic link connecting two entities that describes their interrelationship or interaction.
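The entity and relationship definitions can be read as a typed schema that every extracted triple must satisfy. The sketch below encodes the four annotation categories used later in this paper (PHE, POS, REA, OPE) and the “caused” and “solution” relations described in Section 3.3. The direction of each relation and the extra `located_at` relation are assumptions for illustration; this is not the paper’s Protégé ontology.

```python
from typing import Dict, Tuple

# The four annotation categories used in this paper.
ENTITY_TYPES = {"PHE", "POS", "REA", "OPE"}

# relation -> (head type, tail type). "caused" and "solution" follow Section 3.3;
# their direction and the "located_at" relation are illustrative assumptions.
RELATION_SCHEMA: Dict[str, Tuple[str, str]] = {
    "caused": ("PHE", "REA"),
    "solution": ("REA", "OPE"),
    "located_at": ("PHE", "POS"),
}


def is_valid(head_type: str, relation: str, tail_type: str) -> bool:
    """True if the typed triple satisfies the schema's domain and range constraints."""
    if head_type not in ENTITY_TYPES or tail_type not in ENTITY_TYPES:
        return False
    return RELATION_SCHEMA.get(relation) == (head_type, tail_type)


print(is_valid("PHE", "caused", "REA"))   # True
print(is_valid("OPE", "caused", "REA"))   # False: wrong head type
print(is_valid("PHE", "repairs", "OPE"))  # False: relation not in the schema
```

Checking triples against such a schema before storage is one way to enforce the constraints an ontology defines; in Protégé the same information would be expressed as object properties with domains and ranges.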

The ontology model was constructed using Protégé 5.5.0, developed at Stanford University [21], following the seven-step method [22]. This method organizes the whole building process, from determining the domain and scope, considering the reuse of existing ontologies, and enumerating important terms, to defining classes, properties, and constraints, and finally creating instances, giving users a systematic framework for building and managing ontologies in Protégé in a rational and orderly manner. This combination of tool and process enhances the rigor and consistency of ontology development.

2.4. Knowledge Extraction

Knowledge extraction is central to constructing a knowledge graph of marine diesel engine faults. Its goal is to identify valuable information in unstructured text and transform it into structured data for subsequent storage and querying. The process involves preprocessing the text to improve data quality and processability and applying deep learning models to recognize and extract key information [23]. This section details the text preprocessing steps and the adopted BiLSTM-CRF model, which together support efficient knowledge extraction from a large amount of unstructured data. The BiLSTM-CRF model is a sequence labeling model that combines bidirectional long short-term memory networks with conditional random fields and is used here to extract entities from text. Entity relationships are defined according to the ontology model, and the knowledge extraction task is completed by matching the extracted entities with these relationships. The following subsections show how raw text is transformed into structured information in the knowledge graph.

2.4.1. Text Preprocessing

The acquired marine diesel engine fault data are unstructured text that may contain a large amount of duplicate information and inconsistent formatting. Preprocessing is therefore needed, including data cleaning, removal or correction of erroneous information, segmentation of long texts, and manual entity labeling.

In this study, data sources were strictly screened to ensure the reliability of the collected unstructured information. Technical documents published by authoritative institutions, peer-reviewed academic papers, and reputable online fault case databases were preferred. The same or similar fault information obtained from different sources was cross-checked, and experts in marine diesel engines were invited to review the collected data, judging its accuracy and credibility based on their professional knowledge and practical experience. For questionable or inconsistent data, we traced the original source or requested more detailed evidence.

Special symbols and units in the text, such as the time unit “h”, the frequency unit “Hz”, and the power unit “W”, were removed. Specific parameters of the ships and equipment involved in the fault cases and documents were also eliminated.
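A minimal sketch of these cleaning steps, assuming regular-expression rules: numeric parameters with an optional unit (here only h, Hz, kW, and W, based on the examples above) are deleted, each record is split into sentences, and duplicate sentences are dropped. The study does not specify its actual rules, so the patterns and function names below are hypothetical.

```python
import re
from typing import List

# Number, optional decimal part, optional unit from a small assumed list (h, Hz, kW, W);
# the lookahead stops a unit letter from being cut out of a longer word such as "hours".
PARAM = re.compile(r"\d+(?:\.\d+)?\s*(?:(?:Hz|kW|W|h)(?![A-Za-z]))?")
SENTENCE_END = re.compile(r"(?<=[。！？.!?])")
SPACE_BEFORE_PUNCT = re.compile(r"\s+([。，；！？.,;!?])")


def clean(text: str) -> str:
    """Remove numeric parameters and units, then tidy the leftover whitespace."""
    text = PARAM.sub("", text)
    text = re.sub(r"\s+", " ", text)
    return SPACE_BEFORE_PUNCT.sub(r"\1", text).strip()


def preprocess(records: List[str]) -> List[str]:
    """Clean each record, split it into sentences, and keep the first copy of each sentence."""
    seen, sentences = set(), []
    for record in records:
        for sentence in SENTENCE_END.split(clean(record)):
            sentence = sentence.strip()
            if sentence and sentence not in seen:
                seen.add(sentence)
                sentences.append(sentence)
    return sentences


print(clean("柴油机运行1200h后排气温度高。"))  # 柴油机运行后排气温度高。
print(preprocess(["Engine ran 1200 h at 50 Hz. Exhaust temperature high.",
                  "Exhaust temperature high."]))
```

Sentence-level deduplication is one simple way to handle the repeated content described above; near-duplicates with different wording would still need the manual review the study describes.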

In this study, we employ the BMES tagging scheme, which extends the BIO scheme by replacing the inside label (I) with a middle label (M) and adding labels for the end of an entity (E) and for single-character entities (S). Here, B marks the beginning of an entity, M its middle portion, E its end, and S a standalone single-character entity; characters outside any entity are tagged O.
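The scheme can be illustrated with a pair of helper functions that convert character-level entity spans to BMES tags and back. This is a sketch assuming spans given as (start, end-exclusive, type); the span in the usage lines is a hypothetical annotation of the phrase "冷却水温度低" (cooling water temperature low), not one of the study’s labels.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity type)


def spans_to_bmes(length: int, spans: List[Span]) -> List[str]:
    """Convert character-level entity spans into BMES tags; 'O' marks non-entity characters."""
    tags = ["O"] * length
    for start, end, label in spans:
        if end - start == 1:
            tags[start] = f"S-{label}"  # single-character entity
            continue
        tags[start] = f"B-{label}"
        for i in range(start + 1, end - 1):
            tags[i] = f"M-{label}"
        tags[end - 1] = f"E-{label}"
    return tags


def bmes_to_spans(tags: List[str]) -> List[Span]:
    """Recover well-formed entity spans from BMES tags; ill-formed runs are discarded."""
    spans: List[Span] = []
    start, label = None, None
    for i, tag in enumerate(tags):
        prefix, _, typ = tag.partition("-")
        if prefix == "S":
            spans.append((i, i + 1, typ))
            start = None
        elif prefix == "B":
            start, label = i, typ
        elif prefix == "M" and start is not None and typ == label:
            continue
        elif prefix == "E" and start is not None and typ == label:
            spans.append((start, i + 1, typ))
            start = None
        else:  # 'O' or an inconsistent tag closes any open entity
            start = None
    return spans


text = "冷却水温度低"
tags = spans_to_bmes(len(text), [(0, 6, "PHE")])  # hypothetical annotation
print(list(zip(text, tags)))
```

Decoding back to spans is what turns the model’s per-character predictions into entity mentions that can later be matched with relationships.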

YEDDA is an open-source annotation tool for named entity recognition and other sequence labeling tasks. It supports BIO, BMES, and other annotation schemes, allows custom labels and shortcuts to improve annotation efficiency, has an intuitive interface, runs cross-platform, and can export annotation results for model training [24]. YEDDA was therefore used to annotate the marine diesel engine fault texts. The annotation categories are predefined as “Phenomenon of Failure (PHE)”, “Position of Failure (POS)”, “Reason for Failure (REA)”, and “Resolution Operation (OPE)”. For example, “black smoke” is a fault phenomenon: “B-PHE” marks the beginning of a fault phenomenon entity, “M-PHE” its middle part, and “E-PHE” its end, while “O” indicates that a character does not belong to any annotation category. Together, these tags mark the complete extent of each labeled entity. The labeling results for the sentence “black smoke is a characteristic of incomplete combustion in the ship’s diesel engine” are shown in Table 1.

The dataset was divided into training, validation, and test sets at a ratio of 8:1:1. Table 2 highlights the quantities of the three entity types.
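A reproducible 8:1:1 split can be sketched as follows. The shuffling and the fixed seed are assumptions, since the study does not state how samples were assigned to the three sets; integer arithmetic keeps the subset sizes exact, with any remainder going to the test set.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_811(items: Sequence[T], seed: int = 42) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle with a fixed seed, then split into train/validation/test at 8:1:1."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10
    n_val = len(shuffled) // 10
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


train, val, test = split_811(range(2000))
print(len(train), len(val), len(test))  # 1600 200 200
```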

2.4.2. BiLSTM-CRF Model

As a deep learning method, the BiLSTM-CRF model is highly effective for knowledge graph construction because of its strong sequence modeling and relationship identification capabilities [25]. The BiLSTM part encodes the text with a bidirectional long short-term memory network, capturing bidirectional dependencies to better understand contextual information. The CRF part decodes the sequence with a conditional random field, capturing label transition probabilities and improving the precision of entity recognition and relationship extraction. Compared with traditional models such as HMM and CRF, BiLSTM-CRF automatically learns deep semantic features of text through neural networks, avoiding the limitations of manual feature engineering, and uses the CRF’s label transition constraints to prevent illegal label sequences. Although BERT has stronger semantic representation ability owing to large-scale pre-training, marine diesel engine fault texts are characterized by dense domain terminology and a limited amount of annotated data. BiLSTM-CRF is more lightweight and efficient than BERT-BiLSTM-CRF, making it more suitable for scenarios with limited computational resources that require faster inference. The overall architecture of the BiLSTM-CRF model is illustrated in Figure 2, which uses the input text "冷却水温度低" (cooling water temperature low) as an example of how the model processes and tags each character in the sequence. The structure of the BiLSTM-CRF model is discussed in detail next.

BiLSTM consists of two LSTM networks that process sequence data in the forward and backward directions, allowing it to capture bidirectional dependencies and understand contextual information more precisely. Entity recognition in a knowledge graph involves identifying key information in text and categorizing it into meaningful entities, such as names of people or locations. By learning feature representations of the text sequences, BiLSTM can recognize the boundaries and categories of these entities. In relationship extraction, BiLSTM can also analyze the semantic connections between entities and identify the types of relationships between them, such as “belongs to” and “is located in”. The BiLSTM model thus provides robust support for knowledge graph construction by automatically extracting structured knowledge from text. Its structure is shown in Figure 3, and its computation is given in Equation (1).

\begin{array}{l} i_{t} = σ (W_{i} [h_{t - 1}, x_{t}] + b_{i}) \\ f_{t} = σ (W_{f} [h_{t - 1}, x_{t}] + b_{f}) \\ {\tilde{c}}_{t} = \tanh (W_{c} [h_{t - 1}, x_{t}] + b_{c}) \\ c_{t} = f_{t} ⊙ c_{t - 1} + i_{t} ⊙ {\tilde{c}}_{t} \\ o_{t} = σ (W_{o} [h_{t - 1}, x_{t}] + b_{o}) \\ h_{t} = o_{t} ⊙ \tanh (c_{t}) \end{array}

(1)

where i_{t} is the output of the input gate; σ is the sigmoid function; f_{t} is the output of the forget gate; {\tilde{c}}_{t} is the candidate cell state; c_{t} is the updated cell state; o_{t} is the output of the output gate; h_{t} is the hidden state at the current time step; ⊙ denotes element-wise multiplication; W_{i}, W_{f}, W_{c}, and W_{o} are the weight matrices applied to the concatenated vector [h_{t - 1}, x_{t}]; and b_{i}, b_{f}, b_{c}, and b_{o} are the bias terms. The general form of the bidirectional network is shown in Equations (2)–(4).

s_{f t} = f (u_{1} v_{t} + u_{2} s_{f t - 1})

(2)

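s_{b t} = f (u_{3} v_{t} + u_{5} s_{b t + 1})

(3)

z_{t} = g (u_{4} s_{f t} + u_{6} s_{b t})

(4)

In this formulation, s_{f t} and s_{b t} denote the forward and backward hidden states, respectively; u_{1} and u_{2} are the input and recurrent weight matrices of the forward layer, and u_{3} and u_{5} are those of the backward layer, whose recurrence runs from the end of the sequence (hence s_{b t + 1}); v_{t} is the input; z_{t} is the final output; u_{4} and u_{6} are the weight matrices of the output layer; and f and g are activation functions.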

As can be seen from the structure diagram, BiLSTM comprises a forward LSTM and a backward LSTM, with the forward LSTM processing the inputs sequentially from the beginning to the end of the sequence to capture the context information from left to right, and the backward LSTM analyzing the inputs from the end position to the start position of the sequence to capture the context information from right to left. At each time step, the forward LSTM and backward LSTM generate their respective hidden states that incorporate contextual information before and after the current moment. The hidden states generated by the forward and backward LSTMs are combined to form a feature vector that captures bidirectional context information for downstream tasks.
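The following dependency-free Python sketch implements one LSTM cell as in Equation (1) and runs it in both directions, concatenating the forward and backward hidden states at each position. Weights are randomly initialized and biases set to zero purely for illustration; a practical model would be built in a framework such as PyTorch (which this study uses) with learned parameters, and the concatenated vectors would then be projected to the per-label emission scores consumed by the CRF layer.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w: Matrix, v: Vector) -> Vector:
    """Matrix-vector product for list-based matrices."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


class LSTMCell:
    """One LSTM cell following Eq. (1); each W acts on the concatenation [h_{t-1}, x_t]."""

    def __init__(self, input_size: int, hidden_size: int, rng: random.Random):
        def init() -> Matrix:
            return [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size + input_size)]
                    for _ in range(hidden_size)]

        self.hidden_size = hidden_size
        self.W = {g: init() for g in "ifco"}               # W_i, W_f, W_c, W_o
        self.b = {g: [0.0] * hidden_size for g in "ifco"}  # b_i, b_f, b_c, b_o

    def _gate(self, g: str, z: Vector, act) -> Vector:
        return [act(s + bj) for s, bj in zip(matvec(self.W[g], z), self.b[g])]

    def step(self, x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        z = h + x                                # list concatenation: [h_{t-1}, x_t]
        i = self._gate("i", z, sigmoid)          # input gate
        f = self._gate("f", z, sigmoid)          # forget gate
        c_tilde = self._gate("c", z, math.tanh)  # candidate cell state
        o = self._gate("o", z, sigmoid)          # output gate
        c_new = [fj * cj + ij * ctj for fj, cj, ij, ctj in zip(f, c, i, c_tilde)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new


def run(cell: LSTMCell, xs: List[Vector]) -> List[Vector]:
    """Run the cell over a sequence from zero initial states; return every hidden state."""
    h, c = [0.0] * cell.hidden_size, [0.0] * cell.hidden_size
    states = []
    for x in xs:
        h, c = cell.step(x, h, c)
        states.append(h)
    return states


def bilstm(fwd: LSTMCell, bwd: LSTMCell, xs: List[Vector]) -> List[Vector]:
    """Concatenate forward and backward hidden states position by position."""
    forward_states = run(fwd, xs)
    backward_states = run(bwd, xs[::-1])[::-1]  # re-align with the original order
    return [hf + hb for hf, hb in zip(forward_states, backward_states)]


rng = random.Random(0)
fwd, bwd = LSTMCell(3, 4, rng), LSTMCell(3, 4, rng)
xs = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(6)]  # six 3-dim character embeddings
features = bilstm(fwd, bwd, xs)
print(len(features), len(features[0]))  # 6 8
```

Separate cells for the two directions correspond to the distinct forward and backward weights in the general BiRNN form; the first half of each feature vector summarizes the left context and the second half the right context.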

CRF is a statistical model for sequence labeling, widely used for tasks such as named entity recognition (NER) in knowledge graph construction [26]. Its core advantage is that it accounts for dependencies between labels, improving the accuracy and consistency of the annotation. In knowledge graph construction, CRF is typically used to label text sequences and identify the entities and their categories. By introducing a state transition matrix, CRF uses contextual information to label each element of the sequence and decodes the globally optimal label sequence rather than choosing each label independently. This makes CRF well suited to complex sequence labeling tasks, especially where dependencies between tags must be considered. The score that the CRF model assigns to an input sequence x and a label sequence y is given in Equation (5).

S (x, y) = \sum_{i = 1}^{n} P_{i, y_{i}} + \sum_{i = 1}^{n + 1} W_{y_{i - 1}, y_{i}}

(5)

where P_{i, y_{i}} is the emission score of label y_{i} at the i-th position (output by the BiLSTM layer), W_{y_{i - 1}, y_{i}} is the transition score from label y_{i - 1} to label y_{i}, and y_{0} and y_{n + 1} denote special start and end labels. Equation (6) gives the conditional probability of a tag sequence y given an input sequence x:

P (y | x) = \frac{e^{S (x, y)}}{\sum_{\tilde{y} \in Y_{x}} e^{S (x, \tilde{y})}}

(6)

where Y_{x} is the set of all possible tag sequences for x, e^{S (x, y)} is the exponentiated score of y, and \sum_{\tilde{y} \in Y_{x}} e^{S (x, \tilde{y})} is the normalization factor (partition function). Equation (7) gives the label sequence y^{*} that maximizes the score, and therefore the conditional probability, for the input sequence x; this decoding step, which finds the most probable label sequence, is typically solved with the Viterbi algorithm.

y^{*} = \arg \max_{\tilde{y} \in Y_{x}} S (x, \tilde{y})

(7)
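Equations (5)–(7) can be checked with a small pure-Python sketch: `score` implements Equation (5) with explicit start and end transition scores standing in for y_{0} and y_{n + 1}, `log_partition` computes the log of the normalization factor in Equation (6) with the forward algorithm, and `viterbi` performs the decoding in Equation (7). The emission matrix P (one row per position, one column per label) would come from the BiLSTM layer; here it and the transition scores are random numbers, an assumption made only to exercise the code.

```python
import itertools
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def score(P: Matrix, W: Matrix, start: List[float], end: List[float], y: List[int]) -> float:
    """Eq. (5): emission plus transition scores, including start and end transitions."""
    s = start[y[0]] + P[0][y[0]]
    for i in range(1, len(y)):
        s += W[y[i - 1]][y[i]] + P[i][y[i]]
    return s + end[y[-1]]


def log_partition(P: Matrix, W: Matrix, start: List[float], end: List[float]) -> float:
    """Log of the denominator of Eq. (6) via the forward algorithm, O(n K^2)."""
    K = len(start)
    alpha = [start[k] + P[0][k] for k in range(K)]
    for i in range(1, len(P)):
        alpha = [logsumexp([alpha[j] + W[j][k] for j in range(K)]) + P[i][k] for k in range(K)]
    return logsumexp([alpha[k] + end[k] for k in range(K)])


def viterbi(P: Matrix, W: Matrix, start: List[float], end: List[float]) -> List[int]:
    """Eq. (7): the highest-scoring label sequence, found by dynamic programming."""
    K = len(start)
    delta = [start[k] + P[0][k] for k in range(K)]
    backpointers = []
    for i in range(1, len(P)):
        best_prev = [max(range(K), key=lambda j: delta[j] + W[j][k]) for k in range(K)]
        delta = [delta[best_prev[k]] + W[best_prev[k]][k] + P[i][k] for k in range(K)]
        backpointers.append(best_prev)
    path = [max(range(K), key=lambda k: delta[k] + end[k])]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    return path[::-1]


# Random emission/transition scores: 4 positions, 3 labels (illustrative only).
rng = random.Random(1)
n, K = 4, 3
P = [[rng.gauss(0, 1) for _ in range(K)] for _ in range(n)]
W = [[rng.gauss(0, 1) for _ in range(K)] for _ in range(K)]
start = [rng.gauss(0, 1) for _ in range(K)]
end = [rng.gauss(0, 1) for _ in range(K)]

# Brute-force check over all K**n label sequences.
all_y = [list(y) for y in itertools.product(range(K), repeat=n)]
log_z = log_partition(P, W, start, end)
total = sum(math.exp(score(P, W, start, end, y) - log_z) for y in all_y)
best = max(all_y, key=lambda y: score(P, W, start, end, y))
print(round(total, 6), viterbi(P, W, start, end) == best)  # 1.0 True
```

Enumerating all K^n sequences confirms on small inputs that the probabilities from Equation (6) sum to 1 and that Viterbi returns the arg max; the forward and Viterbi recursions avoid that exponential enumeration in practice.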

2.4.3. Evaluation Indicators

The evaluation metrics used in this study are precision (P), recall (R), and F1-score (F1), which are standard for evaluating sequence labeling and other deep learning models. They are computed as in Equations (8)–(10).

\text{Precision} = \frac{TP}{TP + FP}

(8)

\text{Recall} = \frac{TP}{TP + FN}

(9)

F1 = \frac{2PR}{P + R}

(10)

Here, TP is the number of samples correctly predicted as positive, FP is the number of samples incorrectly predicted as positive, and FN is the number of positive samples incorrectly predicted as negative. The F1-score, the harmonic mean of precision and recall, ranges from 0 to 1, with higher values indicating better predictions.
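The metrics can be computed at the entity level as sketched below. The sketch assumes that a predicted entity counts as a true positive only when its span and type both match the gold annotation exactly; the paper does not state its matching criterion, so this is one common convention rather than the study’s evaluation script. The per-type helper mirrors the per-category results discussed in Section 3.2.

```python
from typing import Dict, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity type)


def prf1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 per Eqs. (8)-(10), with exact span-and-type matching."""
    tp = len(gold & pred)
    fp = len(pred - gold)
    fn = len(gold - pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def per_type(gold: Set[Span], pred: Set[Span]) -> Dict[str, Tuple[float, float, float]]:
    """Scores computed separately for each entity type."""
    types = sorted({s[2] for s in gold | pred})
    return {t: prf1({s for s in gold if s[2] == t}, {s for s in pred if s[2] == t})
            for t in types}


gold = {(0, 6, "PHE"), (8, 12, "REA")}
pred = {(0, 6, "PHE"), (8, 11, "REA"), (13, 15, "OPE")}
print(prf1(gold, pred))  # approximately (0.333, 0.5, 0.4)
print(per_type(gold, pred))
```

Under exact matching, the truncated REA span counts as both a false positive and a false negative, which is why boundary errors weigh heavily in entity-level scores.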

2.5. Knowledge Storage

In this study, Neo4j is used for knowledge storage. Neo4j, as a popular graph database, has been widely employed in knowledge graph storage due to its powerful graph data storage and querying capabilities. It employs nodes and edges in a graph structure to represent entities and their relationships, and this structure can intuitively reflect the complex relationships between entities and facilitate the query and analysis of the graph structure. Within the Neo4j framework, each node encapsulates entity-specific attributes, serving as a structured data container. Relationships (represented as directed edges) explicitly define association types and directional dependencies between entities, establishing semantically meaningful connections that support advanced graph-based reasoning.
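As a sketch of how extracted triples could be written to Neo4j, the function below builds an idempotent Cypher `MERGE` statement for one triple. The node labels and relationship type names are hypothetical. Because Cypher does not accept labels or relationship types as query parameters, they are taken from fixed whitelists, while entity names are passed as parameters (`$head`, `$tail`).

```python
# Hypothetical Neo4j node labels for the four entity types and relationship types for the two relations.
NODE_LABELS = {"PHE": "Phenomenon", "POS": "Position", "REA": "Cause", "OPE": "Operation"}
REL_TYPES = {"caused": "CAUSED", "solution": "SOLUTION"}


def merge_statement(head_type: str, relation: str, tail_type: str) -> str:
    """Build a parameterized Cypher MERGE for one triple.

    Labels and relationship types cannot be Cypher parameters, so they come from
    the whitelists above (a KeyError rejects unknown values); entity names are
    supplied separately as the $head and $tail parameters.
    """
    head_label = NODE_LABELS[head_type]
    tail_label = NODE_LABELS[tail_type]
    rel_type = REL_TYPES[relation]
    return (
        f"MERGE (h:{head_label} {{name: $head}}) "
        f"MERGE (t:{tail_label} {{name: $tail}}) "
        f"MERGE (h)-[:{rel_type}]->(t)"
    )


query = merge_statement("PHE", "caused", "REA")
params = {"head": "abnormal vibration", "tail": "bearing wear"}
print(query)
```

With the official Neo4j Python driver, the statement and parameter dictionary could be passed to `session.run`; that step needs a running database and is not shown. Using `MERGE` rather than `CREATE` means re-importing the same triple does not duplicate nodes or edges.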

3. Results

3.1. Experimental Environment and Parameter Configuration

Table 3 lists the parameter configuration of the training environment. The knowledge graph of marine diesel engine faults is built with the PyTorch framework. A batch size of 64 lets the model learn the data features adequately during training without overfitting the training data, which improves its adaptability in real-world applications. Because the construction task involves complex text rich in entities and relationships, the number of epochs is set to 30 to give the model enough time to learn these features. In the first few epochs, the model may learn only obvious, simple patterns; as training proceeds, it progressively captures more intricate features, such as implicit associations between entities and the same relationship expressed in different wording, allowing the knowledge graph to be constructed more accurately. The learning rate of the BiLSTM-CRF model is set to 0.001 because the model has a large number of parameters to tune for the knowledge graph construction task. A smaller learning rate helps the model fine-tune its parameters in the later stages of training, so that it recognizes details such as entity boundaries, entity types, and relationships between entities more accurately, yielding a more accurate and fine-grained knowledge graph.
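The reported hyperparameters can be collected in a small configuration object. The field names are illustrative, and the training-set size in the example (1600, i.e., 80% of 2000) is an assumption, since the number of labeled training sequences is not stated; the helper simply shows how batch size and epochs determine the number of optimizer updates.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Training hyperparameters reported in this section (field names are illustrative)."""
    batch_size: int = 64
    epochs: int = 30
    learning_rate: float = 0.001


def total_steps(num_train: int, cfg: TrainConfig) -> int:
    """Optimizer updates over all epochs, counting a final partial batch in each epoch."""
    steps_per_epoch = math.ceil(num_train / cfg.batch_size)
    return steps_per_epoch * cfg.epochs


cfg = TrainConfig()
print(total_steps(1600, cfg))  # 25 batches per epoch x 30 epochs = 750
```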

3.2. Validation of Results

As shown in Table 4, the BiLSTM-CRF model performs best. The HMM model performs worst, with precision, recall, and F1-score slightly below 90%, indicating that it predicts the positive classes less accurately and comprehensively than the other models. The CRF model performs marginally worse than the BiLSTM model, but its metrics still exceed 90%, indicating relatively accurate and comprehensive prediction of the positive classes. The BiLSTM model’s precision and recall both exceed 92%, with an F1-score of 91.99%, also indicating good performance. The BiLSTM-CRF model performs best on all three metrics, with precision, recall, and F1-score all close to 96.4%, indicating that it predicts the positive classes both accurately and comprehensively while balancing precision and recall.

Compared with the plain BiLSTM model, adding the CRF layer captures the transition probabilities between labels: the probability of a label depends not only on the current input features but also on the previous label. Modeling this label dependency makes the predictions more consistent with actual linguistic rules and annotation conventions, improving prediction accuracy. Compared with the pure CRF model, adding BiLSTM provides a more comprehensive representation of the input sequence and its long-range dependencies and avoids tedious manual feature design. The BiLSTM-CRF model thus combines BiLSTM’s ability to capture contextual information with CRF’s modeling of label dependencies, which significantly improves model performance and generalization ability.
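The label-transition constraint described above can be made explicit for the BMES scheme. The sketch below marks a transition as illegal when an M or E tag does not continue an open entity of the same type, or when B, S, or O appears while an entity is still open, and builds a transition-score matrix that assigns such pairs a large negative value. Hard-coding this mask is an assumption for illustration: in a trained BiLSTM-CRF the transition scores are learned, and a mask like this is an optional way to rule out invalid sequences at decoding time (the sequence start can be treated like a preceding “O”).

```python
from typing import List

NEG = -1e4  # large negative score for illegal transitions (a common choice)


def allowed(prev: str, curr: str) -> bool:
    """BMES rule: M/E must continue an open entity of the same type; B/S/O need no open entity."""
    p_prefix, _, p_type = prev.partition("-")
    c_prefix, _, c_type = curr.partition("-")
    entity_open = p_prefix in ("B", "M")
    if c_prefix in ("M", "E"):
        return entity_open and p_type == c_type
    return not entity_open


def transition_mask(labels: List[str]) -> List[List[float]]:
    """Row = previous label, column = current label; 0.0 if legal, NEG otherwise."""
    return [[0.0 if allowed(a, b) else NEG for b in labels] for a in labels]


labels = ["O", "B-PHE", "M-PHE", "E-PHE", "S-PHE"]
for label, row in zip(labels, transition_mask(labels)):
    print(label, row)
```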

Figure 4 shows the recognition results of the BiLSTM-CRF model for the four entity categories. Every entity type is recognized well, and the PHE and POS categories perform particularly well, with precision, recall, and F1-score close to or above 95%, indicating that the model is accurate and stable for these entity types. Performance on the REA and OPE categories is slightly lower than on PHE and POS but remains high, indicating that the model generalizes well across entity types. These results underscore the effectiveness and reliability of the BiLSTM-CRF model for entity recognition.

3.3. Knowledge Graph Implementation

Following the ontology model, the extracted entities are matched against the defined relationship types. A “caused” relationship links a fault phenomenon to a fault cause and indicates that the phenomenon is caused by that cause, such as “abnormal vibration caused by bearing wear”. A “solution” relationship links a fault cause to a resolution operation and indicates that the cause can be resolved by that operation, such as “bearing wear is solved by replacing the bearing”. The resulting triples make explicit the relationships among the phenomena, causes, and solutions of marine diesel engine faults.
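The matching step described above can be sketched as a lookup from pairs of entity types to the relation that the ontology allows between them. This section does not state how candidate entity pairs are proposed (for example, by co-occurrence within one sentence), so the candidate list below is assumed. The relation names follow the text. The schema dictionary and function name are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Entity = Tuple[str, str]          # (surface text, entity type)
Triple = Tuple[str, str, str]     # (head, relation, tail)

# Hypothetical schema: (head type, tail type) -> relation name from the ontology.
SCHEMA: Dict[Tuple[str, str], str] = {
    ("PHE", "REA"): "caused",    # fault phenomenon is caused by a fault cause
    ("REA", "OPE"): "solution",  # fault cause is solved by a resolution operation
}


def to_triples(pairs: Iterable[Tuple[Entity, Entity]]) -> List[Triple]:
    """Keep only candidate entity pairs whose type pair the schema allows."""
    triples = []
    for (h_text, h_type), (t_text, t_type) in pairs:
        rel = SCHEMA.get((h_type, t_type))
        if rel is not None:
            triples.append((h_text, rel, t_text))
    return triples


candidates = [
    (("abnormal vibration", "PHE"), ("bearing wear", "REA")),
    (("bearing wear", "REA"), ("replacing the bearing", "OPE")),
    # No PHE -> OPE relation is defined, so this pair is discarded.
    (("abnormal vibration", "PHE"), ("replacing the bearing", "OPE")),
]
for triple in to_triples(candidates):
    print(triple)
```

Restricting relations to the type pairs defined in the ontology keeps the graph consistent with the schema, at the cost of discarding any relation the ontology does not model.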

Figure 5 shows the knowledge graph of marine diesel engine faults, built with the Neo4j graph database from the extracted triples. The three entity types (fault phenomena, fault causes, and resolution operations) appear as circular nodes, and the edges between nodes represent the relationships between entities. Each fault phenomenon node is connected to fault cause nodes, and each fault cause node is connected to resolution operation nodes, forming a clear causal network.
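This section does not give the import statements used to load the triples into Neo4j. As one possible approach, sketched under stated assumptions, each triple can be turned into a parameterized Cypher `MERGE` statement. The node labels (`Phenomenon`, `Cause`, `Operation`), the relationship types, and the function name are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping: relation name -> (head label, tail label, Cypher relationship type).
REL_MAP: Dict[str, Tuple[str, str, str]] = {
    "caused": ("Phenomenon", "Cause", "CAUSED"),
    "solution": ("Cause", "Operation", "SOLUTION"),
}


def triple_to_cypher(head: str, rel: str, tail: str) -> Tuple[str, Dict[str, str]]:
    """Build an idempotent MERGE statement and its parameters for one triple.

    Cypher does not accept labels or relationship types as parameters, so they
    come from the fixed REL_MAP whitelist; entity names are passed as parameters.
    """
    head_label, tail_label, rel_type = REL_MAP[rel]
    query = (
        f"MERGE (h:{head_label} {{name: $head}}) "
        f"MERGE (t:{tail_label} {{name: $tail}}) "
        f"MERGE (h)-[:{rel_type}]->(t)"
    )
    return query, {"head": head, "tail": tail}


triples: List[Tuple[str, str, str]] = [
    ("abnormal vibration", "caused", "bearing wear"),
    ("bearing wear", "solution", "replacing the bearing"),
]
for triple in triples:
    query, params = triple_to_cypher(*triple)
    print(query, params)
```

With the official Neo4j Python driver, each statement could be executed with `session.run(query, params)` inside a session opened from a `GraphDatabase.driver(...)` instance. The sketch uses separate `MERGE` clauses for the two nodes and the edge. Re-running the import therefore reuses existing nodes (for example, "bearing wear" as a `Cause` in both triples) instead of duplicating them.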

The graph shows which causes may underlie each fault phenomenon and which resolution operations can be taken for each cause. Figure 6 shows that the fault phenomenon “Excessive exhaust temperature” may result from turbine or compressor failure, poor coolant circulation, piston ring wear, and other causes. These causes can be addressed by resolution operations such as checking the cooling system or inspecting the piston rings. This structured representation helps identify and diagnose faults quickly and also provides guidance for fault prevention.
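The lookup illustrated by Figure 6 is a two-hop traversal: phenomenon → causes → operations. The in-memory sketch below uses triples modeled on Figure 6. The pairing of each cause with a specific operation is an assumption based on the examples in the text, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def diagnose(triples: List[Triple], phenomenon: str) -> Dict[str, List[str]]:
    """Two-hop lookup: phenomenon -> candidate causes -> resolution operations."""
    causes: Dict[str, List[str]] = defaultdict(list)      # phenomenon -> causes
    operations: Dict[str, List[str]] = defaultdict(list)  # cause -> operations
    for head, rel, tail in triples:
        if rel == "caused":
            causes[head].append(tail)
        elif rel == "solution":
            operations[head].append(tail)
    # Second hop: attach the known operations (possibly none) to each cause.
    return {cause: operations.get(cause, []) for cause in causes.get(phenomenon, [])}


# Illustrative triples modeled on Figure 6; the cause-to-operation pairing is assumed.
triples = [
    ("Excessive exhaust temperature", "caused", "turbine or compressor failure"),
    ("Excessive exhaust temperature", "caused", "poor coolant circulation"),
    ("Excessive exhaust temperature", "caused", "piston ring wear"),
    ("poor coolant circulation", "solution", "check the cooling system"),
    ("piston ring wear", "solution", "inspect the piston rings"),
]
for cause, ops in diagnose(triples, "Excessive exhaust temperature").items():
    print(f"{cause}: {ops or 'no operation recorded'}")
```

In a graph database, the same traversal would be a single pattern match over the two relationship types. The in-memory version only makes the two hops explicit.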

In addition, knowledge graphs support knowledge reasoning and decision support. Through the graph, potential connections between different fault phenomena and causes can be found, thus providing more clues for fault diagnosis. At the same time, the graph can also help users to find the most appropriate solution through reasoning when facing complex faults, improving the efficiency and accuracy of fault handling.
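One simple form of the reasoning described above can be sketched for the case where several phenomena are observed together: rank candidate causes by how many of the observed phenomena each one could explain. This heuristic and the function name are hypothetical and are not the paper's reasoning procedure. The link from black smoke to incomplete combustion follows the example sentence in Table 1. The link from black smoke to turbine or compressor failure is invented for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def rank_causes(triples: List[Triple], observed: Iterable[str]) -> List[Tuple[str, int]]:
    """Rank candidate causes by how many of the observed phenomena each could explain."""
    seen = set(observed)
    votes = Counter(tail for head, rel, tail in triples
                    if rel == "caused" and head in seen)
    # Highest count first; equal counts keep the order in which causes were first seen.
    return votes.most_common()


triples = [
    ("Excessive exhaust temperature", "caused", "turbine or compressor failure"),
    ("Excessive exhaust temperature", "caused", "poor coolant circulation"),
    ("Excessive exhaust temperature", "caused", "piston ring wear"),
    ("black smoke", "caused", "turbine or compressor failure"),  # invented link
    ("black smoke", "caused", "incomplete combustion"),
]
print(rank_causes(triples, ["Excessive exhaust temperature", "black smoke"]))
```

A cause shared by several observed phenomena rises to the top, which can narrow the search when a fault presents multiple symptoms. A practical system would also need to weigh how likely each cause is, which this count ignores.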

4. Conclusions

The main contribution of this study is a comprehensive knowledge graph framework for marine diesel engine faults, in which the proposed multi-source fusion strategy addresses the challenge of data heterogeneity. On the technical documents studied here, the BiLSTM-CRF hybrid model achieves higher precision, recall, and F1-score than the traditional knowledge extraction methods it was compared with. The implemented reasoning over the symptom–cause–solution triad links each fault phenomenon to its possible causes and remedies, providing actionable insights beyond those of traditional diagnostic systems. These advances improve fault diagnosis capabilities and also lay the foundation for an intelligent maintenance decision system for the maritime industry.

During the research, we built a complete technical framework covering data acquisition, knowledge modeling, knowledge extraction, and knowledge storage. In the data acquisition stage, a large amount of unstructured text on marine diesel engine faults was collected, which provided ample raw material for subsequent modeling and extraction. In the knowledge modeling stage, an ontology model was built in Protégé, defining the entity types, attributes, and relationships and laying the groundwork for the knowledge graph. In the knowledge extraction stage, the BiLSTM-CRF model combined the context-capturing capability of BiLSTM with the label-dependency modeling of CRF and effectively improved the precision of entity recognition and relationship extraction. Finally, the Neo4j graph database was used to store and efficiently retrieve the knowledge, yielding a structured knowledge graph of marine diesel engine faults.

The experimental results show that the BiLSTM-CRF model outperforms the baseline models (HMM, CRF, and BiLSTM) in precision, recall, and F1-score. In addition, the visualized knowledge graph clearly presents the causal network linking fault phenomena, causes, and resolution operations, providing intuitive guidance for fault diagnosis and prevention. Knowledge reasoning over the graph can also reveal potential associations that help further optimize ship health management.

In summary, the proposed BiLSTM-CRF-based method for constructing a knowledge graph of marine diesel engine faults effectively integrates multi-source information and improves the efficiency and accuracy of fault diagnosis. It thus offers a sound basis, of both theoretical and practical value, for the intelligent management and maintenance of marine diesel engines. In future work, we will further optimize the model, expand the data sources, and investigate how knowledge graphs can enhance fault diagnosis for marine equipment, helping to make the marine industry more intelligent.

Author Contributions

Conceptualization, X.T.; methodology, X.T.; software, X.T.; validation, X.T., Y.L. and H.G.; formal analysis, H.G. and Y.L.; investigation, X.T.; resources, H.G.; data curation, H.G.; writing—original draft preparation, X.T.; writing—review and editing, H.G.; visualization, H.G.; supervision, H.G.; project administration, H.G.; funding acquisition, H.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Key R&D Program of China (grant number 2022YFB4301400), and the Hanhai Engineering Education and Teaching Project of Dalian Maritime University.

Data Availability Statement

Data are contained within the article.

Acknowledgments

The authors gratefully acknowledge the helpful comments and suggestions of the reviewers, which have improved the presentation of this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Technological process.

Figure 2. Architecture of BiLSTM-CRF.

Figure 3. BiLSTM structure.

Figure 4. Experimental results for different entities.

Figure 5. Knowledge graph of marine diesel engine failures (partial).

Figure 6. Excessive exhaust temperature.

Table 1. Example sentence labeling results.

English Translation	Fault Text	Entity Labeling
discharge black smoke	冒	B-PHE
	黑	M-PHE
	烟	E-PHE
is	是	O
marine diesel engines	船	B-POS
	舶	M-POS
	柴	M-POS
	油	M-POS
	机	E-POS
incomplete combustion	不	B-REA
	完	M-REA
	全	M-REA
	燃	M-REA
	烧	E-REA
	的	O
characteristic	特	O
	征	O

Table 2. Distribution of entity types.

Entity Type	Quantities
POS	744
PHE	202
REA	937
OPE	173

Table 3. Training environment.

Parameter Type	Detailed Configuration
CPU	Intel(R) Core(TM) i7-9700 CPU @ 3.00 GHz, Intel Corporation, Santa Clara, CA, USA
GPU	NVIDIA GeForce GTX 1660 Ti, NVIDIA Corporation, Santa Clara, CA, USA
Python	3.6.13
PyTorch	1.7.0
Batch size	64
Epochs	30
Learning rate	0.001

Table 4. The results of different models.

Model	Precision (%)	Recall (%)	F1-Score (%)
HMM	89.43	89.24	89.31
CRF	90.40	90.41	90.40
BiLSTM	92.38	92.20	91.99
BiLSTM-CRF	96.38	96.39	96.38

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
