1. Introduction
Marine ranching, a cornerstone of China’s Blue Granary strategy, has emerged as a transformative approach to modernize marine fisheries, enhance aquaculture productivity, and promote ecological sustainability [
1]. The rapid development of related equipment, such as intelligent feeding systems, deep-sea cages, and multi-functional platforms, has significantly improved operational efficiency [
2,
However, despite the exponential growth of domain-specific knowledge, this knowledge remains fragmented, with critical information dispersed across heterogeneous sources including enterprise records, expert experience, academic literature, and technical standards. This fragmentation impedes intelligent decision-making, real-time monitoring, and knowledge sharing, thereby limiting the full potential of marine ranching industrialization.
A knowledge graph (KG) is a state-of-the-art semantic network paradigm; it employs graph structures to visualize relationships between entities, demonstrating advantages in intuitiveness, efficiency, and scalability [
4]. The concept of KG was first proposed by Google in 2012 and applied in the search engine domain [
Nowadays, knowledge graphs are classified into general knowledge graphs and vertical knowledge graphs. General knowledge graphs are not tied to a specific domain and place relatively low demands on knowledge accuracy; they emphasize breadth of knowledge and wide coverage. Examples include DBpedia [
6], Yago [
7], Freebase [
8], and Wikidata [
9]. Vertical knowledge graphs, on the other hand, are oriented towards a specific domain and emphasize the depth of knowledge. They have higher requirements for the professionalism and accuracy of knowledge. Examples include IMDB (Internet Movie Database) [
10], MusicBrainz [
11], and Chinese medical knowledge graphs. Recent advancements in vertical knowledge graph applications demonstrate their versatility in various industries. For example, in manufacturing, Ren et al. [
12] automated OPC UA (OLE for Process Control Unified Architecture) information modeling via KG to unify heterogeneous equipment data, while Gu et al. [
13] integrated geometric and assembly process data through a KG-based semantic model (KG-ASM). For agriculture, Wang et al. [
14] constructed a knowledge graph of agricultural engineering technology based on a large language model. In the broader marine domain, Chen et al. [
15] provide an overview of China’s policies on the development of marine ranching over the past two decades. Their study clarifies the current status, research hotspots, and future directions of marine ranching research. Additionally, Liu et al. [
16] established a knowledge graph construction and application framework for maritime accidents to facilitate the extraction and management of maritime knowledge from unstructured texts.
Despite these advancements, existing research predominantly focuses on structured data from product design or assembly processes, neglecting the unique challenges of marine equipment domains, where unstructured text dominates and entities exhibit complex interdependencies. Traditional extraction methods suffer from error propagation and inefficiency in handling such scenarios, while deep learning models like BERT-BiLSTM-CRF face limitations in parameter efficiency and contextual dependency modeling. In contrast to the aforementioned research, this article focuses on joint extraction methods for knowledge in the marine ranching equipment domain.
Through targeted questionnaires administered to diverse users and employees, several limitations of existing vertical KGs were identified: (1) limited data volume and knowledge scope; (2) ambiguity in the structure of the knowledge framework; (3) low efficiency and accuracy in knowledge extraction; (4) difficult updates and maintenance. Consequently, there is an urgent need for a specialized KG framework tailored to marine ranching equipment.
This study proposes the first structured KG framework for marine ranching equipment to bridge the gap between unstructured marine equipment data and knowledge. The main contributions of this study include:
- (1)
Hybrid Ontology Design: A combination of top-down and bottom-up approaches is used to construct a domain ontology, defining seven core concepts and eight semantic relationships.
- (2)
Joint Extraction Model: A BERT-BiGRU-CRF model that integrates BERT’s contextual embeddings, BiGRU’s parameter-efficient sequence modeling, and CRF’s global label optimization was developed. A novel TE + SE + Ri + BMESO tagging strategy resolves multi-relation extraction challenges.
- (3)
Dynamic Knowledge Storage: The extracted triples are stored in Neo4j, enabling scalable visualization and real-time updates via Cypher queries.
This work offers a transferable solution for vertical domains. By transforming fragmented data into structured knowledge, our framework supports intelligent applications including equipment fault diagnosis, maintenance planning, and policy formulation.
The remainder of this paper is structured as follows:
Section 2 details the hybrid KG construction methodology.
Section 3 describes the tagging strategy and BERT-BiGRU-CRF model.
Section 4 evaluates experimental results, and
Section 5 concludes with future directions.
2. Hybrid KG Construction Methodology
A KG is essentially a special semantic network composed of nodes and edges [
17], which connects different kinds of information into a relational network based on the connections between things.
The construction of a KG can be divided into three approaches: top-down, bottom-up, and a combination of the two. In the top-down approach, the ontology concept layer (i.e., the pattern layer) is constructed first to delimit the scope of knowledge extraction, and entities are then added to the knowledge base through graph construction techniques such as knowledge extraction. In the bottom-up approach, entities, relationships, and attributes with high confidence are extracted from data sources and added to the knowledge base, and concepts are then abstracted from the bottom up to complete the construction of the pattern layer. The combined method builds the pattern layer from the top down and then builds the data layer from the bottom up; through the induction and summarization of newly acquired data, entity expansion is realized based on the updated pattern layer [
4].
The marine ranching equipment KG is a vertical KG, which would typically adopt the top-down construction approach. However, as the amount of data grows, updating and maintaining the graph becomes increasingly difficult. Therefore, this study adopts a combination of the two methods to construct the marine ranching equipment KG, which supports a large data volume while ensuring high knowledge quality.
The process includes data acquisition and preprocessing, knowledge modeling, knowledge extraction, and data storage, which is shown in
Figure 1.
2.1. Data Acquisition and Preprocessing
The primary data sources for constructing the KG include related websites, enterprise production records, expert interviews, local and national standards, and relevant literature and publications. These data sources are categorized into semi-structured data and unstructured data. To ensure both the quality and quantity of the acquired knowledge, distinct acquisition and preprocessing methods are implemented for different data types.
For semi-structured data from websites, the most popular data acquisition technique is web crawling. After comprehensively evaluating popular web crawling frameworks, such as Scrapy (v2.12.0), PySpider, Crawley, and Portia, Scrapy was selected due to its advantages in stability, speed, scalability, modular structure, and low inter-module coupling [
18]. However, raw HTML documents often contain irrelevant content and redundant information, which may compromise the quality and efficiency of subsequent knowledge extraction. Therefore, preprocessing is essential to perform data cleaning and format standardization, ensuring the reliability of knowledge sources for graph construction. The specific workflow is illustrated in
Figure 2.
Step 1: Collect and analyze the related websites, and specify the crawling target.
Step 2: According to the website structure, write corresponding scripts to obtain raw HTML format documents.
Step 3: Use regular expressions to clean the HTML documents, removing advertisements, markup tags, and other noise.
Step 4: Write a format conversion script, combined with a degree of manual review (e.g., removing extra spaces and duplicate content), to produce JSON research documents in the {"key": "value", "key": [value]} format; a cleaning and conversion sketch is given below.
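The following minimal Python sketch illustrates Steps 3 and 4 under illustrative assumptions: the regular expressions, field names, and output path are placeholders rather than the exact scripts used in this study.

```python
# Illustrative sketch of Steps 3-4 (HTML cleaning and JSON conversion);
# regular expressions, field names, and the output path are placeholders.
import json
import re

def clean_html(raw_html: str) -> str:
    """Strip scripts, styles, and remaining tags, then collapse whitespace."""
    text = re.sub(r"<script.*?</script>|<style.*?</style>", "", raw_html, flags=re.S)
    text = re.sub(r"<[^>]+>", "", text)       # drop remaining HTML tags
    return re.sub(r"\s+", " ", text).strip()  # collapse whitespace

# One record in the {"key": "value", "key": [value]} format described above.
record = {
    "multi-functional platform": "Geng Hai No. 1",
    "aquaculture equipment": ["Automatic bait feeder (1 set)"],
    "introduction": clean_html("<p>Multi-functional marine ranching platform ...</p>"),
}

with open("cleaned_record.json", "w", encoding="utf-8") as f:
    json.dump(record, f, ensure_ascii=False, indent=2)
```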
Unstructured data, such as enterprise data, expert interviews, relevant literature, and books, can be divided into electronic text data and paper text data. Electronic text data are obtained through text parsing, while paper text data are digitized by OCR text recognition. To facilitate unified downstream processing, the obtained data are cleaned and, combined with manual review, converted into JSON files in the same format as above.
The processed data contain no irrelevant content and follow fixed rules: a "value" with a single entry is stored as a string, and a "value" with multiple entries is stored as an array. Each record represents a type of marine ranching together with its associated attributes and attribute values.
2.2. Knowledge Modeling
Knowledge modeling is the foundation of KG construction and a prerequisite for building a complete and valuable KG. It effectively organizes the useful knowledge in massive amounts of information into a unified knowledge model that is convenient for computer processing [
19]. An ontology is a modeling tool for describing domain concepts, which helps ensure that the graph is well structured and low in redundancy. Therefore, this study adopts an ontology-based modeling method to build the pattern layer of the graph.
The graph in this study is a vertical KG with higher requirements for professional knowledge and accuracy; therefore, the ontology is constructed manually in a top-down manner. At present, common manual ontology construction methods include the seven-step method [
20], the skeleton method [
21], the METHONTOLOGY method [
22], etc. The seven-step method is currently the most widely used. It is an iterative ontology modeling method, mainly used for constructing domain ontologies, whose advantages are its detailed step-by-step description and strong operability. The steps are as follows: determine the domain and scope of the ontology; consider whether existing ontologies can be reused; list the key terms of the ontology; define the classes and the class hierarchy; define the attributes of the classes; define the facets of the attributes; create instances. In ontology design there is no absolutely correct construction method, only the method most suitable for a given application scenario. Drawing on ontology construction methods from other domains and on the application scenario of the marine ranching equipment domain, this study optimized the existing seven-step method and obtained the construction process of the marine ranching equipment domain ontology, as shown in
Figure 3.
2.2.1. Determine the Domain of the Ontology
Protégé [
23] is an open-source, Java-based ontology editing tool developed by the Stanford Center for Biomedical Informatics Research. When using Protégé to build an ontology, the primary task is to determine its domain; that is, to clarify what domain the ontology covers, what its purpose is, in which scenarios it will be applied, and how it will be maintained. In this study, the ontology covers the domain of marine ranching equipment and is mainly used for the construction of the KG. The data in the ontology will be used for intelligent retrieval and question answering. The main maintenance approach is to update classes, relationships, and attributes based on the induction and summarization of new data. In addition, listing the key concepts and terms in the domain gives users and builders a clearer understanding of the entire ontology database. Some of the key concepts and terms are shown in
Table 1.
2.2.2. Determine the Structure and Related Elements
The concepts, attributes, and relations of the marine ranching equipment ontology must be designed comprehensively. According to domain investigation and relevant papers, marine ranching equipment can be divided into five equipment modules: multi-functional platforms, deep-sea cages, marine ranching observatories, sea fishing boats, and engineering vehicles [
1]. Based on these five equipment modules and the content and characteristics of the existing data, the remaining four parent concepts are determined: various types of equipment, marine design criteria, positioning methodology, and principal dimensions. The various-types-of-equipment concept includes seven sub-concepts, such as security equipment, energy equipment, aquaculture equipment, and navigation equipment. To further increase the amount of graph data and reflect the ongoing trend toward intelligent marine ranching, this study enriches the ontology with two additional parent concepts: marine ranching equipment knowledge and marine ranching construction. The marine ranching equipment knowledge concept mainly covers knowledge of existing marine ranching demonstration areas and introductions to the various types of marine ranching. The marine ranching construction concept refers to local standards, including construction norms for monitoring and evaluation, layout, and distribution.
Figure 4 shows the structure.
Since each concept has distinct features and associated data, its attributes must be defined accordingly.
Table 2 lists some attributes of the marine ranching equipment ontology.
In the above ontology structure, in addition to the hierarchical relations between concepts, there are also semantic relations among the entities contained in the sub-concepts. For example, there is a use_condition relationship between the multi-functional platform ("Geng Hai No. 1") and the marine design criteria ("designed water depth (10 m)"), and an aquaculture relationship between the deep-sea cage ("Jing Hai No. 1") and the aquaculture equipment ("Automatic bait feeder (1 set)"); the category of each relationship corresponds to its range.
Table 3 shows the semantic relationships of the marine ranching equipment ontology, based on the concepts designed in the previous section.
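As an informal illustration of how these concepts and relations can be organized for downstream processing, the Python sketch below mirrors part of Table 3; only concepts and relations explicitly mentioned in the text are listed, and the dictionary layout itself is an assumption for demonstration.

```python
# Illustrative in-memory representation of part of the ontology schema
# (concepts plus semantic relations with their domain and range).
ontology_schema = {
    "concepts": [
        "multi-functional platforms", "deep-sea cages", "marine ranching observatory",
        "sea fishing boats", "engineering vehicles", "marine design criteria",
        "positioning methodology", "principal dimensions", "aquaculture equipment",
    ],
    "relations": {
        # relation name: (domain concept, range concept)
        "use_condition": ("marine ranching equipment module", "marine design criteria"),
        "aquaculture": ("marine ranching equipment module", "aquaculture equipment"),
    },
}
```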
2.2.3. Marine Ranching Equipment Ontology Construction
Finally, the ontology modeling tool Protégé was used to build the marine ranching equipment ontology. The concepts, related elements, and some instances defined above were added to complete the knowledge modeling of the graph, as shown in
Figure 5.
2.3. Knowledge Extraction
Knowledge extraction aims to extract entity–attribute–attribute value triples from different types of acquired data, so as to provide necessary knowledge for the construction of the KG. It is divided into three categories: entity, relation, and attribute extraction. Entity extraction is the most basic and key step in knowledge extraction. Deep learning-based methods are currently the most popular in the domain of entity extraction. Common deep learning models include BiGRU-CRF [
24], BiLSTM-CRF [
25]. In this study, marine ranching equipment entities need to be identified in the domain text, such as "Geng Hai No. 1" (a marine ranching equipment module instance) and "Equipment-based Marine Ranching type" (a marine ranching type instance). Relation extraction extracts the relationships between entities on the basis of entity extraction. Attribute extraction extracts entity attribute information, such as the attributes of "Equipment-based Marine Ranching type" and "Geng Hai No. 1", and generally treats an attribute as a relationship between an entity and its attribute value for extraction [
26].
The data to be extracted include semi-structured and unstructured data, and the extraction methods differ by data type. For semi-structured data, rule-based scripting is used to jointly extract entities and attributes. For unstructured data, a deep learning model is used to jointly extract entities and relations. The specific extraction model is described in
Section 3.
2.4. Knowledge Storage
Knowledge storage should take the application scenario and data scale into account and choose an appropriate storage mode for the structured knowledge, enabling efficient data management and analysis. According to the storage structure, knowledge storage can currently be divided into two kinds: table-based and graph-based. After comprehensive analysis, this study adopts graph-based knowledge storage.
Presently, the predominant graph database systems encompass HyperGraphDB, OrientDB, and Neo4j. Neo4j [
27] is the most popular among them; it can store and query the entities, attributes, and relationships in a KG and supports applications that operate on and analyze the KG. Neo4j is classified as a property graph database and is fundamentally structured around four core components: labels, nodes, relations, and attributes. The role and described object of each component are delineated in
Table 4. In contrast to alternative graph database systems, Neo4j boasts superior scalability, the capacity to accommodate millions of nodes on standard hardware configurations, the availability of the Cypher query language, and compatibility with a multitude of popular programming languages. Consequently, Neo4j has been chosen as the database platform for the storage and maintenance of the graph structure in the present study.
Based on the py2neo library in Python (v3.9.13), scripts were written to batch-import triples such as (entity, relationship, entity) and (entity, attribute, attribute value); a sketch of this import is given below. The resulting Neo4j-based KG encapsulates 2153 nodes and 3872 edges.
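A minimal sketch of such a batch import with py2neo follows; the connection URI, credentials, node label, and the use of MERGE to avoid duplicates are illustrative assumptions, not the exact import script of this study.

```python
# Hedged sketch of batch triple import with py2neo; URI, credentials, and the
# "Entity" label are placeholders.
from py2neo import Graph, Node, Relationship

graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))

triples = [
    ("Geng Hai No. 1", "use_condition", "designed water depth (10 m)"),
    ("Jing Hai No. 1", "aquaculture", "Automatic bait feeder (1 set)"),
]

for head, relation, tail in triples:
    # MERGE-style creation avoids duplicate nodes when the script is re-run.
    h = Node("Entity", name=head)
    t = Node("Entity", name=tail)
    graph.merge(h, "Entity", "name")
    graph.merge(t, "Entity", "name")
    graph.merge(Relationship(h, relation, t))
```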
Figure 6 shows partial content.
In
Figure 6, the nodes are distinguished by different colors corresponding to different conceptual instances, while the edges interlinking these nodes represent the relationships between them. Owing to the openness of the KG and the good scalability of the Neo4j database, the KG established in this research can be systematically enriched and augmented via Cypher query language statements; a minimal example follows below. This will, in turn, provide a robust foundation for subsequent applications in equipment fault diagnosis, maintenance planning, and policy formulation.
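For instance, a new triple could be merged into the graph with a Cypher statement along the following lines (executed here through py2neo; the node label and property names are illustrative):

```python
# Enriching the graph with a new triple via a Cypher MERGE statement.
from py2neo import Graph

graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))
graph.run(
    "MERGE (e:Entity {name: $head}) "
    "MERGE (v:Entity {name: $tail}) "
    "MERGE (e)-[:use_condition]->(v)",
    head="Geng Hai No. 1", tail="designed water depth (10 m)",
)
```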
3. Joint Extraction of Knowledge in the Domain of Marine Ranching Equipment
Knowledge extraction aims to extract entity–attribute–attribute value triples from the different types of acquired data, so as to provide the necessary knowledge for KG construction. As mentioned above, rule-based joint extraction is used for semi-structured data, while a deep learning model is used for joint extraction from unstructured data.
3.1. Rule-Based Joint Extraction of Entity Attributes
The dataset analyzed in the preceding section is semi-structured data adhering to specific rules. For these data entries, the initial key–value pair enclosed within each set of curly braces denotes the category to which the entity pertains, as well as the entity's name. Subsequent key–value pairs are stored in the format "attribute": "attribute value", with each pair pertaining to the entity identified in the initial key–value pair. Empirical validation has confirmed that this structured rule facilitates the extraction of entity–attribute–attribute value triples (e.g., marine ranching equipment entity–attribute 1–attribute value 1; marine ranching equipment entity–attribute 2–attribute value 2; …; marine ranching equipment entity–attribute n–attribute value n), as sketched below.
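The following minimal sketch illustrates this rule; the input file name and the flat record layout are assumptions for illustration rather than the exact scripts used in this study.

```python
# Rule-based entity-attribute-attribute value extraction from preprocessed
# JSON records; the file name and record layout are illustrative.
import json

def record_to_triples(record: dict) -> list[tuple[str, str, str]]:
    items = list(record.items())
    category, entity = items[0]          # first key-value pair: category and entity name
    triples = []
    for attribute, value in items[1:]:   # remaining pairs: "attribute": "attribute value"
        values = value if isinstance(value, list) else [value]
        for v in values:                 # multi-valued attributes yield one triple per value
            triples.append((entity, attribute, str(v)))
    return triples

with open("marine_ranching_equipment.json", encoding="utf-8") as f:
    records = json.load(f)               # assumed to be a list of records

all_triples = [t for rec in records for t in record_to_triples(rec)]
```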
In order to enhance the presentation and utility of the KG, the current study introduces a method to normalize multi-valued attributes into entities. A segment of the KG is illustrated in
Figure 7.
3.2. Joint Entity Relation Extraction Based on Deep Learning Models
3.2.1. The Innovative TE + SE + Ri + BMESO Tagging Strategy
Upon examining the previously processed JSON dataset, it was found that the "value" fields contain a substantial amount of unstructured text, which in turn harbors numerous implicit interconnections among entities. For example, within the "value" of the "introduction" (profile) attribute of "Jing Hai No. 1", there are intricate entity relationships pertaining to principal dimensions, aquaculture equipment, and marine design criteria.
Based on a comprehensive analysis of the marine ranching equipment corpus, in conjunction with the relations defined in the pattern layer, several distinctive characteristics were identified: (1) the extraction tasks are uniformly centered on the marine ranching equipment module concept, so the marine ranching equipment module entity is designated as the theme entity of the extracted triples; (2) the relation between the marine ranching equipment module entity and another entity is consistent with the category of that other entity, so identifying the entity type also determines the relationship; (3) a single sentence may contain multiple relationships between the marine ranching equipment module entity and several other entities.
Drawing upon the preceding analytical insights, the current study introduces an innovative tagging strategy, designated TE + SE + Ri + BMESO, which is specifically tailored to the marine ranching equipment corpus. The study employs the BERT-BiGRU-CRF entity extraction model to concurrently identify entities and extract inter-entity relationships. Within this tagging schema, the marine ranching equipment module entity is denoted as the theme entity, represented by TE. Entities that interact with the marine ranching equipment module entity are denoted by SE_Ri, with SE denoting the secondary entity and Ri indicating the category of the i-th secondary entity SEi, which corresponds to the relationship type linking the theme entity to SEi. The BMESO sequence labeling approach is utilized, with the detailed connotations of each label delineated in
Table 5.
For example, in the input sequence “Jing Hai Yi Hao Pei Bei Xi Wang Ji”, in this sentence, “Jing” serves as the beginning of the theme entity, corresponding to Tag “B-TE”, “Hai Yi” is in the middle of the theme entity, corresponding to Tag “M-TE”, “Hao” is at the end of the theme entity, corresponding to Tag “E-TE”, “Pei Bei” serves a connecting role and has no specific meaning, so it corresponds to Tag “O”. “Xi”, “Wang”, “Ji”, respectively, correspond to the beginning, middle and ending of the secondary entity. Similarly, they can be marked as “B-SE_Ri, M-SE_Ri, E-SE_Ri”.
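To make the scheme concrete, the short Python sketch below reproduces the character-level tags for this example sentence; the helper function and its name are purely illustrative.

```python
# Character-level TE + SE + Ri + BMESO tagging for the example sentence above.
def bmeso_tags(length: int, role: str) -> list[str]:
    """Return BMESO tags for a span of the given length and role (e.g. 'TE', 'SE_Ri')."""
    if length == 1:
        return [f"S-{role}"]
    return [f"B-{role}"] + [f"M-{role}"] * (length - 2) + [f"E-{role}"]

# "Jing Hai Yi Hao Pei Bei Xi Wang Ji": theme entity (4 chars), filler (2 chars),
# secondary entity of relation Ri (3 chars).
chars = ["Jing", "Hai", "Yi", "Hao", "Pei", "Bei", "Xi", "Wang", "Ji"]
tags = bmeso_tags(4, "TE") + ["O", "O"] + bmeso_tags(3, "SE_Ri")
for c, t in zip(chars, tags):
    print(f"{c}\t{t}")
```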
3.2.2. The Specific Structure and Working Principle of the BERT-BiGRU-CRF Model
The BERT model [
18] is a widely adopted pre-trained language model in natural language processing (NLP), demonstrating exceptional performance in text representation and semantic understanding. The BERT-BiGRU-CRF architecture comprises three layers: a BERT layer, a Bidirectional Gated Recurrent Unit (BiGRU) layer, and a Conditional Random Field (CRF) layer. The overall model is shown in
Figure 8.
Compared with the traditional BiLSTM-CRF model, this study adopts BiGRU to reduce the parameter count (by 18%) and uses the CRF layer to explicitly model label dependencies (e.g., between 'B-TE' and 'E-SE_Ri'), alleviating the problem of nested marine equipment entities.
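The following PyTorch sketch outlines one possible implementation of this architecture; the Hugging Face transformers and pytorch-crf libraries, the bert-base-chinese checkpoint, and the hidden size are assumptions for illustration rather than the exact configuration reported in this paper.

```python
# Illustrative sketch of a BERT-BiGRU-CRF tagger (library choices are assumptions).
import torch.nn as nn
from transformers import BertModel
from torchcrf import CRF  # pip install pytorch-crf


class BertBiGRUCRF(nn.Module):
    def __init__(self, num_tags: int, hidden_size: int = 128,
                 bert_name: str = "bert-base-chinese"):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)            # contextual embeddings
        self.bigru = nn.GRU(self.bert.config.hidden_size, hidden_size,
                            batch_first=True, bidirectional=True)   # sequence modeling
        self.fc = nn.Linear(2 * hidden_size, num_tags)              # emission scores
        self.crf = CRF(num_tags, batch_first=True)                  # global label optimization

    def forward(self, input_ids, attention_mask, tags=None):
        embeddings = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        gru_out, _ = self.bigru(embeddings)
        emissions = self.fc(gru_out)
        mask = attention_mask.bool()
        if tags is not None:                      # training: negative log-likelihood loss
            return -self.crf(emissions, tags, mask=mask, reduction="mean")
        return self.crf.decode(emissions, mask=mask)  # inference: best tag sequence
```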
- (1)
BERT layer
BERT is a context-based word embedding model and its structure is delineated in
Figure 9.
The execution of the BERT layer predominantly encompasses two pivotal components: the representation of the input data and the pre-training procedures. The representation of the input data refers to transforming the data into a format compatible with BERT's input requirements. Each character in the input is represented as the sum of its token embedding, segment embedding, and position embedding, as shown in
Figure 10.
BERT is pre-trained based on two major tasks: “Masked Language Model” (MLM) and “Next Sentence Prediction” (NSP). Through the simultaneous training of these two tasks, it can better extract word-level and sentence-level features of the text, obtaining token embeddings that contain more semantic information.
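As a brief illustration of how such contextual embeddings can be obtained in practice, the snippet below uses the Hugging Face transformers library with an assumed Chinese BERT checkpoint and an English placeholder sentence; it is not the exact pre-training setup described above.

```python
# Obtaining BERT contextual embeddings for a sentence (illustrative checkpoint).
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertModel.from_pretrained("bert-base-chinese")

# The tokenizer produces token ids, segment (token type) ids, and an attention
# mask; position embeddings are added inside the model and summed with them.
inputs = tokenizer("Jing Hai No. 1 is equipped with an automatic bait feeder",
                   return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, 768)
```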
- (2)
BiGRU layer
To capture contextual information in both directions, the current study employs a Bidirectional Gated Recurrent Unit (BiGRU) network, whose fundamental building block consists of a forward and a backward GRU. The detailed model is delineated in
Figure 11.
In Figure 11, the variable x_t denotes the input at the current time step, h_t denotes the output at the current time step, and h_{t−1} denotes the output at the previous time step. The reset gate r_t and the update gate z_t operate together to regulate the previous hidden state h_{t−1} and its transition into the new hidden state h_t. The reset gate combines h_{t−1} and x_t to produce a matrix r_t whose elements range from 0 to 1. Following the standard GRU formulation, with σ denoting the sigmoid function:

r_t = σ(W_r·[h_{t−1}, x_t] + b_r)

The update gate combines h_{t−1} and x_t to control how much information from the previous step's output h_{t−1} is retained:

z_t = σ(W_z·[h_{t−1}, x_t] + b_z)

The candidate hidden state h̃_t consists of two components: the current input x_t and the previous output h_{t−1}, the latter modulated by the reset gate r_t:

h̃_t = tanh(W_h·[r_t ⊙ h_{t−1}, x_t] + b_h)

The matrices W_r, W_z, and W_h denote the weight matrices, b_r, b_z, and b_h represent the biases, and ⊙ denotes element-wise multiplication. The final output is determined by the update gate:

h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t
- (3)
CRF layer
In named entity recognition, interdependencies exist among the labels. For example, the label "B-TE" can never be directly followed by the label "M-SE_AQ". Nonetheless, the BiGRU model simply selects the label with the highest probability as the predicted outcome, without considering such inter-label constraints. To address this, the current study introduces a CRF (Conditional Random Field) layer. During label prediction, the CRF layer evaluates both the score of each individual label and the transition probabilities learned from the training corpus, effectively reducing the likelihood of illegal label sequences and enhancing the precision of the predictions.
Suppose the input sequence is X = {x_1, x_2, x_3, …, x_n} and the corresponding output label sequence is y = {y_1, y_2, y_3, …, y_n}. The score of a label sequence is calculated as:

s(X, y) = Σ_{i=0}^{n} A_{y_i, y_{i+1}} + Σ_{i=1}^{n} P_{i, y_i}

In the formula, A is a transition matrix of size (k + 2) × (k + 2), where A_{i,j} represents the score of transitioning from label i to label j; the two extra dimensions account for the start and end labels. P is the output matrix of the BiGRU layer, with a size of n × k, where n indicates the sentence length and k represents the number of labels; P_{i,j} denotes the score of the i-th word being assigned the j-th label. To derive the probabilities corresponding to all potential label sequence scores, the softmax function is employed:

p(y | X) = exp(s(X, y)) / Σ_{ỹ ∈ Y_X} exp(s(X, ỹ))

Here, Y_X denotes the set of all feasible label sequences for the input sequence X, and ỹ denotes one such candidate sequence. Taking the logarithm of both sides yields the log-likelihood used for training, and the Viterbi algorithm is applied during decoding to identify the sequence with the highest score:

y* = argmax_{ỹ ∈ Y_X} s(X, ỹ)
4. Results and Analysis
4.1. System Testing Environment
In the context of this investigative endeavor, the experimental procedures were executed utilizing the Python and PyTorch frameworks. The corresponding software and hardware configuration is delineated in
Table 6.
This investigation assesses the efficacy of the model using three standard performance metrics in knowledge extraction, namely precision (P), recall (R), and the F1 score.
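These metrics are computed in the standard way, where TP, FP, and FN denote the numbers of true positives, false positives, and false negatives, respectively:

P = TP / (TP + FP),  R = TP / (TP + FN),  F1 = 2 × P × R / (P + R)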
4.2. Experimental Data and Parameters
In this study, the dataset consists of 1456 annotated sentences specifically related to marine ranching equipment. The dataset was split into training, validation, and test sets in an 8:1:1 ratio, yielding a training corpus of 1164 sentences, a validation corpus of 146 sentences, and a test corpus of 146 sentences. After a series of parameter-tuning runs, the optimal parameter settings are outlined in
Table 7.
4.3. Experimental Results
To substantiate the superiority of the model developed in this research, a comparative analysis was conducted against the prevalent knowledge extraction models BiLSTM-CRF and BERT-BiLSTM-CRF. For enhanced stability in the assessment, each extraction experiment was performed 10 times to determine average values and variances. The detailed outcomes of these experiments are delineated in
Table 8.
4.4. Experimental Analysis
4.4.1. Comparison with Models
As illustrated in
Table 8, the experimental results demonstrated superior performance over the BiLSTM-CRF and BERT-BiLSTM-CRF models, achieving 86.58% precision, 77.82% recall, and an 81.97% F1 score. Specifically, compared with the BiLSTM-CRF model, the precision, recall, and F1 score increased by 9.38%, 9.37%, and 9.45%, respectively, because BERT provides deep semantic representations and mitigates the problems of polysemy and data sparsity. Compared with the BERT-BiLSTM-CRF model, the precision, recall, and F1 score increased by 1.21%, 2.50%, and 1.94%, respectively. This indicates that, in contrast to the BiLSTM model, the BiGRU model has a reduced parameter count, which not only enhances model performance but also accelerates training. The utilization of BiGRU as the encoding layer is therefore more appropriate for the text-based entity recognition task specific to marine ranching equipment.
4.4.2. Different Entity Recognition Results
In order to deepen our comprehension of the joint extraction of knowledge in the domain of marine ranching equipment, we performed a comprehensive evaluation of the BERT-BiGRU-CRF architecture’s efficacy in identifying diverse entity categories. The recognition efficacy pertaining to diverse entities is graphically shown in
Figure 12.
An analysis of
Figure 12 and associated data reveals that the F1 score for the marine ranching equipment module entity is the highest at 92.17%. Notably, entities such as marine design criteria, positioning methodology, and principal dimensions exhibit considerable F1 scores at 87.12%, 88.81%, and 89.85%, respectively. The reason might be that these entities are relatively few in number and their grammatical structures are complete and clear, thus making them less difficult to identify.
The presence of diverse nomenclature for various equipment entities, such as "batch feeder", "bait dispenser", and "GQ48902", all referring to the "feeding machine" in the context of aquaculture equipment, leads to more unrecognized entities and a relatively low recall rate for these entity types, which in turn lowers the overall recall and F1 score of the model.
This study is the first to construct an entity recognition task within the domain of marine ranching. Currently, there are no comparable research results in the same domain to make relevant comparisons. In the related marine domain, Lv et al. [
28] proposed an improved YOLO v5 target detection algorithm; its experimental results show that the values of mAP and F1 of the improved YOLO v5 target detection algorithm are 72.1% and 0.722, respectively, which are better than other target detection algorithms in terms of accuracy and reliability. Cao et al. [
29] proposed a method combining a neural network with a statistical model (BiLSTM-CRF) to identify marine drugs, achieving an accuracy rate, recall rate, and F1 score of 72.23%, 66.76%, and 68.57%, respectively. The F1 score of 81.97% obtained in this study is substantially higher than these reported levels, indicating that the proposed method has reached a practical level.
However, the method and the dataset also have some limitations. First, the current corpus contains only 1456 labeled sentences, and multilingual data need to be added. Furthermore, the knowledge graph does not yet integrate real-time data streams; we plan to optimize this by combining the incremental learning capability of the graph database.
4.5. Intelligent Question-Answering System for Marine Ranching Equipment
Based on the above knowledge graph, this study designed an intelligent question-answering system. The question-answering page of the system, shown in
Figure 13, includes a search box and a result box. Users enter a question in the search box and click the search button; based on the marine ranching equipment knowledge graph and the question-answering algorithm, the corresponding retrieval results appear in the result box. To enhance the convenience of the system, this module also provides a question recommendation function, which is refreshed in real time to suggest questions users may want to ask.
For a user-input question, the question is first segmented and part-of-speech tagged using the Jieba library together with a custom entity dictionary, and the main entity in the question is extracted by script. The question is then classified with the BERT model to determine its semantics. Finally, the graph is queried through a Cypher query template to obtain the answer, as sketched below.
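A simplified sketch of this pipeline is given below; the user dictionary path, the part-of-speech flag used to pick the entity, the Cypher template, and the stubbed question classifier are all illustrative assumptions rather than the deployed system.

```python
# Hedged sketch of the question-answering pipeline: Jieba segmentation with a
# custom entity dictionary, a stubbed intent classifier, and a Cypher template.
import jieba
import jieba.posseg as pseg
from py2neo import Graph

jieba.load_userdict("marine_ranching_entities.txt")   # custom entity dictionary (placeholder)
graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))

CYPHER_TEMPLATES = {
    "attribute_query": "MATCH (e {{name: '{entity}'}})-[r:`{attribute}`]->(v) RETURN v.name",
}

def answer(question: str) -> list:
    words = [(w.word, w.flag) for w in pseg.cut(question)]        # segmentation + POS tagging
    entity = next((w for w, flag in words if flag == "nz"), None)  # pick the domain entity
    intent = "attribute_query"          # placeholder for the BERT-based question classifier
    cypher = CYPHER_TEMPLATES[intent].format(entity=entity, attribute="aquaculture")
    return [record[0] for record in graph.run(cypher)]
```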
5. Conclusions and Prospects
In essence, this study not only proposes the first structured KG framework for marine ranching equipment but also offers a transferable methodology for vertical domain knowledge extraction, yielding successful outcomes. The present study employs a hierarchical, top-down methodology to establish the model layer of marine ranching equipment, culminating in the formulation of an ontological framework for the marine ranching equipment KG. Subsequently, a bottom-up method is enacted to develop the corresponding data layer, facilitating the comprehensive acquisition of marine ranching equipment data and the subsequent extraction. Thereafter, the BERT-BiGRU-CRF model is used to accomplish the joint extraction of entity relationships within the marine ranching equipment domain. Ultimately, the graph data are stored within the Neo4j database. The conclusions are summarized as follows:
The Neo4j-based KG encapsulated 2153 nodes and 3872 edges, enabling scalable visualization and dynamic updates. Experimental results demonstrated superior performance over BiLSTM-CRF and BERT-BiLSTM-CRF, achieving 86.58% precision, 77.82% recall, and 81.97% F1 score.
A comparison with existing knowledge graphs in the marine domain shows that they mainly focus on aspects such as maritime accident analysis, whereas this study focuses on extracting the entity relationships of marine ranching equipment. Marine ranching equipment is an important prerequisite for developing deep-sea aquaculture, yet its domain text is grammatically complex and suffers from fragmentation and information islands. This research fills the gap in this vertical domain.
Currently, there is still room for improvement in this study, and it will be further deepened and expanded from the following aspects:
First, because the marine ranching equipment domain has developed over a relatively short period, its domain corpora are not yet abundant. The training data of this article mostly come from the Internet, the manuals of related companies, and assorted documents. We are currently collaborating with enterprise partners to expand multilingual corpora (such as English technical manuals) and discussing increases in data volume through the fusion of multiple data modalities.
Furthermore, it is imperative to address the dynamic and continuous evolution of marine ranching equipment, which is characterized by rapid advancement and an ever-expanding body of knowledge. The capacity to capture data in real-time and facilitate the dynamic updating of the marine ranching equipment KG represents a pivotal challenge for future endeavors.
Finally, in recent years, language models based on large-scale corpora and pre-training techniques have become a research hotspot in natural language processing, for instance, the GPT series of models proposed by OpenAI and the BERT model developed by Google; BERT in particular has achieved outstanding results in various NLP tasks, such as sentence classification and intelligent question answering. The method in this study can be transferred to other vertical domains, and future work will focus on integrating it with large language models for such application scenarios.