An Ontological Metro Accident Case Retrieval Using CBR and NLP

Metro accidents are apt to cause serious consequences, such as casualties or heavy economic loss. Once an accident occurs, quick and accurate decision-making is essential to prevent it from escalating, which remains a challenge due to the lack of efficient knowledge representation and retrieval. In this research, an ontological method that integrates case-based reasoning (CBR) and natural language processing (NLP) techniques is proposed for metro accident case retrieval. An ontological model was developed to formalize the representation of metro accident knowledge; after the accident cases were annotated using NLP techniques, CBR was used to retrieve similar past cases to support decision-making. Rule-based reasoning (RBR), as a complement to CBR, was used to determine appropriate measures based on those recorded in regulations, such as emergency plans. A total of 120 metro accident cases were extracted from the safety monthly reports of metro operations and built into the case library. The proposed method was tested in MyCBR and evaluated by expert reviews, achieving an average precision of 91%.


Introduction
The metro has recently become a popular means of public transportation. As of 5 September 2016, 44 cities in China were preparing to construct metro systems and 27 cities were already operating them [1]. However, as more and more metro lines are put into service, catastrophic accidents occasionally occur during metro operations, such as turnout failures, signal failures, and terrorist attacks. For instance, in the Daegu metro accident (18 February 2003), 192 people died and another 151 were injured [2]. In the metro collision in Washington (22 June 2009), about 80 people were injured [3]. In the rear-end collision on Shanghai Metro Line 10 (27 September 2011), 40 people were injured and line operation was interrupted for more than six hours [4].
Metro accidents are apt to cause serious consequences, such as casualties or heavy economic loss, owing to complex underground structures, complicated facilities, narrow spaces, and crowds of passengers [2,5]. This places higher demands on accident prevention and response. Once an accident occurs, a quick response and accurate decision-making are essential to prevent it from escalating; this remains a challenge for managers in metro operation companies and was the starting point of this study.
Traditionally, managers tend to retrieve related knowledge, such as records of similar previous accidents, regulations, and emergency plans, to support decision-making. Historical cases are [...] In general, previous studies related to the safety management of metro systems have mainly focused on risk identification and analysis. Few studies have focused on how to support decision-making in response to accidents during metro operations, which is also essential for reducing damage after metro accidents occur.

Ontology Technology for Safety Management
Safety management is a knowledge-intensive process in the construction industry, where information is scattered across various systems; owing to the lack of a common representation, this heterogeneous information cannot be shared and reused effectively between systems. Ontology, as an essential semantic technique, provides a precise, formal specification of a shared conceptualization of a domain [22]. Compared with database schemas, ontology technology can represent knowledge with explicit and rich semantics, which supports efficient knowledge reasoning and querying [23].
Many industries, such as medicine, computer science, and biology, have developed ontologies for efficient knowledge management. For instance, using ontology engineering, Koo, et al. [24] proposed a semantic framework for integrating the processing model and dataset in the domain of biorefining. In the construction industry, Ding, et al. [25] established an ontology-based semantic network to produce a construction risk knowledge map, in which the ontology was used to standardize the description of each aspect of risk knowledge and to facilitate knowledge reasoning and retrieval. Wang, et al. [26] built an emergency plan system ontology to promote communication and sharing between different plan systems. Wu, et al. [27] developed a domain ontology to represent knowledge of scenario-based hazard evaluation. Guo and Goh [28] established an ontology model to formalize the knowledge of active fall protection system design, which can facilitate knowledge reuse and sharing among professional engineers.

CBR-Based Decision-Making
CBR generally refers to the process of solving new problems based on experience with similar past cases [29]. Schank [30] suggested that experiences and cases can be reused to facilitate decision-making when a manager faces similar problems. CBR is memory-based and simulates the human thinking process, and it has long been applied in problem-solving fields [31]. As a decision support tool, CBR is well suited to construction safety management, given the large body of historical experience in the construction industry. For example, Guo, et al. [32] proposed a CBR system to provide design support for injection mold design. Xie, et al. [33] proposed a CBR system for hydro-generator design and developed a case library within this system to facilitate the design process. Yu, et al. [6] used the CBR method to analyze responses to risks associated with urban water supply networks. Additionally, rule-based reasoning (RBR) is usually added as a complement to CBR in the decision-making process, because CBR may fail when no relevant case is available [34]. Ferrara and Baumgartner [35] discussed the respective advantages and disadvantages of CBR and RBR and pointed out that their integration is useful. Shi and Barnden [36] presented a hybrid approach combining CBR with RBR for diagnosing multiple faults. Goh and Guo [8] presented a web-based CBR-RBR system to support the design of active fall protection systems.
Typically, CBR comprises four main processes: retrieve, reuse, revise, and retain, of which retrieval is the most important in any CBR system [7]. An effective method of case representation ensures that domain knowledge can be acquired accurately and easily, thus laying a good foundation for efficient retrieval [32]. However, data on metro accident cases are usually stored in plain-text documents or rigid electronic files in unstructured formats, such as PDF files, word-processing files, or HTML pages. Such unstructured data do not follow a machine-readable format, which limits the semantic understanding achievable by traditional CBR [32]. One potential solution to this problem is a semantic framework based on ontologies. In a CBR system, ontology technology serves several functions, such as representing cases and facilitating similarity calculations. Tung, et al. [37] combined CBR, RBR, and ontology to develop a solution retrieval system for expert searches and problem diagnosis. Bouhana, et al. [38] proposed an ontology-based CBR information retrieval system that improves the accuracy of case retrieval and facilitates the search process. Xu, et al. [39] developed a knowledge model based on ontology and then used CBR and RBR to support automated decision-making for the disassembly of mechanical products. Maalel, et al. [40] applied an ontology-based CBR approach to develop a system that helps to build and exploit a case base of historical railroad accidents.
Previous studies have demonstrated the advantages of combining CBR and ontology techniques in the construction industry. However, few efforts have been made to build an ontological CBR framework that facilitates knowledge representation and retrieval for decision-making during metro operations.
Therefore, by combining ontology with CBR, a research framework for supporting decision-making in metro accident response was constructed, and its feasibility was verified through a metro case study. The main contributions of this study are as follows. First, an ontology model was developed to comprehensively formalize the domain knowledge, providing a good foundation for metro accident retrieval. Second, NLP techniques were used to semi-automate the annotation of metro accident cases with keywords defined in the ontology model for subsequent similarity calculation. Third, the Semantic Web Rule Language (SWRL) was used to encode the response measures recorded in regulations, such as emergency plans for metro operations, as rules for reasoning, which can support decision-making when there are no similar cases in the case library.
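The CBR-first, RBR-fallback behaviour described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: cases and queries are plain sets of annotated concept labels, Jaccard overlap stands in for the paper's local-global similarity, and the 0.6 threshold, the example case, the example rule, and the names `decide` and `jaccard` are all hypothetical.

```python
def jaccard(a, b):
    """Set-overlap similarity, used here only as a stand-in measure."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def decide(query, case_base, rules, threshold=0.6):
    """Reuse the most similar case if it is close enough; otherwise apply rules.

    query     -- set of concept labels annotated from the new accident
    case_base -- list of dicts with "facts" (set) and "measures" (list)
    rules     -- list of (required_facts, measure) pairs, e.g. from an emergency plan
    """
    best_case, best_score = None, 0.0
    for case in case_base:
        score = jaccard(query, case["facts"])
        if score > best_score:
            best_case, best_score = case, score
    if best_case is not None and best_score >= threshold:
        return "CBR", best_case["measures"]
    # No sufficiently similar case: fall back to rule-based reasoning.
    return "RBR", [measure for body, measure in rules if body <= query]


# Hypothetical data for illustration only.
CASES = [{"facts": {"turnout indication failure", "train delay"},
          "measures": ["change train route"]}]
RULES = [({"STC equipment failure"}, "dispatch signal maintenance staff")]
```

With this toy data, a query matching the stored case exactly is answered by CBR with `["change train route"]`, whereas a query about an STC equipment failure finds no similar case and is answered by the rule instead. The threshold is the design lever: raising it makes the system defer to regulations more often.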

The Overall Ontological CBR Framework for Metro Accident Responses
The proposed ontological CBR method aims to improve the performance of knowledge retrieval for supporting decision-making during metro operations. Additionally, it can facilitate the development of an online knowledge system for metro accident response. Figure 1 shows the overall structure of the system, which consists of four layers: information acquisition, ontology development, semantic processing, and application service. In the information acquisition layer, heterogeneous information from different data sources, including regulations, emergency plans, and accident records, is collected; this information is usually stored in several different formats. In the ontology development layer, specific ontologies are developed to describe the knowledge and information of the different domains. NLP techniques are used to achieve the semi-automated annotation of metro accidents. In the semantic processing layer, rule containers, reasoning engines, and an annotated case base are created to support the management and application of the ontology-described knowledge. In the application service layer, multiple services, including knowledge sharing and retrieval, case-based reasoning, and rule-based reasoning, are provided to users through programmable interfaces. The detailed knowledge retrieval process of the proposed method is shown in Figure 2.
Appl. Sci. 2020, 10, x FOR PEER REVIEW 5 of 27
In the proposed method, the domain knowledge of metro operational accidents (e.g., basic concepts, relations, and rules) was first identified. Then, an ontology model was established to represent the domain knowledge semantically, in which concepts were represented by classes and individuals, and the relations between concepts were represented by properties. Complex knowledge reasoning could be achieved through semantic rules in the ontological model. By combining NLP techniques with the ontology model, the specific knowledge stored in historical accident records was processed and semantically annotated with concepts from the ontology. This enabled the accident reports to be organized according to the structure of the ontology model and to be semantically retrieved and reasoned over via their annotations. Thus, based on this semantic representation, the annotated cases could be interpreted by computational reasoning, which supported decision-making by calculating the similarities between accident cases.
Finally, as shown in Figure 2, the knowledge base for metro accident response was developed by integrating the fact base, rule base, and case base. The integrated knowledge base supports further rule-based reasoning (RBR) and case-based reasoning (CBR).

Knowledge Source for Metro Accidents
Knowledge related to metro accident response is contained in various documents, such as regulations, emergency plans, and accident records. These documents were selected as the knowledge source, which can be divided into general knowledge (from regulations, emergency plans, and decision rules of the metro enterprise) and specific case knowledge (from historical accident records). This knowledge supported the development of the ontology model. The related documents are introduced as follows:

Knowledge Representation in the Ontology Model
In terms of knowledge representation, the proposed method aimed to comprehensively formalize the domain knowledge and thus provide a good foundation for knowledge retrieval. An ontology model was developed to attain this goal. The ontology provides a common vocabulary of concepts and relationships for accident knowledge and its context. The concepts and relations extracted from the domain knowledge were used to build the ontological model.

Taxonomy of Metro Accidents
A taxonomy of the basic concepts of metro accidents is a foundational step in developing the ontology model; in a taxonomy, all concepts are hierarchically divided into categories and sub-categories. Taxonomies have been used in the literature to analyze specification information. For example, Anumba, et al. [43] proposed a synthesized taxonomy development method for constructing contractual semantics, which can be used to support ontology modeling. The taxonomy of metro accident concepts was determined based on related regulations, emergency plans, decision rules, and accident records, and was used to facilitate the construction of an ontological model of metro accidents. Most accidents during metro operations are equipment failures, such as signal system failures and metro vehicle failures. Signal system failures were therefore selected as the specific object of case retrieval in this paper. For example, Table 1 shows the taxonomy of a specific accident during metro operations.
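To make the idea of a hierarchical taxonomy concrete, the Python sketch below encodes a small fragment as nested dictionaries and derives each concept's parent and level. The fragment is hypothetical: it is assembled from concept names that appear elsewhere in this paper (the accident types, signal system failure, turnout indication failure, STC equipment failure), and the placement of the nodes is illustrative rather than a copy of Table 1. The names `TAXONOMY`, `flatten`, and `path_to_root` are likewise illustrative.

```python
# Hypothetical taxonomy fragment; node placement is illustrative only.
TAXONOMY = {
    "metro operation accident": {
        "terrorist attack": {},
        "natural disaster": {},
        "emergency": {},
        "passenger service failure": {},
        "equipment system failure": {
            "signal system failure": {
                "wayside equipment failure": {
                    "turnout indication failure": {},
                },
                "signal system infrastructure": {
                    "STC equipment failure": {},
                },
            },
            "metro vehicle failure": {},
        },
        "other accident": {},
    }
}


def flatten(tree, parent=None, level=1):
    """Yield (concept, parent, level) triples in depth-first order."""
    for concept, children in tree.items():
        yield concept, parent, level
        yield from flatten(children, concept, level + 1)


def path_to_root(concept, parents):
    """Return the chain of categories from a concept up to the root."""
    path = [concept]
    while parents[path[-1]] is not None:
        path.append(parents[path[-1]])
    return path


rows = list(flatten(TAXONOMY))
parents = {concept: parent for concept, parent, _ in rows}
levels = {concept: level for concept, _, level in rows}
```

Here `levels["turnout indication failure"]` is 5, and `path_to_root` walks from that leaf through wayside equipment failure and signal system failure up to the root. Levels and parent links of this kind are exactly what the semantic-tree similarity measures later rely on.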

Ontology Model Development
The ontological model includes different classes, properties, and individuals. These classes and individuals were created based on a taxonomy of metro accidents, while the properties were created based on the relations between these concepts. The ontological model was established as shown in Figure 3. This ontological model acts as the data structure to systematize the representation of accident cases, which contains three parts: accident characteristics, accident responses, and post-accident assessment.
(1) Accident characteristic

The accident characteristic category refers to basic information about metro operating accidents, such as the accident location, fault characteristics, and accident type. The main accident locations are on the train, in the station, in the running section, or in the yard. The fault characteristic is the core element of the accident characteristic and the external manifestation of an accident; managers usually recognize an accident by identifying its accident characteristics. Different accidents have different fault characteristics, and the accident type is determined by the fault characteristic. In this study, accidents in metro operations were classified into the following types: terrorist attacks, natural disasters, emergencies, passenger service failures, equipment system failures, and other accidents.
(2) Accident response
Accident responses are essential to reducing accident damage. The accident response category refers to the specific procedures and details for handling accidents during metro operations, and includes staff responses, resource responses, and measure responses.
(3) Post-accident assessment
Post-accident assessment refers to the treatments applied after a metro accident to prevent the same accident from recurring. It covers the accident impact, accident causes, and corrective measures. The accident impact includes casualties, economic loss, and traffic delays.

Accident Cases Formalization
The aforementioned ontological model provides a unified semantic structure for the standardized expression of metro accident cases. Based on the semantic framework above, historical accident cases can be represented in a standard semantic format to facilitate computer analysis and processing.
Based on the developed ontology model, a metro accident can be described by a set of attributes divided into three main categories: accident characteristics, accident response, and post-accident assessment. The names of these attributes are defined by the classes and properties of the ontology model, while their values are taken from the individuals of the ontology model. The details of the main attributes are shown in Table 2. A case can therefore be expressed as a three-tuple, $C = (C_c, C_r, C_p)$, where $C_c$, $C_r$, and $C_p$ are the sets of attributes that describe the accident, its response, and its post-accident assessment, respectively. For a given case base $C_b$, $C_i$ denotes the $i$-th case in $C_b$ ($1 \le i \le n$, where $n$ is the total number of cases in $C_b$), and $C_{ic}$, $C_{ir}$, and $C_{ip}$ represent the accident characteristics, accident response, and post-accident assessment of case $C_i$, respectively.
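The three-tuple representation maps naturally onto a small record type. The following Python sketch is an illustration under assumptions, not the authors' data model: the class name `AccidentCase`, the attribute keys, and the helper `get_case` are hypothetical, and the attribute values are taken from the turnout failure record described in the case study.

```python
from dataclasses import dataclass, field


@dataclass
class AccidentCase:
    """A case C = (C_c, C_r, C_p); the attribute keys used below are hypothetical."""
    characteristics: dict = field(default_factory=dict)  # C_c: accident characteristics
    response: dict = field(default_factory=dict)         # C_r: accident response
    assessment: dict = field(default_factory=dict)       # C_p: post-accident assessment


def get_case(case_base, i):
    """Return C_i from the case base C_b using the 1-based index 1 <= i <= n."""
    if not 1 <= i <= len(case_base):
        raise IndexError(f"case index {i} is outside 1..{len(case_base)}")
    return case_base[i - 1]


# A one-case base C_b built from the turnout failure example in the case study.
case_base = [
    AccidentCase(
        characteristics={"fault_characteristic": "turnout indication failure",
                         "accident_type": "equipment system failure"},
        response={"measures": ["change train route"]},
        assessment={"cause": "omission of overhaul work",
                    "impact": "train delay"},
    ),
]
```

Keeping the three parts as separate fields mirrors the notation $C_{ic}$, $C_{ir}$, $C_{ip}$: retrieval compares only `characteristics`, while `response` and `assessment` are what a retrieved case offers for reuse.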
The Web Ontology Language (OWL) was selected to encode the metro accident knowledge. OWL enables knowledge to be linked and represented semantically in a semantic network. The accident knowledge is exported to an OWL file that can be managed by ontology management tools. Based on the concepts in the developed ontology model, textual accident records can be represented in a standard format for further retrieval. In addition to retrieving cases, the corresponding responses to the accidents should also be obtained. Although the ontology represents domain knowledge, it cannot express complex constraint knowledge, which typically takes the form of rules. Expressing such knowledge therefore requires other technologies, such as the Semantic Web Rule Language (SWRL). Additionally, based on the ontology model, the metro accident cases can be annotated for further retrieval.
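SWRL rules take the form body -> head, for example `Accident(?a) ^ hasFaultCharacteristic(?a, TurnoutIndicationFailure) -> hasMeasure(?a, ChangeTrainRoute)`, where the class, property, and individual names are hypothetical. The Python sketch below mimics this kind of rule with a single pass of set-based matching over an accident's annotated facts. It is not a SWRL engine, and the rules are invented examples rather than measures taken from an actual emergency plan.

```python
# Facts are (property, value) pairs describing one annotated accident.
# Each hypothetical rule pairs a body (facts that must all hold) with a head (measure).
RULES = [
    ({("hasFaultCharacteristic", "turnout indication failure")},
     "change train route"),
    ({("hasFaultCharacteristic", "turnout indication failure"),
      ("hasImpact", "train delay")},
     "notify passengers of the delay"),
    ({("hasFaultCharacteristic", "STC equipment failure")},
     "dispatch signal maintenance staff"),
]


def infer_measures(facts, rules):
    """Return the heads of all rules whose bodies are satisfied, in rule order."""
    measures = []
    for body, head in rules:
        if body <= facts and head not in measures:  # body is a subset of the facts
            measures.append(head)
    return measures
```

For an accident annotated with a turnout indication failure and a train delay, both turnout rules fire, so `infer_measures` returns the route change followed by the passenger notice. A real SWRL reasoner can also chain rules, deriving new facts that trigger further rules, which this single pass deliberately omits.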

Automated Annotation of Metro Accidents
As shown in Figure 4, natural language processing (NLP) techniques, including word segmentation and stop word removal, were applied to annotate the accident records. NLP further supports the semi-automated annotation of accident records by extracting keywords that can be semantically matched with concepts in the ontology. Thus, once the accidents are annotated, the similarity between query sentences and past accidents can be calculated quickly.

(1) Word segmentation
Word segmentation divides a string of written language into its component words. A typical feature of Chinese is that there are no spaces between the characters of a sentence, so a sentence can be segmented in several ways, leading to different interpretations. For example, "钢筋混凝土 (reinforced concrete)" could be segmented into "钢筋 (steel bar)" and "混凝土 (concrete)." Inaccurate word segmentation can create ambiguities and thus hinder interpretation of the information. Jieba, a Python-based Chinese word segmentation library, can overcome this difficulty by allowing users to add their own dictionaries. In this study, the concepts defined in the ontology model were used as an additional dictionary for Chinese word segmentation.
With the help of the ontology model, domain terms can be segmented accurately as complete word groups, so that sentences in the accident reports are interpreted correctly.
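The effect of adding ontology concepts to the segmentation dictionary can be shown without Jieba itself. The sketch below uses forward maximum matching, a simple dictionary-based segmenter named here as a stand-in; it is not Jieba's algorithm. The base dictionary and the ontology term 道岔表示故障 (turnout indication failure) are hypothetical examples. In Jieba, the corresponding step would be registering the ontology terms as user words, for instance with `jieba.add_word` or `jieba.load_userdict`.

```python
def fmm_segment(text, dictionary):
    """Forward maximum matching: at each position take the longest dictionary word.

    Characters not covered by any dictionary word are emitted one by one.
    """
    max_len = max((len(word) for word in dictionary), default=1)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical general-purpose dictionary and ontology-derived additions.
BASE_DICT = {"钢筋", "混凝土", "道岔", "表示", "故障", "导致", "列车", "晚点"}
ONTOLOGY_TERMS = {"钢筋混凝土", "道岔表示故障"}
```

With `BASE_DICT` alone, 道岔表示故障导致列车晚点 ("turnout indication failure caused train delay") is split into 道岔 / 表示 / 故障 / 导致 / 列车 / 晚点. With `BASE_DICT | ONTOLOGY_TERMS`, the fault concept survives as the single token 道岔表示故障, which can then be matched directly to the ontology.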
(2) Stop word removal
Stop words are extremely common words of little importance, such as prepositions, pronouns, and conjunctions. Removing these low-information words improves retrieval efficiency. This paper used the stop list released by the Language Technology Platform (LTP), which contains 1208 words and punctuation marks [44].
After the data pre-processing procedure, including word segmentation and stop word removal, has been conducted, these accidents can be annotated and represented by a set of keywords defined in the ontology model, which can be used for similarity calculations in the case retrieval.
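Once text is segmented, stop word removal and ontology-based annotation reduce to filtering and lookup. In the Python sketch below, the six-entry stop list is a hypothetical stand-in for the 1208-entry LTP list, and the term-to-concept mapping (including the concept names `TurnoutIndicationFailure`, `Train`, and `TrainDelay`) is invented for illustration; in the described method these concepts come from the ontology model.

```python
# Tiny hypothetical stop list (the study used the 1208-entry LTP list).
STOP_WORDS = {"的", "了", "在", "和", "，", "。"}

# Hypothetical mapping from segmented surface terms to ontology concepts.
ONTOLOGY_CONCEPTS = {
    "道岔表示故障": "TurnoutIndicationFailure",
    "列车": "Train",
    "晚点": "TrainDelay",
}


def remove_stop_words(tokens, stop_words):
    """Drop tokens that appear in the stop list, preserving order."""
    return [t for t in tokens if t not in stop_words]


def annotate(tokens, stop_words, concepts):
    """Map remaining tokens to ontology concepts, keeping first-seen order."""
    content = remove_stop_words(tokens, stop_words)
    matched = [concepts[t] for t in content if t in concepts]
    return list(dict.fromkeys(matched))  # remove duplicates, keep order
```

Tokens that survive filtering but have no ontology counterpart (such as 导致, "caused") are simply ignored, so each record ends up as a compact set of concept labels ready for similarity calculation.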

Similarity Measurement of the Accident Cases
The decision support mechanism of the proposed framework is as follows: first, the query sentences are transformed into formalized concepts of the developed ontology model; then, the similarity between the query concepts and the concepts of stored cases is measured; finally, the most similar case is identified for reuse. Because the case base usually contains a large number of cases, an effective similarity measure is crucial for finding a similar case for reference.
Each case has specific attributes, both numerical and symbolic. With cases modeled by the ontology, the similarity between cases was calculated using the local-global approach. The local similarity focuses on individual attributes of a case, such as the fault characteristic. After all local similarities are calculated, the global similarity, which takes the weight of each attribute into account, is measured to reveal the overall similarity between cases.

Local Similarity
Case attributes vary across domains. In the metro accident domain, the attributes can be divided into explicit numerical attributes and explicit symbolic attributes. Taking signal failure accidents as an example, Table 3 shows their attributes.
The proposed similarity measure methods for the attributes of signal failure accidents are presented as follows.

Similarity of Numerical Attributes
For numerical attributes, the similarity calculation is typically based on the absolute distance between the two values being compared. The similarity of numerical attributes is calculated using Equation (1), where $\mathrm{Sim}_n(X_i, Y_i)$ denotes the similarity of attribute $i$ of cases $X$ and $Y$; $X_i$ and $Y_i$ denote the values of attribute $i$ in cases $X$ and $Y$, respectively; and $\mathrm{dis}(X_i - Y_i)$ is the absolute distance between them.
Given the ontological knowledge structure of the fault characteristic, the method for symbolic attributes measures the comprehensive semantic similarity adopted from Dong [45].
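For the numerical attributes, the Python sketch below assumes one common normalization of the absolute distance, $1 - |X_i - Y_i| / R$, where $R$ is the range of the attribute across the case base. This form is an assumption and may differ from Equation (1); the function name `numeric_similarity` and the delay values in the usage note are hypothetical.

```python
def numeric_similarity(x, y, value_range):
    """Similarity of two numerical attribute values from their absolute distance.

    Assumed form: 1 - |x - y| / value_range, clipped at 0, where value_range
    is the spread (max - min) of the attribute across the case base.
    """
    if value_range <= 0:
        # Degenerate attribute with a single observed value.
        return 1.0 if x == y else 0.0
    return max(0.0, 1.0 - abs(x - y) / value_range)
```

For example, two delays of 35 and 20 minutes on an attribute whose values span 60 minutes give `numeric_similarity(35, 20, 60) == 0.75`. Normalizing by the range keeps every local similarity in [0, 1], which is what the weighted global aggregation expects.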
The comprehensive similarity between symbolic attributes consists of two parts: the tree similarity and the similarity of the upper attributes. The tree similarity, based on the semantic tree, reveals the direct similarity of the attributes, while the similarity of the upper attributes reveals the indirect similarity of the concepts. Both are measured by considering ancestor-child and parent-child relationships in similar maximal paths. Once the two similarities are determined, the comprehensive similarity is obtained as their weighted average.
(1) Similarity measure based on the semantic tree
$\mathrm{Sim}_t$ denotes the similarity based on the semantic tree, which is the direct similarity between two concepts [46]. As shown in Equation (2), this measure is based on the levels of the attributes and the distance between them. In Equation (2), $\mathit{dl}(X_i)$ and $\mathit{dl}(Y_i)$ are the semantic levels of the attributes, $\mathrm{dis}(X_i, Y_i)$ is the minimum distance between $X_i$ and $Y_i$ in the hierarchy, and $\mathit{maxdl}$ is the maximum depth of the conceptual hierarchy. Additionally, $\alpha$ is an adjustable parameter of the semantic distance, which was set to 0.5 based on a large number of experiments. With these parameters, $\mathrm{Sim}_t(X_i, Y_i)$ can be calculated through Equation (2). When $\mathrm{Sim}_t(X_i, Y_i) = 1$, the attributes $X_i$ and $Y_i$ are equivalent.
(2) Similarity measure based on the upper attributes
$\mathrm{Sim}_s$ denotes the similarity of $X_i$ and $Y_i$ based on their upper attributes and reveals their indirect similarity. It is calculated by applying Equation (2) to the upper attributes of $X_i$ and $Y_i$.
(3) Comprehensive similarity measure
$\mathrm{Sim}_c$ denotes the comprehensive similarity, which is the final similarity value between the target attributes $X_i$ and $Y_i$. As shown in Equation (3), it is the weighted average of the similarities based on the semantic tree and on the upper attributes:
$$\mathrm{Sim}_c(X_i, Y_i) = \beta\,\mathrm{Sim}_t(X_i, Y_i) + \gamma\,\mathrm{Sim}_s(X_i, Y_i), \quad \beta + \gamma = 1 \tag{3}$$
where $\beta$ is the weight of the similarity based on the semantic tree and $\gamma$ is the weight of the similarity based on the upper attributes, both ranging from 0 to 1. In this paper, $\beta$ and $\gamma$ were both set to 0.5, meaning that the two similarities have equal influence. For example, the semantic concept tree of signal failure illustrated in Figure 5 supports the similarity measure between signal failure attributes.
An example illustrates the calculation process. The similarity between "turnout indication failure" (abbreviated as T) and "STC equipment failure" (abbreviated as S) was calculated as follows. First, as shown in (4), the tree similarity was measured on the developed semantic model using Equation (2). Then, the tree similarity of the upper attributes was calculated in (5); the upper attributes of "turnout indication failure" and "STC equipment failure" are "wayside equipment failure" (abbreviated as W) and "signal system infrastructure" (abbreviated as I), respectively. Finally, the comprehensive similarity between T and S was obtained in (6) using Equation (3), giving a value of 0.133. The similarity values between all signal failure attributes, obtained in this way, are shown in Table 4.
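The three-step structure of the symbolic similarity (tree similarity, upper-attribute similarity, weighted combination) can be sketched in Python under two assumptions. First, the tree is a hypothetical fragment built from the concept names in the example above, not a copy of Figure 5. Second, because Equation (2) also involves the levels $\mathit{dl}$ and $\mathit{maxdl}$, which this sketch does not model, `sim_tree` uses the simpler distance-based form $\alpha / (\mathrm{dis} + \alpha)$ as a stand-in; it equals 1 for identical concepts and 0.5 when the distance equals $\alpha$. The resulting numbers illustrate the procedure and are not expected to reproduce the reported 0.133.

```python
# Hypothetical fragment of the signal failure concept tree: child -> parent.
PARENT = {
    "signal failure": None,
    "wayside equipment failure": "signal failure",
    "signal system infrastructure": "signal failure",
    "turnout indication failure": "wayside equipment failure",
    "STC equipment failure": "signal system infrastructure",
}


def ancestors(concept):
    """The concept followed by its ancestors up to the root."""
    chain = [concept]
    while PARENT[chain[-1]] is not None:
        chain.append(PARENT[chain[-1]])
    return chain


def tree_distance(x, y):
    """Edges on the shortest path between x and y (via their lowest common ancestor)."""
    up_x, up_y = ancestors(x), ancestors(y)
    common = next(a for a in up_x if a in up_y)  # single root, so one always exists
    return up_x.index(common) + up_y.index(common)


def sim_tree(x, y, alpha=0.5):
    """Stand-in for Equation (2): alpha / (dis + alpha)."""
    return alpha / (tree_distance(x, y) + alpha)


def sim_upper(x, y, alpha=0.5):
    """Sim_s: the same tree measure applied to the upper attributes (parents)."""
    px = PARENT[x] or x  # a root stands in for its own upper attribute
    py = PARENT[y] or y
    return sim_tree(px, py, alpha)


def sim_comprehensive(x, y, beta=0.5, gamma=0.5):
    """Equation (3): beta * Sim_t + gamma * Sim_s, with beta + gamma = 1."""
    return beta * sim_tree(x, y) + gamma * sim_upper(x, y)
```

In this toy tree, T and S are four edges apart, so the stand-in tree similarity is 0.5/4.5. Their parents W and I are two edges apart, which gives an upper-attribute similarity of 0.2, and the equal-weight combination averages the two.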

Global Similarity
The global similarity reflects the degree of similarity between whole cases. A weighted Euclidean distance was used to aggregate the local similarities into the global similarity. The weight of each attribute reflects its relative importance in the overall evaluation when searching for a similar case to solve a given problem.
Assuming that the case representation consists of n features with feature weights $w_i$, the global similarity $Sim$ between the target cases X and Y is computed with Equation (7), in which $Sim_i(X_i, Y_i)$ denotes the local similarity of attribute i, derived from either $Sim_n(X_i, Y_i)$ or $Sim_c(X_i, Y_i)$, and $w_i$ denotes the weight of attribute i.
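Equation (7) itself is not reproduced in this section. As a minimal sketch, assuming one common form of weighted Euclidean amalgamation, $Sim(X, Y) = \sqrt{\sum_i w_i \, Sim_i(X_i, Y_i)^2 / \sum_i w_i}$, the global similarity can be computed as below. The attribute names and weights in the usage example are hypothetical.

```python
import math


def global_similarity(local_sims: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted Euclidean amalgamation of local similarities (assumed form of Eq. (7)).

    local_sims maps attribute name -> Sim_i(X_i, Y_i) in [0, 1];
    weights maps attribute name -> w_i >= 0.
    The result stays in [0, 1] whenever every local similarity does.
    """
    total_weight = sum(weights[a] for a in local_sims)
    if total_weight == 0:
        raise ValueError("at least one attribute needs a positive weight")
    weighted_sq = sum(weights[a] * local_sims[a] ** 2 for a in local_sims)
    return math.sqrt(weighted_sq / total_weight)


if __name__ == "__main__":
    # Hypothetical local similarities and weights for three attributes.
    sims = {"fault characteristics": 0.9, "station": 0.5, "time delay": 0.8}
    w = {"fault characteristics": 0.5, "station": 0.2, "time delay": 0.3}
    print(round(global_similarity(sims, w), 3))
```

Dividing by the weight sum means the weights need not be pre-normalized; squaring the local similarities penalizes a single poorly matching attribute more than a plain weighted average would.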

Accident Case Development and Implementation for the Accident Response: A Case Study
To provide a proof-of-concept implementation for the proposed method, a case study was conducted. The case study focused on modeling actual metro accident knowledge and illustrating the use of the proposed method within future metro accident analyses. Protégé 4.3 was used to develop the ontological model and MyCBR was used for creating the CBR prototypes. Protégé is a free, open-source ontology editor that can provide a visual environment to create and edit an ontology model [47]. MyCBR allows for preliminary modeling of similarity measures and supports the simulation of a case retrieval [48].

Case Database
The metro accident cases used to develop the accident case database were collected from a metro company in Wuhan and came mainly from the following sources: (1) checking records of metro operations, which include both the contents and the results of operational checks; and (2) safety production monthly reports of metro operations, which record all accidents that occurred in the previous month together with their details, including the accident process description, the accident response measures, the accident impact, the cause analysis, and the corrective measures.
For example, a turnout failure accident recorded in the safety production monthly report of metro operations is described below; the record consists of the accident characteristics, the accident response, and the post-accident assessment. At 6:12 a.m. on 14 July 2016, the indications of the 16th and 18th turnouts were lost, which disrupted the metro operation and caused train delays. At 6:15 a.m., the operating company decided to change the train route. At 6:47 a.m., the fault was fixed, and the train route was changed back to the original line. The cause of the accident and the corrective measures were determined after the investigation: the turnout failure was attributed to the omission of overhaul work on the 16th and 18th turnouts due to bad weather.
It is worth noting that accident records such as the one shown in Figure 6 are unstructured and expressed in natural language, which makes them difficult to process digitally. These documents were therefore processed into structured data using NLP techniques. A total of 120 accidents were extracted from the safety monthly reports on metro operations and then analyzed and automatically annotated with the support of NLP techniques. The classes, properties, and individuals of the abovementioned ontological model, such as "train number," "station," and "fault characteristics," were used to assist the annotation of metro accidents, as shown in Figure 6. These annotations belong to different attributes, which are used in the similarity calculations. The annotated cases were stored in CSV files [49].
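The NLP pipeline is not detailed in this section. A minimal sketch of ontology-guided annotation, assuming a simple gazetteer (dictionary) lookup of ontology individuals plus a regular expression for clock times, is shown below; it turns a shortened version of the turnout failure record into one attribute row and writes it as CSV. The gazetteer entries, attribute names, and CSV columns are all hypothetical.

```python
import csv
import io
import re

# Hypothetical gazetteer: surface form -> (attribute, ontology individual).
GAZETTEER = {
    "turnout": ("fault characteristics", "turnout indication failure"),
    "indication": ("fault characteristics", "turnout indication failure"),
    "change the train route": ("response measure", "route change"),
    "bad weather": ("cause", "overhaul omitted due to weather"),
}
ATTRIBUTES = ["fault characteristics", "response measure", "cause"]
FIELDS = ATTRIBUTES + ["times"]

# Clock times written like "6:12 a.m." or "6:47 p.m."
TIME_RE = re.compile(r"\b(\d{1,2}:\d{2})\s*([ap])\.m\.", re.IGNORECASE)

SAMPLE_RECORD = (
    "At 6:12 a.m. on 14 July 2016, the indications of the 16th and 18th turnouts "
    "were lost. At 6:15 a.m., the operating company decided to change the train "
    "route. At 6:47 a.m., the fault was fixed. The turnout failure was attributed "
    "to the omission of overhaul work due to bad weather."
)


def annotate(record: str) -> dict[str, str]:
    """Turn one free-text accident record into a flat attribute -> value row."""
    text = record.lower()
    found: dict[str, list[str]] = {}
    for surface, (attribute, individual) in GAZETTEER.items():
        if surface in text:
            values = found.setdefault(attribute, [])
            if individual not in values:  # keep each individual only once
                values.append(individual)
    row = {attr: ";".join(found.get(attr, [])) for attr in ATTRIBUTES}
    row["times"] = ";".join(f"{t} {ap.lower()}.m." for t, ap in TIME_RE.findall(record))
    return row


def to_csv(rows: list[dict[str, str]]) -> str:
    """Serialize annotated rows to CSV text: one header line plus one line per case."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv([annotate(SAMPLE_RECORD)]))
```

Plain substring matching is deliberately crude; a production pipeline would add tokenization, Chinese word segmentation for the original reports, and synonym handling, but the output shape (one row of ontology-aligned attributes per case) is what the similarity calculations consume.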
Based on a statistical analysis of metro operating accidents, more than 80% of accidents were caused by the poor state of equipment and manifested as equipment failures, such as signal system failures and metro vehicle failures. Therefore, "signal system failure" accidents were selected as the concrete example for testing the proposed method. As shown in Figure 7, the classes and individuals of "signal system failure" were developed using Protégé 4.3.

Accident Cases Retrieval
In this study, MyCBR followed the local-global approach, which divides the similarity definition into a set of local similarity measures (one for each attribute), a set of attribute weights, and a global similarity measure that calculates the final similarity value. For an attribute-value-based case representation consisting of n attributes, the similarity between an input case X and a known case Y can therefore be calculated using Equation (7) in Section 5.2.
The similarity measures of the accident attributes were calculated in MyCBR, which provides several modes for measuring similarity, including the standard, table, and taxonomy modes. The standard mode measures the similarity of numeric attributes using a basic formula, while the table and taxonomy modes measure the similarity of symbolic attributes: the table mode is based on user-defined similarity degrees between concepts, and the taxonomy mode is based on the hierarchical structure of the attribute values.
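To make the standard and table modes concrete, the Python sketch below shows one plausible local measure for each. The numeric measure (similarity decreasing linearly with distance, normalized by the attribute range) is an assumed form, not MyCBR's internal implementation or the exact formula of Section 5.1. The table holds only the T-S value of 0.133 derived in the worked example; any further entries would have to come from Table 4.

```python
def numeric_sim(x: float, y: float, lower: float, upper: float) -> float:
    """Standard-mode style similarity over the attribute range [lower, upper].

    Returns 1 for identical values and decreases linearly to 0 as |x - y|
    approaches the full range (assumed form, not MyCBR's internals).
    """
    span = upper - lower
    if span <= 0:
        raise ValueError("upper must be greater than lower")
    return max(0.0, 1.0 - abs(x - y) / span)


# User-defined similarity table for one symbolic attribute. Pairs are stored as
# frozensets so the lookup is symmetric. The single entry is the T-S value from
# the worked example; every other pair falls back to the default.
SIM_TABLE = {
    frozenset({"turnout indication failure", "STC equipment failure"}): 0.133,
}


def table_sim(a: str, b: str, table: dict = SIM_TABLE, default: float = 0.0) -> float:
    """Table-mode similarity: 1 for identical symbols, otherwise a table lookup."""
    if a == b:
        return 1.0
    return table.get(frozenset({a, b}), default)


if __name__ == "__main__":
    print(numeric_sim(4, 6, 0, 20))  # two delays of 4 and 6 min on a 0-20 min range
    print(table_sim("STC equipment failure", "turnout indication failure"))
```

Keying the table by unordered pairs encodes the usual assumption that symbolic similarity is symmetric; an asymmetric table would instead use ordered `(query_value, case_value)` tuples.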
The attributes of accident cases were divided into numerical attributes and symbolic attributes. The standard mode was adopted to measure the similarity of several numerical attributes, including "time delay," "train delay," "train failure," and "clean off the train," as shown in Figure 8, and the table mode was adopted to measure the similarity of the symbolic attributes. The core algorithms of the standard and table modes are the similarity measures for numerical and symbolic attributes given in Section 5.1. Then, as shown in Figure 9, the weights of the attributes were determined using the Delphi method, based on the importance of each attribute in describing the accident knowledge. After the case database had been developed and the attribute weights defined, queries were entered into the system, and the search engine computed the similarity between each query and all cases stored in the case database.
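The Delphi method is an iterative process of anonymous expert rating and feedback. A minimal sketch of turning one round of ratings into attribute weights is given below, assuming a median aggregate per attribute, normalization so the weights sum to 1, and an interquartile-range threshold as the consensus test; these choices, the 1-9 scale, and the ratings themselves are hypothetical and not taken from the study.

```python
import statistics


def aggregate_round(ratings: dict[str, list[float]]) -> dict[str, float]:
    """Median rating per attribute, normalized so that the weights sum to 1."""
    medians = {attr: statistics.median(vals) for attr, vals in ratings.items()}
    total = sum(medians.values())
    if total == 0:
        raise ValueError("all medians are zero; cannot normalize")
    return {attr: m / total for attr, m in medians.items()}


def has_converged(ratings: dict[str, list[float]], max_iqr: float = 1.0) -> bool:
    """Consensus check: every attribute's interquartile range is at most max_iqr."""
    for vals in ratings.values():
        q1, _, q3 = statistics.quantiles(vals, n=4)
        if q3 - q1 > max_iqr:
            return False
    return True


if __name__ == "__main__":
    # Hypothetical 1-9 importance ratings from five experts in one Delphi round.
    round_ratings = {
        "fault characteristics": [9, 8, 9, 9, 8],
        "time delay": [6, 7, 6, 5, 6],
        "station": [3, 4, 3, 3, 2],
    }
    if has_converged(round_ratings):
        print(aggregate_round(round_ratings))
    else:
        print("no consensus yet: feed back the medians and run another round")
```

The median is used rather than the mean because a single outlying expert then cannot pull a weight far from the group view, which fits the consensus-seeking intent of Delphi.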

Testing
Because accidents caused by signal system failures were considerably more numerous than accidents due to other causes, the fault characteristics of signal system failures were chosen as the index for retrieving similar cases. Assume that a metro train encounters the following problem: during operation, the signals of the train are cut out, which may cause the train position to be lost and, in turn, a rear-end collision. The SMC (system management center) equipment reports that train integrity has been lost.
"Cutout" was defined as the query for the above case. Figure 10 presents the retrieved similar cases together with their similarity to the input query; the case attributes are listed in descending order of their similarity to the corresponding attributes of the input case. To provide reliable support to managers, only retrieved cases with similarity scores above 0.7 were considered. As Figure 10 shows, when facing a signal failure accident involving a "cutout," managers can issue the following orders based on the retrieved similar cases: the driver should switch the driving mode to PM (protected manual driving mode), and the train should proceed to the entry, where the signal malfunction is resolved and operation resumes. According to the historical cases, the delay caused by such accidents is roughly four minutes, and improving the information system was adopted as the corrective measure.

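The retrieval step with the 0.7 cut-off can be sketched as follows: score every stored case against the query, keep those above the threshold, and rank them best first. This is a minimal, self-contained sketch, assuming exact-match similarity for symbolic attributes, a linear range-normalized similarity for numeric ones, and the weighted Euclidean amalgamation; the `Case` class, the toy case base, and the weights are hypothetical and are not the study's 120 cases or its Delphi weights.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Case:
    case_id: str
    attributes: dict  # attribute name -> symbolic (str) or numeric value


def local_sim(x, y, numeric_range: Optional[float]) -> float:
    """Linear similarity for numeric attributes, exact match for symbolic ones."""
    if numeric_range is not None:
        return max(0.0, 1.0 - abs(x - y) / numeric_range)
    return 1.0 if x == y else 0.0


def retrieve(query: dict, cases: list, weights: dict, ranges: dict,
             threshold: float = 0.7) -> list:
    """Return (score, case) pairs whose global similarity exceeds threshold, best first."""
    results = []
    for case in cases:
        acc = total_w = 0.0
        for attr, q_val in query.items():
            w = weights.get(attr, 0.0)
            if attr in case.attributes:
                s = local_sim(q_val, case.attributes[attr], ranges.get(attr))
            else:
                s = 0.0  # a missing attribute contributes no similarity
            acc += w * s * s
            total_w += w
        score = math.sqrt(acc / total_w) if total_w else 0.0
        if score > threshold:
            results.append((score, case))
    results.sort(key=lambda pair: pair[0], reverse=True)
    return results


if __name__ == "__main__":
    # Hypothetical toy case base.
    case_base = [
        Case("C1", {"fault characteristics": "cutout", "time delay": 4}),
        Case("C2", {"fault characteristics": "single VOBC failure", "time delay": 5}),
        Case("C3", {"fault characteristics": "cutout", "time delay": 20}),
    ]
    query = {"fault characteristics": "cutout", "time delay": 5}
    for score, case in retrieve(query, case_base,
                                weights={"fault characteristics": 0.7, "time delay": 0.3},
                                ranges={"time delay": 30}):
        print(case.case_id, round(score, 3))
```

Filtering before sorting keeps the manager-facing list short; the top entry is the "highest similarity case" whose solution the evaluation later checks with experts.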
Evaluation
The evaluation tested the ability of the proposed method to accurately retrieve suitable cases from the case database. Metrics such as precision and recall are commonly used to evaluate performance in CBR research [50]. Precision and recall were computed using Equations (8) and (9):

$$\text{Precision} = \frac{TP}{TP + FP} \tag{8}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{9}$$

where a true positive (TP) is a retrieved case that provides a feasible solution for the query case, a false positive (FP) is a retrieved case whose solution is not reliable for the query case, and a false negative (FN) is a stored case with a suitable solution for the query case that was not retrieved from the library. In this study, precision is more important than recall. Precision is the proportion of retrieved cases that actually provide suitable solutions for the query cases, while recall is the proportion of all cases in the case library with feasible solutions that were actually retrieved [8]. In metro accident response, retrieving accurate cases and avoiding unfeasible ones (high precision) matters more than retrieving all relevant cases (high recall), because erroneous retrieved cases may mislead decision-making during the emergency response to metro accidents in operation and lead to serious consequences, such as casualties or heavy economic loss. Since precision and recall often trade off against each other, precision was selected as the evaluation metric. Additionally, because managers in a real working environment expect to obtain the required information within a limited amount of time, the case with the highest similarity to the input query is the most valuable to users. Therefore, in the evaluation process, the suitability of the solution contained in the most similar retrieved case for each query was also verified through expert reviews.
A set of keywords relevant to metro operational accidents (e.g., "cutout," "single VOBC failure," "TMS failure," and "ATC equipment failure") was defined to compose 10 test queries. Retrieved cases with a similarity above 0.7 were taken as the retrieval results and then evaluated via expert reviews. Five experts from a metro company in Wuhan (field engineers and metro dispatchers with an average of 10 years of experience), who were familiar with metro accident responses, were invited to evaluate the retrieval results. For each test query, the experts discussed the retrieved results with each other, judged whether each retrieved case was a TP or an FP, and verified the feasibility of the most similar case. Table 5 shows the results of the expert review of the 10 test queries. The mean precision of the proposed method was 91%. Additionally, the solutions recommended in the most similar case were feasible for solving the problems in the query case, which could help managers make better decisions within the limited time available during an accident. The correctness of the most similar cases further supports the feasibility of the proposed method in practice.
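Equations (8) and (9) and the averaging over test queries can be sketched in a few lines of Python. The sketch assumes that "mean precision" is the macro-average of per-query precision, and it returns 0.0 when a denominator is zero, which is a convention rather than something the study specifies. The (TP, FP) counts in the usage example are hypothetical; the study's per-query results are those in Table 5.

```python
def precision(tp: int, fp: int) -> float:
    """Equation (8): TP / (TP + FP); 0.0 when nothing was retrieved (a convention)."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    """Equation (9): TP / (TP + FN); 0.0 when no suitable case exists (a convention)."""
    return tp / (tp + fn) if tp + fn else 0.0


def mean_precision(per_query: list[tuple[int, int]]) -> float:
    """Macro-average of per-query precision over (TP, FP) pairs."""
    if not per_query:
        raise ValueError("need at least one query")
    return sum(precision(tp, fp) for tp, fp in per_query) / len(per_query)


if __name__ == "__main__":
    # Hypothetical (TP, FP) counts for three queries.
    counts = [(9, 1), (4, 0), (7, 3)]
    print(f"mean precision = {mean_precision(counts):.2f}")
```

Macro-averaging gives each query equal weight regardless of how many cases it retrieved; pooling all TP and FP counts first (micro-averaging) would instead favor queries with many retrieved cases.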

Inference of Accident Response Rules in RBR
CBR has several advantages, such as incremental learning, easy acquisition, and easy maintenance. However, it may fail when no relevant case is available, which is a particular risk for metro accidents because complete records are rare. Thus, to overcome this shortcoming of the CBR method, rule-based reasoning (RBR) is added in this paper as a complementary method in the decision-making process. The integration of CBR and RBR is useful in the decision-making process [51]. Cases capture the knowledge or experience obtained in specific situations, whereas rules represent the general knowledge of a specific domain drawn from documents such as regulations and emergency plans.
In a realistic accident response process, managers first need to identify the accident level based on the fault characteristics of the accident and then take appropriate measures under the guidance of various documents. Figure 11 shows an example of the accident response process for a large passenger flow. Accordingly, in this study, RBR was used to infer the accident level and the response measures. The Semantic Web Rule Language (SWRL) was used to represent the rules for inferring the response measures. An SWRL rule contains an antecedent part and a consequent part, both of which are written in terms of Web Ontology Language (OWL) classes, properties, individuals, and data values. As an example, Table 6 shows the response rules for a large passenger flow accident at level 3. As Table 6 shows, staff in different positions should take different measures to minimize the impact of the accident.
Figure 11. The handling process of a large passenger flow accident.
Protégé was used to verify the feasibility of RBR. By running the SWRL rules listed in Table 6 in Protégé, the response measures to be taken were inferred. As seen in Figure 12, once the passenger flow rose to level 3, the inferred measure was that the station attendant should set up the passenger flow divider to guide the passenger flow. Therefore, by encoding SWRL rules, RBR can support decision-making by determining the response measures to metro accidents that are recorded in documents such as emergency plans.
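In Protégé, SWRL rules are executed by a reasoner. To show what the antecedent-consequent inference does, the Python sketch below performs naive forward chaining over subject-predicate-object facts, where terms starting with "?" are variables, as in SWRL. It is not a SWRL engine and not the study's rule set: the class and property names (LargePassengerFlowAccident, hasLevel, occursAt, worksAt, StationAttendant, shouldTake, SetPassengerFlowDivider) and the facts are hypothetical, and only the level-3 measure described above is encoded.

```python
Fact = tuple[str, str, str]  # (subject, predicate, object); "?x" marks a variable in patterns

# LargePassengerFlowAccident(?a) ^ hasLevel(?a, 3) ^ occursAt(?a, ?s)
#   ^ worksAt(?p, ?s) ^ StationAttendant(?p) -> shouldTake(?p, SetPassengerFlowDivider)
RULES = [
    (
        [("?a", "type", "LargePassengerFlowAccident"),
         ("?a", "hasLevel", "3"),
         ("?a", "occursAt", "?s"),
         ("?p", "worksAt", "?s"),
         ("?p", "type", "StationAttendant")],
        [("?p", "shouldTake", "SetPassengerFlowDivider")],
    ),
]

FACTS = {
    ("acc1", "type", "LargePassengerFlowAccident"),
    ("acc1", "hasLevel", "3"),
    ("acc1", "occursAt", "stationA"),
    ("emp1", "worksAt", "stationA"),
    ("emp1", "type", "StationAttendant"),
    ("emp2", "worksAt", "stationA"),
    ("emp2", "type", "Driver"),
}


def match(pattern: Fact, fact: Fact, bindings: dict):
    """Unify one triple pattern with one fact, extending the current bindings."""
    new = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if new.get(p, f) != f:
                return None  # variable already bound to a different value
            new[p] = f
        elif p != f:
            return None
    return new


def solutions(antecedent: list, facts: set, bindings=None):
    """Yield every binding that satisfies all antecedent atoms (a conjunction)."""
    bindings = {} if bindings is None else bindings
    if not antecedent:
        yield bindings
        return
    for fact in facts:
        extended = match(antecedent[0], fact, bindings)
        if extended is not None:
            yield from solutions(antecedent[1:], facts, extended)


def forward_chain(facts: set, rules: list) -> set:
    """Apply the rules until no new facts are derived (naive forward chaining)."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedent, consequent in rules:
            for bindings in list(solutions(antecedent, facts)):
                for atom in consequent:
                    derived = tuple(bindings.get(t, t) for t in atom)
                    if derived not in facts:
                        facts.add(derived)
                        changed = True
    return facts


if __name__ == "__main__":
    for s, p, o in sorted(forward_chain(FACTS, RULES)):
        if p == "shouldTake":
            print(s, p, o)
```

Only the station attendant is matched, because the driver fails the `StationAttendant` atom; adding further rows of Table 6 would mean appending more (antecedent, consequent) pairs to `RULES`.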

Conclusions
In this study, an ontological framework integrating CBR and NLP was proposed to facilitate knowledge retrieval and reasoning in support of decision-making in metro accident responses. First, an ontological model was developed to formalize the representation of domain knowledge, such as regulations, emergency plans, and past accident records. Based on the ontological model, NLP techniques were adopted to automatically annotate accident cases with a set of keywords, which can enhance the efficiency of subsequent case retrieval. Then, RBR was used as a complement to CBR, because CBR may fail when there are few or no relevant cases. The combination of CBR and RBR avoids this problem in metro accident response by retrieving either the specific knowledge found in a similar case or the general knowledge found in regulations; SWRL rules were used in this study to represent the response measures in the regulations. Finally, the proposed framework was tested and evaluated through a case study with expert reviews. A total of 120 metro accidents in the operational phase were used to build the case library, and five experts from a metro operating company were invited to evaluate the performance of the proposed method on 10 test queries based on the precision metric. Considering the demand for quick responses, users tend to adopt the solutions in the case with the highest similarity to the input query; thus, the accuracy of the most similar cases was also verified for the 10 test queries. The results indicate that the proposed method performs well, with a mean precision of 91%.
However, this study still has some limitations that may be alleviated in future studies. The limitations are summarized as follows.
First, ontology development is time-consuming and requires considerable initial work. Significant manual effort is also required to update the ontology model whenever the metro accident knowledge base is updated.
Second, the proposed framework was tested on case retrieval within a small case database (120 accident records). The main purpose of this study was to develop a general framework based on ontological CBR and NLP for supporting decision-making when facing emergent accidents, rather than to build a complete accident case database. However, the small database may affect the retrieval results, so the database should be extended before the framework is applied in practice. Moreover, as the number of stored accidents grows, patterns and trends in historical accidents can be identified through statistical methods and used to predict possible future accidents.
Third, the proposed framework has not been validated in practice. To promote its application, a system with a convenient interface should be further developed.