A Novel Hybrid Genetic-Whale Optimization Model for Ontology Learning from Arabic Text

Abstract: Ontologies are used to model knowledge in several domains of interest, such as the biomedical domain. Conceptualization is the basic task in ontology building: concepts are identified and then linked through their semantic relationships. Recently, ontologies have become a crucial part of the modern semantic web because they can convert a web of documents into a web of things. Although ontology learning occupies a large space in computer science generally, Arabic ontology learning in particular is underdeveloped due to the nature of the Arabic language and the depth required in this domain. Previously published research on Arabic ontology learning from text falls into three categories: manually hand-crafted rules, ordinary supervised/unsupervised machine learning algorithms, or a hybrid of these two approaches. The model proposed in this work contributes to Arabic ontology learning in two ways. First, a text mining algorithm is proposed for extracting concepts and their semantic relations from text documents. The algorithm calculates the concept frequency weights using the term frequency weights. Then, it calculates the concept similarity weights using information from the ontology structure, involving (1) the concept's path distance, (2) the concept's distribution layer, and (3) the mutual parent concept's distribution layer. Then, feature mapping is performed by assigning the concepts' similarities to the concept features. Second, a hybrid genetic-whale optimization algorithm (G-WOA) is proposed to optimize ontology learning from Arabic text. The G-WOA operator is a hybrid operator that integrates the genetic algorithm's (GA's) mutation, crossover, and selection processes with the whale optimization algorithm's (WOA's) processes (encircling prey, bubble-net attacking, and searching for prey) to balance exploitation and exploration and to find the solutions with the highest fitness.
To evaluate the performance of the ontology learning approach, extensive comparisons are conducted using different Arabic corpora and bio-inspired optimization algorithms. Furthermore, two publicly available non-Arabic corpora are used to compare the efficiency of the proposed approach on other languages. The results reveal that the proposed genetic-whale optimization algorithm outperforms the other compared algorithms across all the Arabic corpora in terms of precision, recall, and F-score. Moreover, the proposed approach outperforms state-of-the-art methods of ontology learning from Arabic and non-Arabic texts in terms of these three measures.


Introduction
In recent times, the Internet has become people's principal source of information. A huge number of web pages and databases are accessed every day. The rapid growth in the quantity of information accessible via the Internet has made finding a particular piece of information difficult and frustrating. Likewise, the various kinds of information resources on the Internet, such as web pages, e-libraries, blogs, e-mails, e-documents, and news articles, contain huge amounts of data [1]. Such information is unstructured or semi-structured, which makes the knowledge discovery process challenging. To deal with this challenge, the semantic web was invented as an extension of the ordinary web [2].
An ontology is a means of extending syntactic interoperability on the web to semantic interoperability. Ontologies represent large amounts of data in a way that allows machines to interpret their meaning, so that the data can be reused and shared [3]. They are formal and explicit specifications of concepts and relations [4] and play a crucial role in improving the performance of natural language processing (NLP) tasks such as information extraction and information retrieval. Ontologies are usually restricted to a particular domain of interest. At its most basic, an ontology is characterized as the specification of a conceptualization. Ontology learning from texts is "The acquisition of a domain model from textual corpus" [5].
Ontologies can be built manually, automatically, or semi-automatically. However, building ontologies manually is time-consuming, expensive, and error-prone [6]. Furthermore, it demands cooperation between ontology engineers and domain experts. To avoid these shortcomings, ontology learning has evolved to automate or semi-automate the construction of ontologies. Ontology learning extracts knowledge through two principal tasks: extracting the concepts that constitute the ontology and extracting the semantic relations that link them [7][8][9].
Despite the importance of Arabic as the sixth most spoken language in the world [2] and the tremendous growth of Arabic web content in recent years, it has received little attention in the ontology learning field [10][11][12]. Several contributions are available on domain ontologies in English [13][14][15] and other languages, but Arabic is rarely considered by specialists in this field. Furthermore, the automatic extraction of semantic relationships from Arabic corpora has not been investigated as extensively as for other languages such as English. Most attempts to construct Arabic ontologies are still implemented manually [2,16], and manually developing conceptual ontologies is both time-consuming and labor-intensive. Additional challenges arise when extracting knowledge from Arabic texts due to the nature of the Arabic language, the semantic vagueness of words, and the lack of tools and resources that support Arabic. Consequently, Arabic suffers from a shortage of ontologies and semantic web applications [17,18].
In summary, only a few studies have considered automatic ontology learning from Arabic text [4,9,12,19-23]. These works fall into three categories: handcrafted rule-based methods [12,20,21], machine learning methods [9,19,20,22], and hybrid rule-based/machine learning methods [4,23]. The studies that introduced rule-based approaches extract semantic relationships between Arabic concepts or Arabic named entities, and they all use the same technique: identifying linguistic patterns in a given corpus and converting them into rules and transducers. The drawbacks of rule-based methods are that they are time-consuming and that the rules must fully cover every kind of relationship to be extracted. The works that proposed machine learning approaches rely on conventional classification algorithms that categorize Arabic relations into their corresponding types, but they do not address the drawbacks of these algorithms, such as their low performance on large textual datasets and high-dimensional data. Some works have attempted to overcome the shortcomings of both methods by integrating inference rules and machine learning algorithms into hybrid approaches. Although this hybridization has somewhat improved overall performance, more advanced hybrid approaches to optimize Arabic ontology learning are still required.
By comparison, several studies have been conducted on learning ontologies from English text [24][25][26], which has received the largest number of contributions of any language. Some of these studies presented rule-based approaches [24], and others proposed machine-learning-based approaches [25,26]. In [24], the authors presented a rule-based approach for learning English ontologies in which inductive logic programming was used to obtain ontology mappings. This method described the ontology in OWL format and then translated it into first-order logic. Thereafter, it generated generalized logical rules from background knowledge, which serve as the mappings. In [25], an exemplar-based algorithm was introduced to link text to semantically similar classes in an ontology built for the domain of chronic pain medicine. In [26], a neural-network-based machine learning approach was presented to learn ontologies through an encoder-decoder configuration, in which natural language definitions were translated into Description Logics formulae through syntactic transformation. These ontology-building methods are domain-specific; therefore, they are not applicable to Arabic and do not support Arabic texts.
Recently, hybrid approaches combining different bio-inspired optimization algorithms [27][28][29] have demonstrated competitive performance in various computer science applications, in which two or more of the following algorithms are hybridized to optimize the problem in the domain of interest: the genetic algorithm (GA) [30][31][32], social spider optimization [33,34], ant colony optimization (ACO) [35,36], and the whale optimization algorithm (WOA) [37,38]. These methods have several merits, including small parameter sets, simple frameworks, and the ability to avoid becoming trapped in local optima. Thus, they are suitable for many real applications and are robust enough to solve many global optimization problems without changing the original algorithm structure.
Among these algorithms, the WOA was introduced in [39] to solve global optimization problems by emulating the behavior of humpback whales. Humpback whales are known for a hunting method called bubble-net feeding [39]. This behavior operates in three phases: coral loop, lobtail, and capture loop [39]. Further information on this behavior can be found in [40]. Compared with other bio-inspired optimization algorithms, such as Particle Swarm Optimization (PSO), the WOA has good exploration capability in the search space [37]. However, it suffers from poor exploitation and a tendency to become trapped in local optima.
In addition, GA is another heuristic algorithm for combinatorial optimization [31]. Like similar algorithms such as Tabu Search (TS) [41,42] and simulated annealing (SA) [43], it has been applied to many combinatorial optimization problems, although the three algorithms have different properties. First, GA requires a high computational cost to find the optimal solution. Second, the quality of the best solution found by GA is superior to that of SA and comparable to that of TS. Moreover, GA can incorporate domain-specific knowledge in all phases of the optimization to guide the search strategy, in contrast to TS and SA, which lack this feature. Therefore, based on the proven effectiveness of GA and WOA in many applications [30-32,37,38] and to overcome the drawbacks of the ordinary WOA, this work proposes a hybrid genetic-whale optimization algorithm (G-WOA) to optimize ontology learning from Arabic texts and demonstrates its robustness. In G-WOA, the genetic operations of GA are combined into the WOA to improve the exploitation capability of the ordinary WOA and to address its premature convergence.
This paper contributes to the state of the art in Arabic ontology learning as follows:
• Firstly, a text mining algorithm is proposed specifically for extracting concepts and their semantic relations from Arabic documents. The extracted set of concepts and semantic relations constitutes the structure of the ontology. The algorithm computes the concept frequency weights of the Arabic documents from their term frequency weights. Thereafter, it computes the concept similarity weights using information derived from the ontology structure: the concept's path distance, the concept's distribution layer, and the mutual parent concept's distribution layer. Finally, it performs feature mapping by assigning the concept similarities to the concept features. Unlike ordinary text mining algorithms [9,10], the algorithm merges the concept frequency weights with the concept similarity weights; this property is crucial because it supports the detection of Arabic semantic information and improves ontology learning.

• Secondly, this is the first study to propose bio-inspired algorithms for optimizing Arabic ontology learning: a hybrid G-WOA algorithm is proposed to optimize Arabic ontology learning from raw text by balancing the exploration-exploitation trade-off. It can benefit from a priori knowledge (the initial concept set obtained by the text mining algorithm) to create innovative solutions for the best concept/relation set that can constitute the ontology.

• Thirdly, the performance of the proposed G-WOA is compared with that of five other bio-inspired optimization algorithms [32,39,44-46] for learning ontologies from Arabic text, and its solutions are compared with those obtained by the other algorithms across different Arabic corpora. To the best of our knowledge, neither the proposed nor the compared bio-inspired algorithms have previously been investigated for Arabic or non-Arabic ontology learning.

• Fourthly, the proposed ontology learning approach is applicable to other languages, where it can extract the optimal ontology structure from non-Arabic texts.

Literature Review
Due to the rapid surge of textual data in recent years, several studies have focused on how to create taxonomies from labeled data [47][48][49][50]. In this context, there have been many attempts to deal with multi-label learning/classification problems. In [47], the authors focused on learning balanced label trees from label representations using a proposed algorithm called Parabel, which learns balanced, deep trees. The trees learned by this algorithm were prone to degraded prediction performance because head and tail labels were forcibly aggregated into generic partitions with longer decision paths. In [48], the authors introduced a shallow-tree algorithm, Bonsai, which handles label-space diversity and scales to a large number of labels. Bonsai deals with diversity in the partitioning process by allowing a larger fan-out at every node.
In [49,50], the authors used hierarchical and flat classification strategies with large-scale taxonomies, relying on generalization error bounds for multiclass hierarchical classifiers. The main goal of some of these works was the large-scale classification of data into a large number of classes, while others focused on how to learn classifiers given the trees. In contrast to these works, the main goal of this paper is to introduce an approach that extracts the optimal structure constituting the ontology from raw textual data by employing text mining and bio-inspired optimization techniques.

Literature Review on Arabic Text Mining
Although several works have been devoted to text mining in English and other Latin-script languages [51,52], little attention has been paid to mining Arabic texts. This is mainly due to the structural complexity of Arabic and the existence of several Arabic dialects. Table 1 presents state-of-the-art work on Arabic text mining [53][54][55][56][57][58][59]. Most works in this context have used the Vector Space Model [57], Latent Semantic Indexing [56], and Term Frequency-Inverse Document Frequency (TF-IDF) [54,55]. However, these algorithms still suffer from two shortcomings: the curse of dimensionality and the lack of semantic information. Therefore, in this study, we propose a dedicated text mining algorithm that begins with a conceptualization stage to extract the initial concept set constituting the ontology and to capture its semantic information.
Table 1 (fragment). … Then, named entity recognition and data analysis rules were applied to the classified words to generate a final report. Lexical features along with Twitter-specific features were employed in classification. Dataset: a private database of collected Arabic tweets. Result: 96.49%.

Literature Review on Arabic Ontology Learning
Ontology learning from text is a very important area in computer science, yet published works on ontology learning from Arabic texts are still rare. As previously mentioned, state-of-the-art contributions to Arabic ontology learning from texts fall into three categories. The works in each category are examined below and summarized in Table 2.
The rule-based approaches [12,20,21,60-62] rely on patterns comprising all possibly correlated linguistic sequences, commonly implemented as finite-state transducers or regular expressions. Although these methods are beneficial for limited domains and offer better analysis quality, they do not scale well, in particular because manually creating hand-crafted patterns is laborious and time-consuming. Hence, it is difficult to process enormous amounts of data with such approaches.
To automate relation extraction, some studies [9,19,22,63,64] have used machine learning algorithms involving (1) unsupervised, (2) semi-supervised, and (3) supervised learning. In unsupervised methods, the popular approach clusters patterns that express the same relationship and then generalizes them. However, representing relational patterns semantically and scaling to big data make it challenging for these methods to ensure the reliability of the obtained patterns [55]. Moreover, although these algorithms can process large quantities of data, converting the output relations into ontologies is a labor-intensive task.
To overcome the drawbacks of unsupervised approaches, some studies investigated semi-supervised methods, or bootstrapping techniques, which require sets of seed points rather than training sets. The seeds are linguistic patterns or relation instances that are applied iteratively to acquire more basic elements until all target relations are found. The main shortcoming of bootstrapping approaches is their strong dependence on the chosen initial seeds, which may not accurately reflect the information in the corpus; moreover, the extraction quality is low. Supervised techniques [63], the last category of machine learning-based approaches, depend on a completely labeled corpus, so relation extraction is treated as a classification problem. Examples include conditional random fields, support vector machines (SVM) [64], decision trees [19], and Maximum Entropy (MaxEnt). These algorithms perform poorly on high-dimensional corpora.
Table 2. Arabic ontology learning works by category (reference, year, description).
Rule-based approaches
[…] The authors developed a pattern recognizer model intended to signal the presence of cause-effect information in sentences from texts not restricted to a specific domain. The model incorporated 700 linguistic patterns to distinguish the sentence parts representing the cause from those representing the effect. To construct the patterns, various sets of syntactic features were considered by analyzing an untagged corpus.
[62] 2017. The authors introduced a rule-based system, ASRextractor, to extract and annotate semantic relations between Arabic named entities. Relation extraction was based on an annotated corpus of Arabic Wikipedia, which included 18 types of semantic relations, such as synonymy and origin.
[20] 2018. A statistical parsing method was adopted to identify key-phrases/keywords in the Arabic dataset. The extracted data were converted to OWL ontology format, and mapping rules were then used to link the ontology components to the corresponding keywords.
[21] 2018. A set of rules/conjunctive patterns was defined for extracting semantic relations from Quranic Arabic, based on an in-depth study of Arabic grammar, POS tagging, and the morphological features appearing in the Quranic Arabic corpus.
Machine learning-based approaches
[63] 2009. To extract semantic relations, the authors combined two supervised methods, namely a basic Decision Tree and a Decision Lists-based algorithm. They estimated three semantic relations (location, social, and role) among named entities (NEs).
[22] 2009. Based on the dependency graph produced by syntactic analysis, the authors adopted a learning pattern algorithm, denoted (LP)², to generate annotation rules.
[19] 2013. A rule mining approach was applied to an Arabic corpus using lexical, numerical, and semantic features. After the learning features were extracted from annotated instances, a set of rules was generated automatically by three learning algorithms: Apriori, the C4.5 decision tree algorithm, and Tertius.
[9] 2017. A statistical algorithm, "the repeated segments algorithm", was used to extract simple and complex terms. To select segments with sufficient weight, the authors used the Weighting Term Frequency-Inverse Document Frequency (WTF-IDF) algorithm. Further, a learning approach based on the analysis of examples was proposed to learn extraction markers for detecting new relation pairs.
[64] 2018. A genetic algorithm (GA) was proposed to minimize the computation time needed to find the informative and appropriate Arabic text features required for classification. An SVM was used as the machine learning algorithm for evaluating the accuracy of Arabic named entity recognition.
Hybrid approaches
[65] 2013. Three methodologies were combined: a kernel method, co-occurrence, and a rule-based method. These methods were used to extract simple and complex relations in the biomedical domain. Kernel-based algorithms were used to map the data into a high-dimensional feature space.
[23] 2014. The authors proposed a hybrid rule-based/machine learning approach and a manual technique for extracting semantic relations between pairs of named entities.
[4] 2017. A set of rule patterns was defined from compound concepts to induce general relations. A gamification mechanism was used to specify relations based on the semantics of prepositions. Formal Concept Analysis/Relational Concept Analysis approaches were employed to model the hierarchical and transversal relations between concepts.
Other researchers have successfully addressed some of the previously discussed challenges, such as long Arabic sentences and the non-fixed location of semantic relations in sentences, by integrating rule-based methods with machine learning into hybrid approaches [4,23,65]. These hybrid methods have demonstrated better performance than single rule-based or machine learning-based approaches. More generally, recent literature shows great interest in hybrid artificial intelligence-based models for solving problems in several domains. In [27], a hybrid algorithm combines the merits of GA, including its high global convergence rate, with ACO to solve supplier selection problems. In [28], a genetic-ant colony optimization model was proposed to address word sense disambiguation, a serious natural language processing problem. It is therefore important to propose hybrid intelligent approaches that offer new alternatives for handling the Arabic ontology learning problem, which involves vagueness, uncertainty, and high-dimensional data.
In this context, hybrid bio-inspired optimization algorithms can offer innovative solutions that support the Arabic language. They can overcome a key shortcoming of existing Arabic ontology learning methods, namely, the difficulty of capturing relevant information from high-dimensional or sparse data. They do so through dimensionality reduction: selecting only the optimal concepts and semantic relations that contribute to the ontology structure and ignoring unrelated ones. Therefore, this paper contributes to the state of the art in Arabic ontology learning with a hybrid model based on GA and WOA. The model was evaluated for ontology learning on several publicly available Arabic and non-Arabic corpora.

Genetic Algorithm
GAs [30][31][32] are random-search algorithms, inspired by natural genetic mechanisms and biological natural selection, that belong to the family of computational intelligence algorithms. A GA emulates reproduction, crossover, and mutation in the processes of genetic inheritance and natural selection. In GAs, each individual, called a chromosome or genetic string, represents a candidate solution to the problem. A GA can be expressed as an eight-tuple, $GA = (C, Fitness, P, Pop_{Size}, L, \alpha, \beta, S)$, where $C$ is the encoding method for the individuals in the population, $Fitness$ is the fitness function for evaluating individuals, $P$ is the initial solution, $Pop_{Size}$ is the population size, $L$, $\alpha$, and $\beta$ denote the selection, crossover, and mutation operators, respectively, and $S$ defines the GA termination condition. A GA begins with an initial population of chromosomes (strings) and then produces successive populations of chromosomes. The basic GA comprises the following three operations:

• Reproduction. Reproduction keeps chromosomes unchanged and transfers them to the next generation; the inputs and outputs of this procedure are the same chromosomes.
• Crossover. Crossover combines two parent chromosomes to produce two new chromosomes by exchanging genes. Thus, the input to this step is two chromosomes, and the output is two different ones.
• Mutation. Mutation randomly inverts the value of one gene in a chromosome, so the output chromosome differs from the input one.
When crossover is not performed, the parents' chromosomes are copied to the offspring without change. The evolution speed of the genetic search can be altered by varying the crossover probability. In practice, the crossover probability is set close to 1, whereas the mutation rate is usually fairly small.
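The three GA operations and the eight-tuple above can be illustrated with a minimal sketch. It assumes binary chromosomes for the encoding $C$, tournament selection for $L$, single-point crossover for $\alpha$, bit-flip mutation for $\beta$, and a fixed number of generations as the termination condition $S$; the function names and default parameter values are hypothetical and are not taken from the paper. Copying the best chromosome unchanged into each new generation is one way to realize the reproduction operation.

```python
import random

def crossover(p1, p2, point):
    """Single-point crossover (alpha): swap the genes after `point` between two parents."""
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def mutate(chrom, rate, rng):
    """Bit-flip mutation (beta): each gene is inverted with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in chrom]

def tournament(pop, fitness, rng, k=2):
    """Tournament selection (L): return the fitter of k randomly sampled chromosomes."""
    return max(rng.sample(pop, k), key=fitness)

def run_ga(fitness, n_genes, pop_size=30, generations=100, p_c=0.9, p_m=0.01, seed=0):
    """Maximize `fitness` over binary chromosomes; S = a fixed number of generations."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    history = [fitness(best)]
    for _ in range(generations):
        children = [best[:]]                     # reproduction: copy the elite unchanged
        while len(children) < pop_size:
            a = tournament(pop, fitness, rng)
            b = tournament(pop, fitness, rng)
            if rng.random() < p_c:               # crossover with probability p_c
                a, b = crossover(a, b, rng.randint(1, n_genes - 1))
            # otherwise the parents pass to the offspring unchanged (before mutation)
            children += [mutate(a, p_m, rng), mutate(b, p_m, rng)]
        pop = children[:pop_size]
        gen_best = max(pop, key=fitness)
        if fitness(gen_best) > fitness(best):
            best = gen_best
        history.append(fitness(best))
    return best, fitness(best), history

best, best_fit, history = run_ga(sum, n_genes=20)   # OneMax: count the 1-bits
print(best_fit, history[0])
```

Here the fitness is simply the number of 1-bits (the OneMax toy problem). In an ontology setting, a chromosome could instead encode which candidate concepts are kept, but that encoding is an assumption rather than a description of the authors' design.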

Whale Optimization Algorithm
The WOA was proposed in [39] and is inspired by the behavior of humpback whales. Compared with other bio-inspired algorithms, the WOA improves the candidate solutions at each optimization step. The bubble-net behavior is emulated using a spiral movement that imitates the helix-shaped movement of real humpback whales.

Encircling Prey
Assume that a whale $c(i)$ updates its position by moving in a spiral around its prey, $c_{best}$. Mathematically, this procedure is expressed as follows:
$$c(i+1) = S \cdot \exp(h r) \cdot \cos(2\pi r) + c_{best}(i), \qquad (1)$$
where $S = |c_{best}(i) - c(i)|$ is the distance between $c(i)$ and $c_{best}$ at iteration $i$, $r \in [-1, 1]$ is a random number, and $h$ is a constant defining the shape of the logarithmic spiral. The positions of the whales are updated by the encircling behavior based on $c_{best}(i)$ as follows:
$$D = |A \cdot c_{best}(i) - c(i)|, \qquad (2)$$
$$c(i+1) = c_{best}(i) - K \cdot D, \qquad (3)$$
where $K$ and $A$ are coefficient vectors defined by
$$K = 2e \cdot m - e, \qquad (4)$$
$$A = 2 \cdot m, \qquad (5)$$
where $m$ is a random vector in $[0, 1]$ and $e$ decreases linearly from 2 to 0 over the iterations $i$; the value of $o$ is then computed using Equation (6).

Bubble-Net Attacking Method
In the bubble-net attacking method, the whales swim around the prey simultaneously along a spiral-shaped path and within a shrinking circle. Equation (7) defines this behavior:
$$c(i+1) = \begin{cases} c_{best}(i) - K \cdot D, & m < 0.5, \\ S \cdot \exp(h r) \cdot \cos(2\pi r) + c_{best}(i), & m \geq 0.5, \end{cases} \qquad (7)$$
where $D = |A \cdot c_{best}(i) - c(i)|$ is the encircling distance of Equation (2) and $m \in [0, 1]$ is the probability of choosing the mechanism of swimming around the prey (either spiral-model-based swimming or shrinking-encircling-based swimming). In addition, humpback whales also search for prey randomly.

Searching for Prey
In reality, humpback whales swim randomly to search for prey. In this phase, the position of each whale is updated using a randomly chosen whale $c_{rand}(i)$ as follows:
$$D = |A \cdot c_{rand}(i) - c(i)|, \qquad (8)$$
$$c(i+1) = c_{rand}(i) - K \cdot D. \qquad (9)$$
Finally, the position of every $i$th whale is updated based on the value of $e$ (which decreases from 2 to 0), $K$, $A$, and the probability $m$. If $m > 0.5$, Equation (1) is applied. Otherwise, depending on the value of $|K|$, either Equations (2) and (3) (when $|K| < 1$) or Equations (8) and (9) (when $|K| \geq 1$) are applied. This procedure is repeated until the stopping condition is met.
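The update rules of Equations (1) to (9) can be combined into a single optimization loop. The following NumPy sketch minimizes a continuous objective inside a box. The linear schedule $e = 2 - 2i/\text{max\_iter}$, the box bounds, drawing separate random vectors for $K$ and $A$, and reading $|K| < 1$ as "every component of $K$ has magnitude below 1" are assumptions, and the function and parameter names are hypothetical. It shows the basic WOA only, not the proposed G-WOA.

```python
import numpy as np

def woa(objective, dim, n_whales=20, max_iter=100, lb=-5.0, ub=5.0, h=1.0, seed=0):
    """Minimize `objective` over the box [lb, ub]^dim with a basic WOA loop."""
    rng = np.random.default_rng(seed)
    c = rng.uniform(lb, ub, size=(n_whales, dim))      # whale positions c(i)
    fit = np.array([objective(x) for x in c])
    best = c[fit.argmin()].copy()                      # c_best
    best_fit = float(fit.min())
    history = [best_fit]
    for i in range(max_iter):
        e = 2.0 - 2.0 * i / max_iter                   # assumed linear schedule from 2 towards 0
        for w in range(n_whales):
            K = 2.0 * e * rng.random(dim) - e          # Eq. (4)
            A = 2.0 * rng.random(dim)                  # Eq. (5), separate random draw
            if rng.random() < 0.5:                     # probability m in Eq. (7)
                if np.all(np.abs(K) < 1.0):            # |K| < 1: encircle c_best, Eqs. (2)-(3)
                    target = best
                else:                                  # |K| >= 1: random whale, Eqs. (8)-(9)
                    target = c[rng.integers(n_whales)].copy()
                D = np.abs(A * target - c[w])
                c[w] = target - K * D
            else:                                      # spiral update, Eq. (1)
                r = rng.uniform(-1.0, 1.0)
                S = np.abs(best - c[w])
                c[w] = S * np.exp(h * r) * np.cos(2.0 * np.pi * r) + best
            c[w] = np.clip(c[w], lb, ub)               # keep whales inside the search box
            f = objective(c[w])
            if f < best_fit:                           # greedy update of c_best
                best, best_fit = c[w].copy(), float(f)
        history.append(best_fit)
    return best, best_fit, history

def sphere(x):
    """Toy objective: sum of squares, minimized at the origin."""
    return float(np.sum(x ** 2))

best, best_fit, history = woa(sphere, dim=5)
print(best_fit)
```

Because $c_{best}$ is only replaced when a strictly better position is found, the recorded best fitness never increases across iterations.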

Arabic Ontology Learning
Ontology learning is one of the most important issues in Arabic language processing. In the literature, constructing the ontology of any conceptual domain is based on three dominant linguistic theories:

The Semantic Field Linguistic Theory
The semantic field linguistic theory [17], in which word meaning is considered within a specific perspective of the world, was presented by Jost Trier [5]. Accordingly, the meaning of a word is determined by its relationships to the other words within the field/domain (conceptual area). The theory presumes that each word is constructed inside semantic fields based on a set of primitive features. Moreover, the position of a word within the field determines its meaning and the relations it forms with the other words in that field. In componential analysis, the meaning of a word is established in terms of specified atomic components, or decompositions, representing the features that distinguish the word. Such features form the basis for structuring a particular semantic domain, and the meaning of an individual word can be identified as a combination of its representative features. Such formulae are called componential definitions of semantic units and serve as formalized dictionary definitions.
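Componential analysis can be illustrated by encoding each concept as a set of binary semantic features. In this minimal sketch, the feature names and values for the two biomedical concepts are hypothetical illustrations, not componential definitions from the paper; the helper functions simply report which features distinguish two concepts and which features they share.

```python
def componential_difference(a, b):
    """Return the features on which two componential definitions disagree."""
    return {f for f in a.keys() | b.keys() if a.get(f) != b.get(f)}

def shared_features(a, b):
    """Return the features (with their values) common to both definitions."""
    return {f: v for f, v in a.items() if b.get(f) == v}

# Hypothetical componential definitions: feature -> present (+) / absent (-).
malignant = {"TUMOR": True, "INVASIVE": True}
benign = {"TUMOR": True, "INVASIVE": False}

print(componential_difference(malignant, benign))  # {'INVASIVE'}
print(shared_features(malignant, benign))          # {'TUMOR': True}
```

Under these assumed features, the two concepts share a component and differ in exactly one, which is the kind of contrast that structures a semantic field.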

The Semantic Analysis Linguistic Theory
This theory is a strategy for extracting and representing the contextual-usage meaning of words by applying statistical methods to a textual corpus [66]. The main idea is to aggregate all the contexts in which a given word does or does not appear. This aggregate provides a set of constraints that determines the similarity of word meanings and relates words to one another.
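The idea of characterizing a word by the contexts in which it appears can be sketched with window-based co-occurrence counts compared by cosine similarity. This is a deliberately simple distributional model chosen for illustration; it is not necessarily the statistical method of [66], and the window size and toy sentences are assumptions.

```python
import math
from collections import Counter, defaultdict

def context_vectors(sentences, window=2):
    """Map each word to counts of the words occurring within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        for i, word in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][sent[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * v[k] for k, count in u.items() if k in v)
    nu2 = sum(x * x for x in u.values())
    nv2 = sum(x * x for x in v.values())
    return dot / math.sqrt(nu2 * nv2) if nu2 and nv2 else 0.0

sentences = [
    ["deep", "inhalation", "fills", "lungs"],
    ["deep", "inspiration", "fills", "lungs"],
    ["malignant", "tumor", "spreads"],
]
vecs = context_vectors(sentences)
print(cosine(vecs["inhalation"], vecs["inspiration"]))  # 1.0
print(cosine(vecs["inhalation"], vecs["tumor"]))        # 0.0
```

Words used in identical contexts (here the synonyms inhalation and inspiration) receive maximal similarity, while words that never share a context receive none.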

The Semantic Relations Theory
Semantic relations in Arabic text vary greatly [67]. The three semantic relationships considered in this work are illustrated by the following examples of biomedical concepts from our corpus:

• Synonymy. This relationship links concepts with nearly identical meanings. For instance, the concepts inspiration and inhalation are synonyms.
• Antonymy. This relationship links concepts with opposite meanings, such as malignant and benign.
• Inclusion.This type of relation means that one entity-type comprises sub entity-types.For example, the concept pulmonary valve with the concept heart, can indicate a part-to-whole or Is-a relationship.Figure 1 presents an example of some biomedical knowledge concepts available in our corpus which are linked with an Is-a relationship.


Inclusion.This type of relation means that one entity-type comprises sub entity-types.For example, the concept ‫صمام‬ ‫رئوي‬ pulmonary valve with the concept ‫ق‬ ‫لب‬ heart, can indicate a part-to-whole or Is-a relationship.Figure 1 presents an example of some biomedical knowledge concepts available in our corpus which are linked with an Is-a relationship.

Proposed Model for Arabic Ontology Learning
This section introduces the proposed model for ontology learning from Arabic text. The model integrates (1) a proposed text mining algorithm that extracts concepts, and the semantic relations linking them, from text documents, and (2) a proposed hybrid genetic-whale optimization algorithm that selects the optimal concept/relationship set constituting the Arabic ontology.

Pre-Processing
Pre-processing of Arabic texts in the three datasets investigated in this study is performed in two steps:
• Stemming. This task reduces a word to its primitive form, so that words or terms that share the same root but differ in surface form because of their affixes can be identified. The procedure eliminates two things: a prefix at the start of a word and a suffix at the end of it. For instance, removing a prefix and a suffix from the Arabic input word meaning 'cancerous' yields the stem meaning 'cancer'.
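The affix-stripping step can be sketched as a light stemmer. The prefix and suffix lists below are hypothetical examples chosen for illustration (they are not the affix inventory used in this work), and the three-letter minimum stem length is an assumed safeguard. Note that this removes affixes only; it does not extract roots.

```python
# Hypothetical affix lists for illustration; they are not the paper's affix inventory.
PREFIXES = ["وال", "بال", "ال"]  # "and the", "with the", "the"
SUFFIXES = ["ية", "ات", "ي"]     # common adjectival and plural endings
MIN_STEM = 3                      # assumed guard: never cut a word below three letters

def light_stem(word: str) -> str:
    """Strip at most one prefix and one suffix, trying the longest affix first."""
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM:
            word = word[len(prefix):]
            break
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            word = word[: -len(suffix)]
            break
    return word

print(light_stem("سرطاني"))     # "cancerous" -> سرطان ("cancer")
print(light_stem("السرطانية"))  # prefix and suffix removed -> سرطان
```

Trying longer affixes first keeps a short affix such as "ال" from pre-empting a longer one that also matches.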

Proposed Text Mining Algorithm
The algorithm extracts concepts and their semantic relations that constitute the ontology from each document of Arabic text, in three steps: Term weighting, concept similarity weights, and feature mapping.

Term Weighting
In text mining, a term weight is a well-known statistical measure of how important a term (word) is to a document in a corpus. We therefore assigned a weight to each term of a document, a procedure called term weighting, so that every document is expressed as a vector over the terms it contains. Formally, the vector characterizing a document collects its term weights, where TW_a refers to the weight of term m in the document of index n, C represents the term set, and |C| denotes the cardinality of C.
To obtain a vector over the terms of C, TF-IDF is used as the weighting scheme. Let the term frequency TF_a be the number of occurrences of T_a within the document, and let the document frequency DF_a be the number of documents in which T_a appears at least once. The inverse document frequency IDF_a is then computed from DF_a by Equation (11) [68], where |DOC| denotes the number of documents in the training set, and TW_a is computed by Equation (12). Next, irrelevant and redundant features are eliminated from the text documents, so that the document set can be represented as the "document-term" matrix of Equation (13).

Based on the resulting feature frequency weights, the algorithm maps each document's terms to their corresponding concepts. In Algorithm 1, TW and CW are two matrices for the same document, and S_T and C_T denote the sets of terms and concepts, respectively. By mapping the document's terms to their correlated concepts, the algorithm converts the document's term vector into a concept vector. The algorithm thus replaces the document set of Equation (13) with the document-concept matrix of Equation (14), where CW(l, m) denotes the frequency weight of "concept l" in document m, a represents the number of documents, and A is the number of concepts.
While (S_T ≠ Φ)
23. For B = 1 to count(C_T)
24.    // Computation of semantic similarities between each two concepts in CW
       Similarity(CW(A, A), CW(A, B))
28.    Append the similarity between CW(A, A) and CW(A, B) to S
       End For
31. End For
32. Assign the resulting concept similarity weights to the concepts according to Equation (20).
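The term-weighting and term-to-concept mapping steps can be sketched as follows. Because Equations (11) and (12) are not reproduced here, the sketch assumes the common form TW = TF × log(|DOC| / DF). The term-to-concept dictionary is hypothetical; in this work the mapping rests on the expert-built ontology. The accumulation mirrors the step CW(A, B) = CW(A, B) + TW(A, B) of Algorithm 1. The pairwise concept-similarity loop is not reproduced, since its similarity formula is not given here.

```python
from math import log

def tfidf(docs):
    """docs: list of token lists. Returns one {term: TW} dict per document, using
    the common form TW = TF * log(|DOC| / DF) (assumed; see the lead-in)."""
    n_docs = len(docs)
    df = {}
    for tokens in docs:
        for term in set(tokens):
            df[term] = df.get(term, 0) + 1
    weights = []
    for tokens in docs:
        tf = {}
        for term in tokens:
            tf[term] = tf.get(term, 0) + 1
        weights.append({term: count * log(n_docs / df[term]) for term, count in tf.items()})
    return weights

def map_to_concepts(term_weights, term_to_concept):
    """Fold term weights into concept weights: CW(doc, C_B) += TW(doc, T_A) whenever
    mapping(T_A) = C_B. Terms with no concept in the mapping are dropped."""
    concept_weights = []
    for doc in term_weights:
        cw = {}
        for term, weight in doc.items():
            concept = term_to_concept.get(term)
            if concept is not None:
                cw[concept] = cw.get(concept, 0.0) + weight
        concept_weights.append(cw)
    return concept_weights

# Toy documents and a hypothetical term-to-concept mapping.
docs = [["inspiration", "lung", "lung"], ["inhalation", "valve"], ["valve", "heart"]]
mapping = {"inspiration": "breathing", "inhalation": "breathing",
           "lung": "lung", "valve": "heart", "heart": "heart"}
tw = tfidf(docs)
cw = map_to_concepts(tw, mapping)
print(cw[2])  # weights of "valve" and "heart" summed under the concept "heart"
```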

Concept Similarity Weights
In this study, experts in the Arabic language carried out the conceptual characterization of the Arabic ontology. The concepts and semantic relations of the ontology hierarchy were then built using the Protégé tool [69]. Considering the concept hierarchy of the biomedical information depicted in Figure 1, concept similarities can be computed from the distances between nodes. Several studies have introduced different methods for computing such distances, depending on the application domain [70]. In the current study, computing the similarity between concepts in the ontology structure involves three elements: (1) the path distance between the concepts, (2) the concept's distribution layer, and (3) the mutual parent concept's distribution layer. For each concept node in the ontology, we trace all of its paths to the root concept node and use them to generate the routing table of the ontology.
The weighted path distance (WPD) of the concepts is therefore calculated from the following factors. First, the longer the path distance (PD) between two concepts, the lower their similarity; here, C_i denotes the concept node of index i in the ontology structure. Second, setting the path distance aside, the deeper two neighboring concepts lie in the distribution layers, the higher their similarity. Third, for concepts that share a mutual parent, the deeper they lie in the distribution layers, the higher their similarity. Given two adjacent concepts q_A and q_B, the WPD is computed using Equation (15) of Algorithm 1, where layer(q_A) and layer(q_B) denote the distribution layer numbers of q_A and q_B, respectively, M is the number of the upper layer in the entire ontology hierarchy, and λ is a scalar set through experimentation (λ = 1 in this work).
Finally, the document set expressed in Equation (14) is rewritten with the resulting concept similarity weights, where each entry gives the frequency weight of "concept l" in document m, a is the number of documents, and A is the number of concepts.
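The structural quantities behind the WPD (root paths for the routing table, distribution layers, the mutual parent, and the path distance) can be sketched on a tree. Since Equation (15) is not reproduced here, the sketch substitutes the well-known Wu-Palmer similarity, which follows the same qualitative factors: it falls as the path distance grows and rises as the mutual parent lies deeper. The single-parent hierarchy is a hypothetical example; an ontology whose concepts have several parents would need every root path, not just one.

```python
def root_path(concept, parent):
    """Routing-table entry: the path from `concept` up to the root, inclusive."""
    path = [concept]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def layer(concept, parent):
    """Distribution layer of a concept, counting the root as layer 1."""
    return len(root_path(concept, parent))

def mutual_parent(a, b, parent):
    """Deepest concept on both root paths (the lowest common ancestor)."""
    ancestors_of_a = set(root_path(a, parent))
    for node in root_path(b, parent):
        if node in ancestors_of_a:
            return node
    raise ValueError("concepts do not share a root")

def path_distance(a, b, parent):
    """Number of edges between a and b via their mutual parent."""
    return layer(a, parent) + layer(b, parent) - 2 * layer(mutual_parent(a, b, parent), parent)

def wu_palmer(a, b, parent):
    """Wu-Palmer similarity 2*layer(lca) / (layer(a) + layer(b)). A stand-in for
    illustration, not the paper's Equation (15)."""
    lca_layer = layer(mutual_parent(a, b, parent), parent)
    return 2 * lca_layer / (layer(a, parent) + layer(b, parent))

# Hypothetical single-parent hierarchy (child -> parent).
parent = {"organ": "entity", "heart": "organ", "lung": "organ",
          "pulmonary valve": "heart", "mitral valve": "heart"}
routing_table = {c: root_path(c, parent) for c in set(parent) | set(parent.values())}
print(routing_table["pulmonary valve"])  # ['pulmonary valve', 'heart', 'organ', 'entity']
print(wu_palmer("pulmonary valve", "mitral valve", parent))  # 0.75
print(wu_palmer("heart", "lung", parent))  # lower: their mutual parent is shallower
```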

The Proposed Hybrid Genetic-Whale Optimization Algorithm for Arabic Ontology Learning
In the ordinary WOA, the exploitation phase relies on the distance between each whale (search agent) and the best whale known at the current iteration. To improve the exploitation capability of the WOA and address its premature convergence, this study combines the genetic operations of GA with the WOA. The core of the proposed G-WOA algorithm is the hybridization of the WOA's operators with the GA's operators [71], which optimizes ontology learning from Arabic text by improving the WOA's exploration-exploitation trade-off. The G-WOA operator is mainly a hybrid operator (lines 7 to 29 of Algorithm 2) that integrates the GA's mutation, crossover, and selection with the WOA's components, namely encircling prey, bubble-net attacking, and searching for prey.
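For reference, the three WOA components named above can be sketched with the standard WOA position update of Mirjalili and Lewis (2016). This is the ordinary WOA, not the G-WOA hybrid. Scalar coefficients per whale are used for simplicity. Reading the parameter list of Algorithm 2, e appears to play the role of a, h of b, K of C, and r of the spiral parameter l; that correspondence is our assumption.

```python
import math
import random

def woa_step(x, best, population, a, b=1.0, rng=random):
    """One position update of the standard WOA for whale `x` (list of floats).
    `a` decreases linearly from 2 to 0 over the iterations; `b` shapes the spiral."""
    A = 2 * a * rng.random() - a        # coefficient A, drawn from [-a, a)
    C = 2 * rng.random()                # coefficient C, drawn from [0, 2)
    if rng.random() < 0.5:
        # |A| < 1: encircling prey around the best whale (exploitation).
        # |A| >= 1: searching for prey around a random whale (exploration).
        ref = best if abs(A) < 1 else rng.choice(population)
        return [r - A * abs(C * r - xi) for xi, r in zip(x, ref)]
    # Bubble-net attack: move on a logarithmic spiral around the best whale, l in [-1, 1].
    l = rng.uniform(-1, 1)
    return [abs(bi - xi) * math.exp(b * l) * math.cos(2 * math.pi * l) + bi
            for xi, bi in zip(x, best)]

rng = random.Random(42)
population = [[rng.uniform(0, 1) for _ in range(4)] for _ in range(5)]
best = population[0]                    # in practice: the fittest whale found so far
new_position = woa_step(population[1], best, population, a=1.5, rng=rng)
```

Because the move toward the best whale depends only on random coefficients and not on fitness, this is the behavior that the genetic operators below are meant to complement.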

Initial Population
The GA is embedded into the WOA so that the whales (search agents) are encoded as chromosomes. Every chromosome is a candidate for the best solution (the prey), and every search agent contains genes, each of which represents a concept or semantic relation of the ontology. A set of random agents c^t_{p,j} is generated initially. After the random solutions are generated, the hybrid G-WOA searches for the best solution over a number of iterations (t).
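Initialization can be sketched as follows, assuming real-valued genes drawn uniformly from [0, 1] and a 0.5 threshold for deciding whether a gene's concept counts as selected. Both choices are assumptions; the text does not state how genes are decoded. The decoded flags follow the G_k convention of the fitness function (0 = selected, 1 = neglected).

```python
import random

def init_population(pop_size, n_genes, lower=0.0, upper=1.0, rng=random):
    """Random whales c_{p,j}: one real-valued gene per candidate concept/relation."""
    return [[rng.uniform(lower, upper) for _ in range(n_genes)] for _ in range(pop_size)]

def decode(whale, threshold=0.5):
    """Turn a whale into G_k flags (0 = concept selected, 1 = concept neglected);
    the 0.5 threshold is an assumption."""
    return [0 if gene > threshold else 1 for gene in whale]

population = init_population(pop_size=6, n_genes=8, rng=random.Random(0))
print(decode(population[0]))
```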

Algorithm 2:
The proposed hybrid G-WOA algorithm for ontology learning from Arabic text
Input: A vector R of the document's mapped features.
// G-WOA parameters: PopSize ← population size, Cr ← crossover rate, MR ← mutation rate, E ← stopping criterion, h ← constant defining the logarithmic spiral shape, r ← random variable with r ∈ [−1, 1], K ← coefficient vector of the WOA, and e ← parameter decreased linearly from 2 to 0 over the iterations (t).
// Fitness function parameters: w_f ← weight of the false alarm rate, w_d ← weight of the detection rate, and w_c ← weight of the selected features.
Output: R*, the solution with the optimal concept/semantic relation set contributing to the ontology.

1.  Represent each document d_1, d_2, ..., d_O by a single whale c to obtain a pool O of whales.
2.  Evaluate the fitness of each whale c_i ∈ C using Equation (21).
3.  Get the best individual c_best and set it as c^0_G.
4.  While (stopping criterion E is not met)
7.     For each p ← 1 to PopSize
8.        Randomly select two whales c^t_{rand1,j}, c^t_{rand2,j} ∈ C (c_rand1 ≠ c_rand2)
10.       Update K, A, e, h, and r.
11.       For each gene j in the solution c_{p,j}

Fitness Evaluation
An internal classifier was used to evaluate the fitness value of each agent (whale); in this work, the SVM showed the best performance among the classifiers tested. The fitness function measures each agent's false alarm rate, detection rate, and number of selected concepts in each iteration until the best solution is reached. The optimal solution is the one that decreases the False Alarm Rate (FAR), increases the Detection Rate (DR), and decreases the number of selected concepts. A single weighted fitness function was used to handle this multi-criteria decision-making problem, with three weights w_f, w_d, and w_c for FAR, DR, and the number of selected features, respectively. In this function,

$$G_k = \begin{cases} 0 & \text{if the concept is selected, i.e., its representative gene of the whale is selected,} \\ 1 & \text{if the concept is neglected, i.e., its representative gene of the whale is neglected,} \end{cases}$$

and M is the number of concepts.
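A sketch of a weighted fitness of this kind follows. Equation (21) is not reproduced here, so the linear form below (reward DR, penalize FAR, reward the share of neglected concepts) and the weight values are assumptions, not the paper's formula. DR and FAR use their usual confusion-matrix definitions. The SVM that would supply the TP, FN, FP, and TN counts is not shown.

```python
def detection_rate(tp, fn):
    """DR = TP / (TP + FN): share of true concepts/relations that are detected."""
    return tp / (tp + fn) if tp + fn else 0.0

def false_alarm_rate(fp, tn):
    """FAR = FP / (FP + TN): share of negatives that are wrongly accepted."""
    return fp / (fp + tn) if fp + tn else 0.0

def fitness(dr, far, g, w_d=0.5, w_f=0.3, w_c=0.2):
    """Assumed weighted form, to be maximized: reward DR, penalize FAR, and reward
    leaving concepts out. g holds the G_k flags (0 = selected, 1 = neglected), so
    sum(g) / M is the share of the M concepts that were neglected."""
    m = len(g)
    return w_d * dr - w_f * far + w_c * sum(g) / m

dr = detection_rate(tp=90, fn=10)      # 0.9
far = false_alarm_rate(fp=5, tn=95)    # 0.05
print(fitness(dr, far, [1, 1, 0, 0]))  # fewer selected concepts -> higher score
print(fitness(dr, far, [0, 0, 0, 1]))
```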

Mutation
The mutation operator, the core of the G-WOA algorithm, produces a mutant vector. A mutation rate MR is defined beforehand: if the gene of the picked solution is lower than MR, the algorithm mutates each gene of the parent solution using Equation (26):

$$\mathrm{offspring}^{t}_{p,j} = c^{t}_{\mathrm{rand1},j} + M_{p,j}\left(c^{t}_{\mathrm{rand2},j} - c^{t}_{\mathrm{rand1},j}\right) \qquad (26)$$

where offspring^t_{p,j} is the newly generated solution, c^t_{rand1,j} and c^t_{rand2,j} are two randomly selected parents, M_{p,j} is a random value in the range [0, 1], t denotes the current iteration number, and p represents the whale number.
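Equation (26) can be applied gene by gene as in the sketch below. The mutation trigger is worded ambiguously in the text, so the sketch adopts the common interpretation that gene j is mutated when a uniform random draw falls below MR; this is an assumption.

```python
import random

def mutate(c_rand1, c_rand2, mr, rng=random):
    """Equation (26) gene by gene: offspring_j = c_rand1_j + M_j * (c_rand2_j - c_rand1_j),
    with M_j drawn from U[0, 1). Assumption: gene j is mutated when a uniform draw
    falls below MR; otherwise it is copied from c_rand1 unchanged."""
    offspring = []
    for g1, g2 in zip(c_rand1, c_rand2):
        if rng.random() < mr:
            offspring.append(g1 + rng.random() * (g2 - g1))
        else:
            offspring.append(g1)
    return offspring

child = mutate([0.1, 0.9, 0.4], [0.7, 0.2, 0.4], mr=0.8, rng=random.Random(1))
```

Since M_{p,j} lies in [0, 1], every mutated gene falls between the corresponding genes of the two parents.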

Crossover
In the encircling prey phase, the uniform crossover operator is applied between the mutant vector offspring^t_{p,j} and a randomly selected solution c^t_{rand,j}. The ordinary WOA uses a random variable to compute the distance between the best whale and the search agent without considering the fitness value of either the current solution or the generated one. In contrast, the G-WOA applies the GA's crossover operator in the encircling prey phase so that it selects a neighbor solution around the optimal solution. The crossover rate Cr is defined as a parameter of the G-WOA algorithm, and the parent solution is combined with the neighbor solution to generate the child according to Cr, using Equation (27).
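A minimal sketch of uniform crossover controlled by Cr follows. Equation (27) is not reproduced here, so this common form (keep each mutant gene with probability Cr, otherwise take the neighbor's gene) is an assumption about its exact shape.

```python
import random

def uniform_crossover(offspring, neighbor, cr, rng=random):
    """Gene by gene: keep the mutant gene with probability Cr, otherwise take the
    gene of the randomly selected neighbor solution c_rand."""
    return [o if rng.random() < cr else n for o, n in zip(offspring, neighbor)]

child = uniform_crossover([0.3, 0.6, 0.8, 0.1], [0.9, 0.2, 0.5, 0.4], cr=0.5,
                          rng=random.Random(7))
```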

Selection
The selection operator in G-WOA determines whether the target (parent) or the offspring survives to the next iteration, as expressed in Equations (28) and (29). If every gene value of the generated solution is higher than the mutation value, the G-WOA replaces the parent solution with the generated one. This comparison is performed for each solution in the population. The best solution is then selected from the updated population based on the fitness value computed using Equation (21), and the new best solution c^t_best replaces the old one c^t_G if each gene value of the best solution is lower than the mutation value.
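The sketch below shows greedy, fitness-based survivor selection, matching the rule stated in the Termination Phase that the next position is the fitter of parent and child. It does not implement the gene-versus-mutation-value comparison of Equations (28) and (29), and the summing fitness is a hypothetical stand-in for Equation (21).

```python
def select(parent, child, fitness):
    """Greedy survivor selection: the fitter of parent and child survives
    (ties go to the child). `fitness` is any callable to be maximized."""
    return child if fitness(child) >= fitness(parent) else parent

def update_best(population, current_best, fitness):
    """Replace the stored best solution c_G when the population holds a fitter one."""
    candidate = max(population, key=fitness)
    return candidate if fitness(candidate) > fitness(current_best) else current_best

toy_fitness = sum  # hypothetical stand-in for the SVM-based fitness of Equation (21)
survivors = [select(p, c, toy_fitness) for p, c in zip([[1, 2], [5, 5]], [[3, 3], [0, 1]])]
print(survivors)                                    # [[3, 3], [5, 5]]
print(update_best(survivors, [4, 4], toy_fitness))  # [5, 5]
```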

Termination Phase
In the G-WOA algorithm, the new position of the i-th individual in the next generation is the fitter of the parent c^t_p and the child offspring^t_p. In this context, solutions must respect boundary constraints; when a constraint is violated, the repair rule of Equation (30) is applied, where u_j and l_j represent the upper and lower bounds of the solution's j-th dimension, respectively, c_{p,j} refers to the j-th dimension of the p-th solution, and rand(0, 1) is a random number between 0 and 1. The G-WOA algorithm also checks the current iteration index: if it has reached the limit set by the predefined criterion (E), the newly generated solutions with the highest fitness are chosen, and the database is updated with these solutions for the Arabic ontology structure. Otherwise, the G-WOA algorithm continues iterating.
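Equation (30) is not reproduced here. A common repair rule that uses exactly the listed symbols (u_j, l_j, c_{p,j}, rand(0, 1)) re-samples each out-of-bounds gene uniformly within its bounds, as sketched below under that assumption.

```python
import random

def repair(solution, lower, upper, rng=random):
    """Re-sample every out-of-bounds gene uniformly inside its bounds:
    c_{p,j} = l_j + rand(0, 1) * (u_j - l_j). In-bounds genes are left untouched."""
    return [g if lo <= g <= hi else lo + rng.random() * (hi - lo)
            for g, lo, hi in zip(solution, lower, upper)]

fixed = repair([0.5, -3.0, 7.0], lower=[0.0, 0.0, 0.0], upper=[1.0, 1.0, 1.0],
               rng=random.Random(5))
```

Re-sampling, rather than clipping to the violated bound, avoids piling solutions up on the edges of the search space.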

Experimental Results
This section discusses the validation results of the proposed approach for Arabic ontology learning based on the text mining and G-WOA algorithms. Extensive experiments were conducted using different bio-inspired optimization algorithms and different Arabic corpora. Furthermore, to evaluate how the proposed approach works in a non-Arabic setting, we applied it to two publicly available non-Arabic corpora and compared the results with state-of-the-art works that use the same corpora. The details of the experiments are given in the following subsections.

Corpora
The Arabic corpora tested in this work are the Automatic Content Extraction (ACE) corpora [72,73], the ANERcorp dataset [74,75], and a private corpus of Arabic biomedical texts. ACE and ANERcorp have frequently been used in previously published computational linguistics work for evaluation and for comparison with existing systems. Three ACE corpora were investigated in this study: ACE 2003 (Broadcast News (BN) and Newswire (NW)) and ACE 2004 (NW). They are publicly available, and all were tested with the proposed algorithm. For each dataset, the types of concepts (named entities) and their representation are given in Table 3. To identify certain types of Arabic biomedical named entities, we created a private corpus for evaluating the proposed approach to Arabic ontology learning by collecting open-source Arabic texts in the biomedical domain, which were assessed by expert physicians. Information on the private corpus is also given in Table 3, where each class in each Arabic domain is represented by a number of documents and the number of unique words on which the concept mining and ontology learning algorithms operate.

The non-Arabic corpora tested in this work are two publicly available biomedical corpora related to protein-protein interactions: Learning Language in Logic (LLL) [76] and the Interaction Extraction Performance Assessment (IEPA) [77]. The LLL corpus poses the task of extracting gene interactions from a set of sentences related to Bacillus subtilis transcription. The IEPA dataset comprises 303 abstracts obtained from the PubMed repository, each including a particular pair of co-occurring chemicals.

Performance Measures
The performance validation measures used in this paper are precision (PRE), recall (REC), and F-score (F) [58]. In information retrieval, the F-score is the harmonic mean of precision (PRE) and recall (REC). These metrics were calculated for each fold using Equations (31)-(33), and their values were then averaged over all folds.
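Since Equations (31)-(33) are not reproduced here, the sketch below uses the standard definitions PRE = TP/(TP + FP), REC = TP/(TP + FN), and F = 2·PRE·REC/(PRE + REC), and averages them over folds. The per-fold counts in the usage line are hypothetical.

```python
def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0

def f_score(pre, rec):
    """Harmonic mean of precision and recall."""
    return 2 * pre * rec / (pre + rec) if pre + rec else 0.0

def average_over_folds(fold_counts):
    """fold_counts: one (TP, FP, FN) tuple per fold. Returns mean PRE, REC and F,
    where F is computed per fold and then averaged."""
    rows = []
    for tp, fp, fn in fold_counts:
        pre, rec = precision(tp, fp), recall(tp, fn)
        rows.append((pre, rec, f_score(pre, rec)))
    return tuple(sum(column) / len(rows) for column in zip(*rows))

print(average_over_folds([(8, 2, 2), (9, 1, 1)]))  # hypothetical per-fold counts
```

Because F is averaged per fold, the mean F need not equal the F-score of the mean PRE and REC. For example, the ACE 2003 (BN) values PRE = 98.14% and REC = 99.03% give an F of about 98.58%, close to the reported 98.59%.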

Cross Validation
In this work, k-fold cross-validation with k = 10 was used to evaluate the quality of the solution obtained with the G-WOA algorithm. Each corpus was randomly split into ten equally sized sub-samples. A single sub-sample was held out as the validation set for performance testing, and the remaining k − 1 sub-samples were used as the training set. This procedure was repeated 10 times, so that each of the k sub-samples was used exactly once as the validation set. The k fold results were then averaged to give a single estimate.
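The fold rotation can be sketched as follows. The evaluator is a hypothetical placeholder for training and testing the pipeline on one split. Dealing shuffled indices round-robin keeps fold sizes within one item of each other when the corpus size is not divisible by k.

```python
import random

def k_fold_indices(n_items, k=10, seed=0):
    """Shuffle the item indices once and deal them into k folds of near-equal size."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(n_items, evaluate, k=10, seed=0):
    """Each fold serves once as the validation set while the other k - 1 folds
    form the training set; returns the mean of the k scores."""
    folds = k_fold_indices(n_items, k, seed)
    scores = []
    for i, validation in enumerate(folds):
        training = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(evaluate(training, validation))
    return sum(scores) / k

# Hypothetical evaluator: here it simply reports the validation-set size.
print(cross_validate(25, lambda train, val: len(val)))  # 2.5
```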

Comparison to the State-of-the-Art
The comparison with the state of the art consisted of three experiments: (1) comparisons with other bio-inspired optimization algorithms in the literature on Arabic ontology learning, (2) comparisons with previously published approaches to Arabic ontology learning from text, and (3) comparisons with the state of the art on ontology learning in non-Arabic settings. Firstly, to validate the performance of the proposed G-WOA algorithm in learning ontology from Arabic text, we compared its solutions with those returned by the ordinary GA and WOA. Moreover, extensive comparisons were conducted between the G-WOA algorithm and three other bio-inspired algorithms: PSO [44], moth flame optimization (MFO) [45], and the hybrid differential evolution-whale optimization (DE-WOA) [46]. To compare these bio-inspired algorithms, the parameter settings had to be determined for each; Table 4 presents the parameter list used in this work, taken from [32,37,44,78]. In each experiment, the tested algorithm was first applied to one of the corpora mentioned above, and PRE, REC, and F were computed using Equations (31)-(33). This process was repeated for each dataset, and the average values of the three measures across all datasets were then computed.
For each of the G-WOA, GA, WOA, PSO, MFO, and DE-WOA algorithms, the detailed validation results across all the investigated Arabic corpora are given in Tables A1-A6 of Appendix A, respectively, and Table 5 summarizes the overall average measures of each algorithm across all corpora. Table 5 and Tables A1-A6 show that the proposed G-WOA algorithm outperformed the other algorithms in all folds and across all datasets: its PRE, REC, and F values were higher than those of the ordinary GA, WOA, PSO, MFO, and the hybrid DE-WOA algorithm. Taking ACE 2003 (BN) as an example, its PRE, REC, and F were 98.14%, 99.03%, and 98.59%, respectively. The F-score (F) values obtained using the G-WOA were 98.79%, 98.44%, 98.57%, and 98.63% for ACE 2003 (NW), ACE 2004 (NW), ANERcorp, and the private corpus, respectively.
For comparison, the GA obtained F-score (F) values of 93.75%, 93.63%, 93.73%, 93.65%, and 93.41% for ACE 2003 (BN), ACE 2003 (NW), ACE 2004 (NW), ANERcorp, and the private corpus, respectively, while the WOA achieved 96.66%, 96.87%, 96.79%, 96.94%, and 96.81% on the same corpora. Relative to the basic GA, the G-WOA therefore improved the F-score of ontology learning by 4.84, 5.16, 4.71, 4.92, and 5.22 percentage points, respectively, and relative to the ordinary WOA by 1.93, 1.92, 1.65, 1.63, and 1.82 percentage points. These results indicate that the G-WOA accelerated the global search when learning the ontology, owing to its ability to balance exploration and exploitation effectively. The results also show that the MFO algorithm performed better than the WOA in ontology learning. This is due to MFO's good ability to switch between exploration and exploitation, whereas the WOA was trapped early in local optima during the optimization process [37]. The MFO algorithm therefore ranks second after the G-WOA in Arabic ontology learning. The DE-WOA, on the other hand, performed worse than both G-WOA and MFO across all datasets: although the DE algorithm has robust global search ability, it is weak in exploitation and converges slowly, so it needs to be improved before being hybridized with other algorithms, as reported in [46]. The DE-WOA thus ranks third in Arabic ontology learning.
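The improvement figures are absolute differences between F-scores, in percentage points. The short check below recomputes them from the F-scores quoted in this section.

```python
# F-scores (%) quoted in the text, in the order
# ACE 2003 (BN), ACE 2003 (NW), ACE 2004 (NW), ANERcorp, private corpus.
g_woa = [98.59, 98.79, 98.44, 98.57, 98.63]
ga = [93.75, 93.63, 93.73, 93.65, 93.41]
woa = [96.66, 96.87, 96.79, 96.94, 96.81]

def gains(better, baseline):
    """Absolute improvement in percentage points, rounded to two decimals."""
    return [round(b - a, 2) for b, a in zip(better, baseline)]

print(gains(g_woa, ga))   # [4.84, 5.16, 4.71, 4.92, 5.22]
print(gains(g_woa, woa))  # [1.93, 1.92, 1.65, 1.63, 1.82]
```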
In addition, convergence speed is a crucial criterion for evaluating the performance of any optimization method. Therefore, the convergence time of the proposed hybrid G-WOA algorithm was computed and compared with that of all the other algorithms, plotted against the false alarm rate, which was computed in this paper using Equation (22). As Figures 2 and A1 (in Appendix A) show, the WOA converged more slowly than the proposed hybrid G-WOA algorithm across all the Arabic datasets. This can be explained by the poor exploitation ability of the ordinary WOA algorithm, which requires a long time to search for offspring and parents. The hybrid G-WOA algorithm overcomes this drawback by combining the genetic operations with the WOA algorithm.
Secondly, to investigate the efficacy of the proposed approach, which integrates the text mining and G-WOA algorithms to learn ontology from Arabic text, we compared it with recent works that use the same Arabic corpora in terms of precision (PRE), recall (REC), and F-score (F). As Table 6 shows, the proposed approach yielded superior PRE, REC, and F results compared with the other methods in the literature, which demonstrates the robustness of integrating the text mining and G-WOA algorithms. The highest results were achieved when applying the proposed method to ANERcorp: PRE = 87%, REC = 60%, and F = 94%.

Proposed approach (2019): a text mining algorithm that extracts the initial concept set from the Arabic documents, and a proposed G-WOA algorithm that obtains the best solutions, optimizing ontology learning by selecting only the optimal concept set, with its semantic relations, that contributes to the ontology structure.

Thirdly, to test the efficiency of the proposed approach in learning ontology from non-Arabic text, we applied it to the two aforementioned publicly available corpora and compared its performance with that of other approaches in the literature in terms of PRE, REC, and F measures. The comparison is shown in Table 7, and the application results are given in Table A7 of Appendix A. The results demonstrate that the proposed G-WOA achieved superior results on the non-Arabic corpora, reaching 98.1% and 97.95%, respectively, for the two corpora.

Contributions to the Literature
These results show that the proposed approach outperformed state-of-the-art approaches to Arabic ontology learning. We also noted that very little research has used evolutionary approaches, and no previously published work has investigated hybrid bio-inspired algorithms. Most studies have relied on hybridizing rule-based and machine learning approaches [4,23], which have shortcomings, as discussed in the introduction. Some works used deep learning algorithms [78,79], such as long short-term memory networks and convolutional neural networks, but their results still fall short of expectations. In [78], a deep neural network-based method was proposed; its results on ANERcorp were 95.76%, 82.52%, and 88.64% in terms of PRE, REC, and F, which are lower than those obtained with the approach proposed in this work. In [79], the F values obtained with the presented deep learning approach were 91.2%, 94.12%, 91.47%, and 88.77% for ACE 2003 (NW), ACE 2003 (BN), ACE 2004 (NW), and ANERcorp, respectively. These results are also lower than those obtained by applying our approach to the same corpora, which indicates its efficiency. The results reported in [78,79] point to the need to improve deep learning methods for natural language processing and to overcome shortcomings such as getting stuck in local optima, for instance by using bio-inspired optimization algorithms.
The proposed ontology learning approach is also applicable to non-Arabic texts. Comparisons with state-of-the-art approaches on the same non-Arabic corpora show higher results for the proposed approach, indicating that it outperforms state-of-the-art methods for ontology learning from non-Arabic texts.

Implications for Practice
In ontology learning with the G-WOA algorithm, the combined contributions of GA and WOA enabled the G-WOA to escape local minima easily and thus to find a promising search direction toward the global optimum. Specifically, the G-WOA algorithm can robustly balance global exploration and local exploitation, which is why the proposed hybrid algorithm outperformed the other compared algorithms in convergence speed.
In practice, the synergy of the text mining and G-WOA algorithms can operate on Arabic or non-Arabic documents: it extracts the concepts and their semantic relations and then provides solutions containing the best subset of the initial concept set. These solutions can optimize ontology construction from Arabic or non-Arabic text by returning only the important concepts that contribute to the ontology structure while ignoring redundant or less important ones.

Conclusions
Most state-of-the-art works on Arabic ontology learning from texts have relied on hybridizing handcrafted rules and machine learning algorithms. In contrast, this study presented a novel approach to Arabic ontology learning from texts that advances the state of the art in two ways. First, a text mining algorithm was proposed to extract the initial concept set, together with its semantic relations, from text documents. Second, a hybrid G-WOA was proposed to optimize ontology learning from Arabic text. The G-WOA integrates genetic search operators, namely mutation, crossover, and selection, into the WOA algorithm to balance exploration and exploitation and thereby find the solutions with the highest fitness. The experimental results led to the following conclusions.
Firstly, for learning Arabic ontology from texts, the proposed G-WOA outperformed the ordinary GA and WOA across all the Arabic datasets in terms of PRE, REC, and F measures. Compared with the ordinary GA, the G-WOA improved the F-score (F) by 4.84, 5.16, 4.71, 4.92, and 5.22 percentage points for ACE 2003 (BN), ACE 2003 (NW), ACE 2004 (NW), ANERcorp, and the private corpus, respectively, and compared with the ordinary WOA by 1.93, 1.92, 1.65, 1.63, and 1.82 percentage points on the same corpora. Secondly, the G-WOA also outperformed PSO, DE-WOA, and MFO across all the Arabic corpora in terms of the three measures. MFO ranks second after the G-WOA in ontology learning from Arabic text, which can be attributed to its good ability to switch between exploration and exploitation. Thirdly, the G-WOA outperformed the other algorithms in convergence speed; the WOA, for example, converges slowly because of its poor exploitation. The G-WOA is therefore superior to the other bio-inspired algorithms in convergence speed.
Furthermore, the G-WOA exhibited lower false alarm rates than the other algorithms across all the Arabic datasets. Fourthly, the proposed Arabic ontology learning approach, based on the synergy of the text mining and G-WOA algorithms, outperformed the state of the art in terms of precision (PRE), recall (REC), and F-score (F). This is due to its strong ability to extract concepts along with their semantic relations from Arabic documents and then to create a population of search agents (solutions) whose genes represent the initial concepts. The G-WOA then searches for the best solution over a set of iterations, embedding the genetic operators into the WOA architecture, and finally returns the solution that recommends the best set of concepts/relations to contribute to the ontology. Lastly, the proposed ontology learning approach is applicable to non-Arabic texts, where it outperformed state-of-the-art approaches to ontology learning.

Limitations and Future Research Directions
The proposed approach for Arabic ontology learning cannot learn hierarchical feature representations from text. One advantage of deep learning algorithms is that they can generate high-level feature representations directly from raw text. We therefore intend to develop a deep neural network model that uses latent features to improve ontology learning from Arabic texts. The model will embed words and positions as latent features, so it will not rely on feature engineering. To overcome the limitations of deep network models, such as getting stuck in local optima, different bio-inspired optimization algorithms will be tested and compared.

Figure 1. Representation of some biomedical concepts in our corpus which have an Is-a semantic relationship.


Figure 2. The convergence time versus the FAR rate for all algorithms using the ACE 2003 (BN) corpus.


Table 1. State of the art on Arabic text mining.

Table 2. State of the art on Arabic ontology learning.

Algorithm 1: The proposed Arabic text mining algorithm
Input: A term weighting matrix TW of the training set, corresponding to the term set S_T = {T_1, T_2, ..., T_a}
Output: A matrix of mapped features WS, obtained by assigning concept similarity weights to the concepts CW, with the resulting concept weighting set corresponding to the concept set C_T = {C_1, C_2, ..., C_a}, as follows:
       // The two elements are equal
9.     C_B ← mapping(T_A)
10.    CW(A, B) = CW(A, B) + TW(A, B)
       // Calculation of semantic similarities between n concepts of CW
20.    // Matrix of resulting weights of concept similarity
21.    S ← [ ]
22.    For A = 1 to count(S_T)

Table 3. Information on the Arabic corpora tested in this work.

Table 4. The parameter list used in this work.

Table 5. Average measures for each algorithm across all datasets (detailed results for each algorithm are given in Tables A1-A6 of Appendix A).

Table 6. Comparison with the state of the art on Arabic ontology learning.

Table 7. Comparison with the state of the art in non-Arabic settings.

Table A1. Performance evaluation of ontology learning using the proposed G-WOA over the five corpora.

Table A2. Performance evaluation of ontology learning using the GA over the five corpora.

Table A3. Performance evaluation of ontology learning using the WOA over the five corpora.

Table A7. Performance evaluation of ontology learning using the proposed G-WOA on the non-Arabic corpora.