Ontology Design for Solving Computationally-Intensive Problems on Heterogeneous Architectures

Viewing a computationally-intensive problem as a self-contained challenge with its own hardware, software and scheduling strategies is an approach that should be investigated. Heterogeneous hardware architectures can be assigned to solve a problem, parallel computing paradigms can play an important role in writing efficient code for it, and scheduling strategies can be examined as part of the solution. Depending on the complexity of the problem, finding the best possible solution using an integrated infrastructure of hardware, software and scheduling strategy can be a complex job. Developing and using ontologies and reasoning techniques can significantly reduce the complexity of identifying the components of such integrated infrastructures. Reasoning and inference over the domain concepts can help to find the best possible solution as a combination of hardware, software and scheduling strategies. In this paper, we present an ontology and show how it can be used to solve computationally-intensive problems from various domains. As a potential use of the idea, we present examples from the bioinformatics domain. Validation using problems from the Elastic Optical Network domain has demonstrated the flexibility of the proposed ontology and its suitability for use with other computationally-intensive problem domains.


Introduction
Solving computationally-intensive problems is an attractive topic for researchers in the fields of parallel processing and high-performance computing. At the same time, ontologies can play a vital role in solving domain-specific problems through knowledge representation and reasoning [1][2][3][4]. Given the nature of existing hardware clusters [5], it is now common for a cluster to combine several hardware architectures [6], with NVIDIA GPGPUs and Xeon Phi coprocessors in addition to traditional Intel CPUs. Task scheduling on such heterogeneous architectures is considered a major challenge and is one of the most important topics in current scientific research. The major factors that make an integrated architecture difficult to use are the complexity of modern hardware architectures and of the parallel computing paradigms and memory management techniques associated with them [7]. Task scheduling can be characterized as the assignment of time-constrained jobs to time-constrained resources within a predefined time interval that represents the complete timescale of the schedule. The domain of solving computationally-intensive problems comprises several entities, namely the jobs that solve the problem [8,9], the computing devices with the hardware architectures on which the jobs run, the scheduling strategies used to schedule the tasks of the jobs on the hardware, and the algorithms used to solve specific problems in a specific domain.
A mapping scheme from the problem domain to the computer domain should be clearly identified. For this purpose, ontologies can be developed that describe entities from one or more domains and the relations among these entities [10,11]. Ontologies are recognized as conceptual information models that describe the entities in a specific domain, such as classes, relationships and functions [12]. In this paper, we present an ontology in the domain of High-Performance Computing (HPC) and show how an ontology-based approach can be used to solve computationally-intensive problems on heterogeneous architectures.
Many approaches based on ontological reasoning have been presented to address challenges related to HPC and parallel computing. For example, [13] presents a framework that integrates parallel computing and ontological reasoning and uses ontological reasoning to address scalability issues in parallel computing. An ontology learning-based framework to address scalability issues in parallel computing is presented in [14]. Similarly, ontology-driven solutions for cloud service handling and discovery are presented in [15,16]. In [17], the authors present an ontology editor (ONTOLIS) that can be used for concept mapping and inference to improve content coverage in Big Data and HPC courses. In [18], the authors present a parallel ABox reasoning algorithm for increased scalability in HPC and parallel computing environments. Computationally-intensive problems may be treated and solved as self-contained problems by identifying hardware, software and scheduling strategies, depending on the complexity and nature of the problem. Ontologies can play a significant role in mapping domain concepts to an inference environment and suggesting the best possible solutions at run time. Owing to the diversity of domains and the many kinds of software and hardware involved, very little work has been undertaken on designing and using ontologies to solve computationally-intensive problems on heterogeneous architectures.
In this paper, we present an ontology for solving computationally-intensive problems by performing reasoning on the data of given jobs. The work covers multidimensional domain knowledge, such as the variety of hardware architectures, software, scheduling strategies and memory management techniques, which makes it appropriate for users from either the HPC or the parallel computing domain. We also show the potential use of this work by presenting a real case study at the HPC Center of King Abdulaziz University.
The rest of this paper is organized as follows: Section 2 describes related work on the use of ontologies to solve various domain-specific problems. Section 3 describes the structure of the ontology, including its basic conceptual building blocks. Section 4 describes the main concepts of the ontology, such as the classes and their attributes. Section 5 evaluates the flexibility of the proposed ontology and shows how new computationally-intensive problem domains can easily be added to it. Finally, Section 6 concludes our work.

Related Work
Ontological reasoning plays a key role in solving diverse complex problems in various domains. With the help of domain experts, semantic web experts and ontology engineers develop ontologies whose reasoning can solve problems that otherwise cannot be solved (or cannot be addressed efficiently) over textual entities or data stored in relational databases. In this section, we present work related to ontology development in various domains and the use of these ontologies for data representation and for solving complex problems.
An integrated solution of ontological reasoning and parallel computing to address scalability issues is presented in [13]. Parallel computer architectures, such as home computing environments, multi-core machines, grids and peer-to-peer systems, create great demand for the efficient handling of computational resources through software architectures, algorithms and, now, a new paradigm: ontological reasoning. In [13], the authors propose using ontology modularization and queries so that reasoning can be performed on individual modules, ultimately achieving maximum efficiency in the use of parallel architectures and parallel computing algorithms.
In [15], the authors present an ontology-driven platform for using and integrating the services provided by various cloud environments. The proposed framework can describe the functionality of given services using ontological concepts, look for matching services from target providers, and generate client adapters for using the required services. The inference engine of the framework performs semantic matching to find the best-matching service in the cloud, allowing the resulting applications to use diverse services as an integrated resource.
An ontology learning-based framework is presented in [14]. The framework aims to improve scalability by making efficient use of processing power, computing resources and processing time. The approach tackles the difficulties of distributed and low-level parallel programming by coupling high-level semantic descriptions with programming models.
Another approach for discovering suitable services among the growing range of cloud offerings, according to user requirements such as cost, security and performance, is presented in [16]. The authors propose mapping service attributes to ontologies to minimize the gap between service descriptions, types, features and naming conventions. Applying ontological reasoning to such services makes service discovery in the cloud easier and more accurate.
An ontology-based approach and framework for organizing courses covering Big Data and HPC content is described in [17]. In this work, the authors introduce an ontology editor (ONTOLIS) that can be used to support the training of staff in various domains, as well as in IT companies. The framework presented in [17] uses ontological reasoning to bridge the learning gap between learners with different levels of expertise (IT experts, intermediate IT users and introductory IT students). An analysis and interpretation of archives and collections through historical data is presented in [19]. This study proposes methods for semantically associating social networks by linking them with external datasets.
An ontology-based knowledge framework for material selection in the engineering domain is presented in [20]. The authors provide a semantic representation of labeled instances and material products, represent them as RDF instances, and then generate a knowledge graph from the RDF triples. The graph is generated by reasoning on the domain knowledge (modeled as the classes, subclasses, properties and individuals of the ontology), and the resulting knowledge graph makes suggestions for material selection.
In [21], the authors use ontological reasoning to infer users' situations from calendar events in order to support them in carrying out their jobs. They present an ontology for the event domain that represents the various situations arising from such events as classes and properties. The approach infers situations using both their temporal and semantic aspects. The authors also implemented the approach as a mobile phone application that manages incoming calls based on the user's inferred situation.
A parallel ABox reasoning algorithm for increasing scalability in parallel computing environments is presented in [18]. The algorithm can model disjointness and inconsistencies in the ontology model, which can ultimately improve parallelization and reduce resource costs. Separating ABox reasoning from TBox reasoning simplifies the derivation and parallelizes the integration steps, which improves efficiency and reduces memory access.

Ontology Structure
The design philosophy of the ontology rests mainly on its flexibility in dealing with problems from various domains. The key point is to map the algorithms used to solve problems in a given domain to equivalent algorithms in the computer science domain. Figure 1 shows the basic conceptual building blocks of the ontology. A computationally-intensive problem can be mapped to a computer algorithm using the mapping scheme. The problem to be solved needs a computing device that contains at least one hardware architecture. The algorithm can use different parallel paradigms that run on different architectures. The scheduling strategy is responsible for assigning the appropriate architecture to the problem. In this paper, we use bioinformatics problems as a case study of a computationally-intensive problem domain. The proposed ontology can therefore be viewed as a combination of multiple ontologies together with a mapping scheme.

Ontology Basics
In this section, we describe the named classes of the ontology, the object properties and some instances.We will also outline axioms that can be used to define the meaning of several components of the ontology.Moreover, we briefly describe the mapping from the bio problem domain to the computer science domain.

Named Classes
Classes are interpreted as sets that contain individuals. They are described using formal (mathematical) descriptions that state precisely the requirements for membership of the class. The class tree is rooted at owl:Thing, the superclass of every class. We have created six disjoint subclasses of it: "Algorithms", "BioProblem", "ComputingDevice", "Hardware", "ParallelParadigm" and "SchedulingStrategy".
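The disjointness of the six top-level subclasses can be illustrated with a minimal Python sketch. It assumes a plain in-memory representation rather than an OWL toolkit; the individual names (`gpu0`, `ExhaustiveSearch`) and the helper `disjointness_violations` are hypothetical. In OWL 2 the same constraint would be stated with an `owl:AllDisjointClasses` axiom and detected by a reasoner.

```python
from itertools import combinations

# The six top-level subclasses of owl:Thing named in the text.
TOP_LEVEL = ["Algorithms", "BioProblem", "ComputingDevice",
             "Hardware", "ParallelParadigm", "SchedulingStrategy"]

# Pairwise disjointness: every unordered pair of distinct top-level classes.
DISJOINT_PAIRS = {frozenset(pair) for pair in combinations(TOP_LEVEL, 2)}


def disjointness_violations(types_of):
    """Return (individual, class_a, class_b) for each individual asserted
    to belong to two classes that are declared disjoint."""
    violations = []
    for individual, classes in types_of.items():
        for a, b in combinations(sorted(classes), 2):
            if frozenset((a, b)) in DISJOINT_PAIRS:
                violations.append((individual, a, b))
    return violations


# Hypothetical assertions: the second individual is wrongly typed twice.
assertions = {
    "gpu0": {"Hardware"},
    "ExhaustiveSearch": {"Algorithms", "Hardware"},
}
print(disjointness_violations(assertions))
# prints [('ExhaustiveSearch', 'Algorithms', 'Hardware')]
```

Six pairwise-disjoint classes give 15 disjointness pairs; any individual typed with two of them makes the ontology inconsistent.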
Focusing on the BioProblem subclass leads to the following facts:
• The BioProblem class contains a set of bio problems, such as "Comparing Sequences", "DNA Arrays", "Finding Signals", "Genome Rearrangements", "Identifying Proteins", "Mapping DNA", "Molecular Evolutions", "Predicting Genes", "Repeat Analysis" and "Sequencing DNA", as shown in Figure 2. Defining these classes can help domain experts find the best scheduling strategy as a solution to the integrated problem spanning the bio and HPC domains.
• The BioProblem class is related to the SchedulingStrategy, Hardware, ComputingDevice and ParallelParadigm classes: a BioProblem needs a ComputingDevice equipped with at least one hardware architecture on which to run the job; a ParallelParadigm is needed to write code that best fits the selected hardware; and a SchedulingStrategy is used to schedule the tasks on the hardware of the ComputingDevice.
• Property "hasAlgorithm": assigns an algorithm (from the computer science domain) to solve a given bio problem. "IsAlgorithmOf" is its inverse property.
• Property "hasHardware": assigns a hardware architecture on which the algorithm will run to solve a given bio problem. "IsHardwareOf" is its inverse property.
• Property "hasParallelParadigm": assigns the parallel paradigm used to write an algorithm that solves a given bio problem. "IsParallelParadigmOf" is its inverse property.
• Property "hasSchedulingStrategy": assigns the scheduling strategy used to deploy the hardware that solves a given bio problem. "IsSchedulingStrategyOf" is its inverse property.
• Property "hasComputingDevice": assigns at least one computing device to solve a given bio problem. "IsComputingDevice" is its inverse property.
• Property "hasParameters": assigns the parameters required by each algorithm. For example, the motif-finding problem has the parameters L, d, n and T, where L is the length of the motif, d is the maximum number of permitted mutations, n is the number of characters in each sequence and T is the number of sequences. All of these parameters are integers.
• Property "hasArchitecture": assigns at least one architecture to a computing device; each computing device can be equipped with one or more architectures.
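Declaring each property together with its inverse means that a fact stated in one direction can be queried in the other. The following Python sketch mimics what an OWL reasoner does with `owl:inverseOf` by materializing the inverse triples. The property names come from the list above; the triples and the helpers `with_inverses` and `objects` are hypothetical illustrations.

```python
# Forward property -> inverse property, as listed above.
INVERSE = {
    "hasAlgorithm": "IsAlgorithmOf",
    "hasHardware": "IsHardwareOf",
    "hasParallelParadigm": "IsParallelParadigmOf",
    "hasSchedulingStrategy": "IsSchedulingStrategyOf",
    "hasComputingDevice": "IsComputingDevice",
}


def with_inverses(triples):
    """Add (o, inverse(p), s) for every (s, p, o) whose property has a declared inverse."""
    closed = set(triples)
    for s, p, o in triples:
        if p in INVERSE:
            closed.add((o, INVERSE[p], s))
    return closed


def objects(graph, subject, prop):
    """All objects linked to subject through prop, sorted for stable output."""
    return sorted(o for s, p, o in graph if s == subject and p == prop)


# Hypothetical facts about the bio problem instances.
facts = {
    ("MotifFindingProblem", "hasAlgorithm", "ExhaustiveSearch"),
    ("MotifFindingProblem", "hasAlgorithm", "GreedySearch"),
    ("SequenceAlignment", "hasAlgorithm", "DynamicProgramming"),
    ("MotifFindingProblem", "hasHardware", "NVIDIAGPGPU"),
}
graph = with_inverses(facts)
print(objects(graph, "ExhaustiveSearch", "IsAlgorithmOf"))  # ['MotifFindingProblem']
print(objects(graph, "NVIDIAGPGPU", "IsHardwareOf"))        # ['MotifFindingProblem']
```

Only the forward facts are asserted; the inverse direction ("which problems use this hardware?") is answered from the materialized triples.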

Instances
We initially selected two well-known bio problems: "DNA sequence alignment" and the "Motif-finding Problem".

• Instance "SequenceAlignment": DNA sequence alignment is one of the best-known bio problems. It is classified under "ComparingSequences" and can be solved using combinatorial pattern matching, divide-and-conquer or dynamic programming algorithms (as shown in Figure 4).
• Instance "MotifFindingProblem": the motif-finding problem is also one of the best-known bio problems. It is classified under "FindingSignals" and can be solved using exhaustive or greedy search, hidden Markov models or randomized algorithms (as shown in Figure 5).
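To illustrate the exhaustive-search option and the parameters L, d, n and T carried by the "hasParameters" property, the following is a textbook brute-force formulation of motif finding in Python. It is a sketch only, not the implementation used in this work, and the function names are hypothetical. It enumerates all 4^L candidate L-mers and keeps those that occur with at most d mismatches in each of the T sequences of length n, which costs O(4^L · T · n · L) character comparisons and shows why the problem quickly becomes computationally intensive.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def exhaustive_motif_search(sequences, L, d):
    """Return every L-mer over {A, C, G, T} that appears, with at most d
    mismatches, in every one of the given sequences (brute force)."""
    motifs = []
    for letters in product("ACGT", repeat=L):
        candidate = "".join(letters)
        # The candidate must be within distance d of some window in each sequence.
        if all(
            any(hamming(candidate, seq[i:i + L]) <= d
                for i in range(len(seq) - L + 1))
            for seq in sequences
        ):
            motifs.append(candidate)
    return motifs


# Toy input: T = 3 sequences, each of length n = 8.
seqs = ["ACGTTGCA", "TTACGTAA", "GGACGTCC"]
print(exhaustive_motif_search(seqs, L=4, d=0))  # ['ACGT']
```

Each candidate is independent of the others, so the outer loop is a natural target for the parallel paradigms (for example OpenMP or CUDA) that the ontology associates with each architecture.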

Axioms
This section describes the axioms used to define the meaning of various components of the ontology and relationships.Table 1 lists the axioms for the ontology.

| Concept Name | Axiom Description | Logical Expression |
| --- | --- | --- |
| Algorithm (A) | A collection of computer algorithms such that an algorithm is a software procedure or formula used to solve a specific problem in this domain, based on conducting a sequence of specified actions. | A ⊑ ∃solves.Problem |
| Computationally_Intensive_Problem (CIP) | A generic term to describe any problem in any domain that needs intensive computations. | CIP ⊑ Problem |
| Computing_Device (CD) | A device that should contain at least one physical hardware processor. | CD ⊑ ∃contains.HardwareProcessor |
| Hardware (HW) | A generic term to express the processors used in performing the computations. | HW ⊑ ∃usedFor.Computation |
| IntelCPU | A traditional type of central processing unit developed by Intel. | |
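The existential axioms above can be read as constraints on individuals. The Python sketch below checks two of them, Computing_Device ⊑ ∃contains.HardwareProcessor and Algorithm ⊑ ∃solves.Problem, under a closed-world assumption. This is a deliberate simplification: an OWL reasoner works under the open-world assumption and would infer an unnamed filler rather than report a violation, so this style of check corresponds more closely to data validation (for example with SHACL). The individuals (`node1`, `node2`, `xeon0`) and the helper `unsatisfied` are hypothetical.

```python
# Axioms of the form  Class ⊑ ∃property.Filler  (abbreviations expanded).
AXIOMS = [
    ("Computing_Device", "contains", "HardwareProcessor"),
    ("Algorithm", "solves", "Problem"),
]


def unsatisfied(types, edges, axioms=AXIOMS):
    """Closed-world check: list (individual, axiom) pairs for which an instance
    of the axiom's class has no asserted property value of the filler class."""
    missing = []
    for cls, prop, filler in axioms:
        for ind, ind_types in types.items():
            if cls not in ind_types:
                continue
            has_filler = any(
                s == ind and p == prop and filler in types.get(o, set())
                for s, p, o in edges
            )
            if not has_filler:
                missing.append((ind, f"{cls} ⊑ ∃{prop}.{filler}"))
    return missing


# Hypothetical data: node2 is declared a computing device but contains no processor.
types = {
    "node1": {"Computing_Device"},
    "node2": {"Computing_Device"},
    "xeon0": {"HardwareProcessor"},
    "ExhaustiveSearch": {"Algorithm"},
    "MotifFindingProblem": {"Problem"},
}
edges = {
    ("node1", "contains", "xeon0"),
    ("ExhaustiveSearch", "solves", "MotifFindingProblem"),
}
print(unsatisfied(types, edges))
# prints [('node2', 'Computing_Device ⊑ ∃contains.HardwareProcessor')]
```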

Evaluation of the Flexibility
To evaluate the flexibility of the proposed ontology, we tested it on an additional domain, the "Elastic Optical Network (EON)". The International Telecommunication Union (ITU) divides the optical spectrum range of 1530-1565 nm (the so-called C-band) into fixed 50 GHz spectrum slots. This fixed spectrum allocation wastes a great deal of spectrum, because a connection occupies a whole slot even when it needs less bandwidth. EON was introduced to allow better utilization of the C-band by allocating spectrum more flexibly. EON introduces several challenges:

• Finding an optical path from source to destination that passes through multiple links, all of which have the same free spectrum range. This problem can be solved using an exhaustive search algorithm, a heuristic algorithm or linear programming techniques.
• Finding a set of links that constitute an optical path, on condition that all the links have enough free contiguous spectrum. This problem can be solved using the computer science algorithm known as exhaustive search.
• Load balancing of traffic to minimize spectrum fragmentation. This can be solved using a sorting algorithm or a binary search algorithm.
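The first two challenges can be sketched together as an exhaustive search over loop-free paths combined with a check of the spectrum continuity constraint (the same slots must be free on every link) and the contiguity constraint (the slots must be adjacent). The Python sketch below assumes a small hypothetical topology with four slots per link, represents the free spectrum of each link as a list of booleans, and picks the lowest feasible starting slot (a first-fit policy chosen here for illustration). The topology, slot counts and function names are assumptions, not part of the ontology, and the third challenge (load balancing) is not covered.

```python
def simple_paths(graph, src, dst, path=None):
    """Exhaustively enumerate loop-free paths from src to dst."""
    path = (path or []) + [src]
    if src == dst:
        yield path
        return
    for nxt in graph[src]:
        if nxt not in path:
            yield from simple_paths(graph, nxt, dst, path)


def first_fit(free, path, demand):
    """Lowest start slot s such that slots s .. s+demand-1 are free on every
    link of the path (continuity + contiguity); None if no such slot exists."""
    links = [frozenset(hop) for hop in zip(path, path[1:])]
    if not links:  # src == dst: nothing to allocate
        return None
    n_slots = len(free[links[0]])
    for s in range(n_slots - demand + 1):
        if all(all(free[link][s:s + demand]) for link in links):
            return s
    return None


def rsa_exhaustive(graph, free, src, dst, demand):
    """Try every path, shortest first, and return (path, start_slot) or None."""
    for path in sorted(simple_paths(graph, src, dst), key=len):
        start = first_fit(free, path, demand)
        if start is not None:
            return path, start
    return None


# Hypothetical 4-node ring A-B-D-C-A with 4 slots per link (True = free).
graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
free = {
    frozenset("AB"): [True, True, False, False],
    frozenset("BD"): [False, False, True, True],
    frozenset("AC"): [True, True, True, False],
    frozenset("CD"): [False, True, True, True],
}
print(rsa_exhaustive(graph, free, "A", "D", demand=2))  # (['A', 'C', 'D'], 1)
```

Path A-B-D has two free slots on each of its links, but not the same ones, so the continuity constraint rules it out; path A-C-D has slots 1 and 2 free on both links.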
As we can see, the EON problem domain can replace the BioProblem domain. It is also possible to create a new class called "Computationally_Intensive_Problem" with several subclasses, such as "BioProblem", "EON" and so on, as shown in Figure 6. Users can add as many computer algorithms and scheduling strategies as they need. Eventually, it will be possible to merge a ready-made ontology for computationally-intensive problems from any domain with the proposed ontology. We now describe two problems: the "motif finding problem" from the bioinformatics domain and "Optical Path Finding" from the elastic optical network domain. The main purpose is to show how the ontology simplifies and automates the whole problem-solving process. The methodology used to deploy the ontology for the "motif finding problem" was also used to solve the RSA (routing and spectrum assignment) problem, and we achieved the same results as in [23]. Deploying the ontology in this case was equally simple.
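Introducing the generic superclass means that a query posed against Computationally_Intensive_Problem also retrieves problems from newly added domains, while the computer algorithms attached to them are shared across domains. The Python sketch below illustrates both points with a hypothetical single-inheritance class table. The problem names and algorithm lists follow the text, but the identifiers (such as `HiddenMarkovModel`), the data layout and the helpers are assumptions; a real deployment would rely on an OWL reasoner's subsumption instead.

```python
# Subclass axioms after adding the generic superclass (single inheritance assumed).
SUBCLASS_OF = {
    "BioProblem": "Computationally_Intensive_Problem",
    "EON": "Computationally_Intensive_Problem",
    "ComparingSequences": "BioProblem",
    "FindingSignals": "BioProblem",
}

# Asserted types of problem instances.
INSTANCE_OF = {
    "SequenceAlignment": "ComparingSequences",
    "MotifFindingProblem": "FindingSignals",
    "OpticalPathFinding": "EON",
}

# hasAlgorithm facts for one problem from each domain.
HAS_ALGORITHM = {
    "MotifFindingProblem": {"ExhaustiveSearch", "GreedySearch",
                            "HiddenMarkovModel", "RandomizedAlgorithm"},
    "OpticalPathFinding": {"ExhaustiveSearch", "HeuristicAlgorithm",
                           "LinearProgramming"},
}


def ancestors(cls):
    """All superclasses of cls, following the subclass chain upwards."""
    result = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.append(cls)
    return result


def instances_of(target):
    """Instances asserted in target or in any of its subclasses."""
    return sorted(ind for ind, cls in INSTANCE_OF.items()
                  if cls == target or target in ancestors(cls))


print(instances_of("Computationally_Intensive_Problem"))
# ['MotifFindingProblem', 'OpticalPathFinding', 'SequenceAlignment']
print(HAS_ALGORITHM["MotifFindingProblem"] & HAS_ALGORITHM["OpticalPathFinding"])
# {'ExhaustiveSearch'}
```

Adding a further domain only requires one new subclass entry under Computationally_Intensive_Problem; existing algorithm individuals such as exhaustive search are reused rather than redefined.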

A Case Study of Bioinformatics Problem
This section describes the scheme for mapping from the bioinformatics domain to the computer science domain. Bioinformatics problems can be mapped to equivalent computer algorithms. Figure 7 shows how graph algorithms from the computer science domain can be used to solve three different sets of problems in the bioinformatics domain: identifying proteins, sequencing DNA and DNA arrays. Figure 8 shows that "Finding Signals", a problem from the bioinformatics domain, can be solved using four different sets of computer algorithms: greedy search, randomized algorithms, hidden Markov models and exhaustive search. We conclude that the cross-domain problem mapping is many-to-many: one computer algorithm can solve many problems (as shown in Figure 7), and one problem can be solved using different computer algorithms (as shown in Figure 8).

Conclusions
In this paper, we presented an ontology as a semantically enriched schema for computationally-intensive problems. We showed how data for a domain-specific problem can be mapped to classes of the ontology and linked through the object and data type properties of the ontology, so that reasoning can be performed on the given data. We also showed that computer science algorithms and hardware architectures can be flexibly appended to meet various domain requirements, and we presented schemes for mapping from a given domain to the computer science domain. To demonstrate the use of ontologies in solving computationally-intensive problems, we took a real-life problem from the bio domain (at the HPC Center of King Abdulaziz University), mapped the problem-specific data to the ontology and used the mapped data for reasoning. The ability to add new computationally-intensive problem domains is the main focus of this work, as it serves the important concepts of flexibility and reusability. Flexibility is reflected in the ability to add as many computationally-intensive problem domains to the proposed ontology as needed. Reusability follows from the fact that computer algorithms and hardware resources are common to all domains. Our case study demonstrated validation by investigating problems from two different domains.

Figure 1. Conceptual Basic Building Blocks of the Ontology.

• Computing device and its hardware used in solving the problem (IntelCPU and NVIDIAGPGPU)
• Parallel computing paradigm used on each architecture (OpenMP on IntelCPU and CUDA on NVIDIAGPGPU)
• Scheduling strategy used to solve the problem (Speed-based)
• Computer algorithm used to solve the problem (exhaustive search)

Figure 7. Use of Graph Algorithms (Computer Domain) to Solve Three Different Problems in the Bioinformatics Domain.

Figure 8. Solving the "Finding Signals" Problem (Bioinformatics Domain) Using Four Different Sets of Computer Algorithms.

Table 1. Axioms for the ontology.