UvA-DARE (Digital Academic Repository)

Using Semantic Web Technologies to Query and Manage Information within Federated Cyber-Infrastructures

Abstract: A standardized descriptive ontology supports efficient querying and manipulation of data from heterogeneous sources across the boundaries of distributed infrastructures, particularly in federated environments. In this article, we present the Open-Multinet (OMN) set of ontologies, which were designed specifically for this purpose, as well as to support management of the life-cycles of infrastructure resources. We present their initial application in Future Internet testbeds and their use for representing and requesting available resources.


Introduction
Cloud computing supports many distributed applications that are vital to the economy and to the advancement of science. The rising popularity of cloud computing and the diversity of available resources create an urgent need to match those resources optimally to the requests of end-users.
The desired level of self-serve operation within the cloud obviates the need for intervention by IT departments, allowing end-users direct and independent access to the computational infrastructure. As a result, end-users can deploy the necessary infrastructure and software solutions rapidly. Toward this end, accurate modeling of the infrastructure must support abstract representation of various resources, simplify interactions with them, and expose the right levels of information. The next frontier in cloud computing lies in supporting widely distributed clouds and the various aspects of the architectures needed to manage resources across multiple administrative domains. These problems are also closely related to future Internet research in academia, as well as to emerging commercial technologies like the Internet of Things (IoT) [1].
Modeling cloud infrastructures in a manner that supports effective matching of users' requests with available resources is a challenging task. The issue becomes even more complex in the context of distributed cloud systems with multiple infrastructure owners. In academic research, the same problem is encountered when trying to describe computational resources, scientific instruments, and testbeds that belong to different institutions and must be used by inter-disciplinary and inter-institutional collaborative teams. In such environments, each infrastructure owner may model resources using their particular information and data modeling approach to set up the system quickly and attract users. The end result, however, is user lock-in and an inability to easily leverage available resources that belong to different owners. Thus, resource matching and recommendation based on common models becomes of great importance.
This paper describes Open-Multinet (OMN) [2], a set of ontologies that rely on Semantic Web [3] technologies. It was designed by an international team of academic researchers who are intimately familiar with the related problems. The OMN researchers are also involved in multiple efforts to design a federation of Future Internet and cloud testbeds spanning the US and the EU, to be used for at-scale experimentation with novel concepts in networking and distributed systems. While we briefly introduced the ontology set in [2] and presented a preliminary description of its application in the context of a federated cloud environment in [4], in this paper we complement our previous work with an extended description of the OMN ontology set and add new evaluation results for the overall OMN framework.
Motivation for our work comes largely from our experience with the growth of academic networking, including the proliferation of cloud testbeds. Their ad hoc attempts to federate with each other, i.e., to make their resources available to wider communities of users through common interfaces, suffer from a lack of common models to describe available resources. Testbed owners use such models chiefly to provide their users with information about available resources, e.g., compute nodes, storage, network switches, and wireless access points. Each user, in turn, employs similar models to request resources from the testbeds, describing in some detail the exact configuration of available resources needed from the testbed.
Most testbeds are small when they first launch. Their designers often spend little time thinking through the information model that they wish to use to present resource information to users. Testbeds frequently rely on home-brewed solutions utilizing syntactic schema specifications serialized using XML or JSON, sometimes referred to as RSpecs, although RSpec is also the name of a specific XML dialect used by a subset of testbeds in the US. Documents expressed in those languages are passed between the users and the testbed management software in order to describe the available resources and to request specific configurations of resources for the experiments. While the built-in mechanisms in those languages allow for straightforward verification of document syntax, few mechanisms are available for validation of semantic correctness. These solutions typically rely on structure-implied semantics to validate correctness by associating semantic meaning rigidly with the position of information elements within the document.
These approaches tend to work in early phases of the design. As the diversity of resources grows, however, and as the sophistication of users increases, the need arises for extension mechanisms. Demand emerges for more powerful resource descriptions. The extension mechanisms then inevitably relax the structure-implied semantics, thus making validation of documents progressively more difficult. We observed this development first-hand in the US Global Environment for Network Innovations (GENI) [5] and EU Future Internet Research and Experimentation (FIRE) [6] testbed-federation efforts. XML schema extensions were introduced to allow different federation members to describe the unique attributes of their cloud testbeds. The extensions, we found, made it possible to create syntactically valid but semantically invalid documents requesting resources from a testbed, e.g., by requesting that a particular operating-system image be attached to a network interface instead of to a compute node.
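The kind of inconsistency just described is exactly what a semantic validation rule can catch before any resources are provisioned. The following minimal sketch illustrates the idea over a hand-written set of triples; the type and property names (omn:Node, omn:hasDiskImage, etc.) are illustrative stand-ins, not the actual RSpec or OMN vocabulary:

```python
# Minimal sketch of rule-based semantic validation over a set of triples.
# Type and property names are illustrative, not the real RSpec/OMN terms.

TRIPLES = {
    ("node1", "rdf:type", "omn:Node"),
    ("if0", "rdf:type", "omn:Interface"),
    ("img1", "rdf:type", "omn:DiskImage"),
    # Semantically invalid: the image is attached to an interface,
    # although the XML carrying such a request may be schema-valid.
    ("if0", "omn:hasDiskImage", "img1"),
}

def type_of(subject, triples):
    """Return the rdf:type of a subject, or None if untyped."""
    for s, p, o in triples:
        if s == subject and p == "rdf:type":
            return o
    return None

def validate(triples):
    """Collect violations of the rule: hasDiskImage is only allowed on Nodes."""
    errors = []
    for s, p, o in triples:
        if p == "omn:hasDiskImage" and type_of(s, triples) != "omn:Node":
            errors.append(f"{s}: disk image {o} attached to {type_of(s, triples)}")
    return errors

print(validate(TRIPLES))
```

A schema validator sees a well-formed document here; the rule, in contrast, rejects the request because the domain of the attachment property is wrong.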
Informed by these experiences, we decided to adopt Semantic Web technologies, which provide a number of advantages:
• A common standardized model is used to describe cloud and testbed infrastructures. The extensibility of this model is built in from the start, in the form of additional ontologies that describe new types of resources. The machinery to deal with extensions is built into standard Semantic Web toolkits, leaving designers free to think about the information model while the data model remains fixed.
• Different resources and descriptions can easily be related and connected semantically. Semantic Web mechanisms intuitively represent computer-network graph structures. Network topologies are embedded into the RDF graph using graph homeomorphisms and then annotated with additional information, addressing structural and semantic constraints in a single structure.
• Model errors can be detected early, before testbed resources are provisioned, using many standard inference tools.
• Rules can, in particular, be used to complement queries. Rules for harmonizing relationships should be defined and applied at the federation level, where the specialties and commonalities of the involved testbeds are known; this approach lifts from users the burden of formulating complex queries.
• The annotation process, i.e., the conversion from XML-based RSpecs to RDF-based graphs, is automatic and configurable to take testbed-specific extensions and federation-wide agreements into account.
• Using standard Semantic Web tools, complex queries can be formulated to discover resources. A common way for testbeds to operate is to ingest a JSON/XML (or other) encoding of the user request or resource advertisement and then convert it into a non-portable native form on which queries and embeddings are performed. Semantic Web tools allow us to store testbed-state information natively in RDF and to operate on that information using a multitude of native inference and query tools, thus simplifying and abstracting many parts of testbed operations.
• Once cloud resources are described semantically, they can be interlinked with other Linked Open Data (LOD) [7] cloud data sets. These linkages provide additional information about resource availability or constraints and help to link resources, e.g., to policies governing their allocation.
• Semantic resource descriptions support convergence from multiple syntactic-schema-based representations of testbed resources to a single semantically enriched representation that combines information from multiple sources. Such sources include the various RSpecs describing testbed resources and out-of-band knowledge that may be encoded in resource names or contained in human-readable Web pages, an approach consistent with Ontology-based Data Access (OBDA). Encoding this information in a structured way into a single representation prepares it for direct analysis, without need of an intermediate representation. Answers are derived by matching resources required by the user to those available at one or more different testbeds, federating the testbeds automatically, with minimal human intervention.
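The structural side of this matching, i.e., finding an embedding of a requested topology within an available one, can be sketched as a small brute-force search. This toy version checks only structure (a subgraph-style node mapping that preserves links), which is simpler than the homeomorphic embedding used in practice and ignores semantic constraints entirely:

```python
# Toy sketch of matching a requested topology into an available one:
# find an injective node mapping that preserves all requested links.
# Real matching also honors semantic constraints (node types, images, ...)
# and allows homeomorphic (path-based) embeddings; this checks structure only.
from itertools import permutations

def embed(request_links, request_nodes, avail_links, avail_nodes):
    # Treat links as bidirectional when checking the mapping.
    avail = set(avail_links) | {(b, a) for a, b in avail_links}
    for target in permutations(avail_nodes, len(request_nodes)):
        mapping = dict(zip(request_nodes, target))
        if all((mapping[a], mapping[b]) in avail for a, b in request_links):
            return mapping
    return None  # no embedding exists

# A 3-node line requested from a 4-node ring:
req_nodes = ["r1", "r2", "r3"]
req_links = [("r1", "r2"), ("r2", "r3")]
avail_nodes = ["a", "b", "c", "d"]
avail_links = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
print(embed(req_links, req_nodes, avail_links, avail_nodes))
# → {'r1': 'a', 'r2': 'b', 'r3': 'c'}
```

The combinatorial cost of this naive search is exactly why performing both the structural and the semantic parts of the query inside one RDF-native toolchain, rather than in ad hoc code, pays off.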
We believe that our approach represents an interesting application of OBDA to a novel area of use that combines information search and retrieval with active infrastructure-resource management.
The OMN development effort consisted of several phases, starting with the upper ontology design, followed by the design of several critical subordinate ontologies for, e.g., resource monitoring. We relied heavily on previous work in this area, directly incorporating, for instance, the Infrastructure and Network Description Language (INDL) [8,9] ontology for describing computer networks and the Networking innovations Over Virtualized Infrastructures (NOVI) [10] ontology for federated infrastructures. We then started developing tools that utilize the ontology, including converters from the various testbed resource description formats to OMN, inference rules for knowledge extension to complement the conversion process, and rule sets for semantic validation of the documents. We also developed standard queries that assist the testbed resource-matching algorithms in extracting needed information from the testbed resource descriptions.
The remainder of the paper is structured as follows. We give a brief overview of related work in the context of (federated) heterogeneous computing infrastructures in Section 2. In Section 3, we present the OMN ontology set. Section 4 shows how we extract information from RSpecs and annotate it using additional knowledge extracted from out-of-band information. Querying and validation using traditional Semantic Web tools are then performed by the tools built on this framework. Section 5 shows the performance and applicability of our tools. Finally, we close in Section 6 with conclusions, considerations, and a description of future work.

Related Work
Many application disciplines have shifted their focus from tree-based data models (e.g., XML-based syntactic schemas) to semantic models. This change is reflected in the development of ontologies to support, for example, the operation of Grids, Clouds, and now the Internet of Things. These efforts have informed our own OMN development. In the following subsections, we provide an overview of these efforts.

Semantic Models For Grids, Clouds and IoT
In the context of Grid Computing, the Grid Laboratory for a Uniform Environment (GLUE) [11] schema was started 15 years ago to support interoperability between Grid projects by defining a schematic vocabulary with serializations to XML, LDAP, and SQL [12]. A lack of formalism, and a consequent inability to perform logical operations on data, motivated the transition to the Semantic Open Grid Service Architecture (S-OGSA) [13].
Semantic Web service discovery [14,15] addresses the automated discovery of Web services that satisfy given requirements. The discovery process uses a matchmaking algorithm to find potential Web services that might solve the problem at hand. Such methods, however, are inadequate to handle the complex interconnected computing infrastructures addressed by our work. Research on matching concentrates mainly on Web services [16], specifically on semantic similarities between input and output parameters of various services. Our resource matching involves more than matching available resources to the requirements of the end-user: we also need to identify homeomorphic embeddings of requested topologies within available resource topologies. The combination of such semantic and structural constraints leads to a substantially greater challenge. Pedrinaci et al. introduced Linked USDL [17], a vocabulary that applies research conducted on Semantic Web services to USDL [18,19]. Linked USDL provides comprehensive means to describe services in support of automated processing. It focuses only on services, however, and is unsuited to the description of cloud infrastructures. Ontologies such as the Semantic Markup for Web Services (OWL-S) [20] or Good Relations (GR) [21] are nevertheless of interest to our work, and are referenced in part in our ontology.
In the domain of Cloud Computing, researchers are working to ensure interoperability on a semantic level. Since 2008, work has progressed in the development of ontologies and of tools for semantic cloud computing [22][23][24]. Haase et al. [25], for example, introduced an approach to the administration of enterprise cloud environments using Semantic Web technologies. They proposed a Semantic Web-based product called eCloudManager, which incorporates an ontology to model its cloud data. However, the system and its ontology focus only on the management aspect of cloud systems, and the data are not open for use. In another example, Haak et al. [26] proposed an ontology-based optimization methodology that enables cloud providers to detect the best resource set to satisfy a user's request. Their framework handles only a single administrative domain, whereas we seek to cover a distributed set of provider domains.
A paradigm shift is in progress in favor of Intercloud Computing. For instance, 20 approaches to this new challenge are presented in [27]. Within this context, Manno et al. proposed the use of the semantic Federated Cloud Framework Architecture (FCFA) [28] to manage resource life cycles based on formal models. In contrast, the Intercloud architecture developed within the Institute of Electrical and Electronics Engineers (IEEE) Standard for Intercloud Interoperability and Federation (P2302) [29,30] Working Group uses graphs to describe and to discover cloud resources based on the existing Open-Source API and Platform for Multiple Clouds (mOSAIC) [31] ontology. Both approaches are being considered as domain-specific extensions to our work. In addition, Santana-Pérez et al. [32] proposed a scheduling algorithm suitable for federated hybrid cloud systems. The algorithm applies semantic techniques to scheduling and to matching tasks with the most suitable resources. Its information model is based on the Unified Cloud Interface (UCI) project ontologies, which cover a wide range of details but which cannot handle Intercloud systems. Le and Kanagasabai [33,34] also proposed ontology-based methodologies to discover and to broker cloud services. They use Semantic Web technologies for user requirements and for cloud provider advertisements, and then apply an algorithm to match each requirement list to advertised resource units. Multiple levels of matching are defined, ranging from an exact match to no match. These methodologies concentrate only on Infrastructure as a Service (IaaS) provisioning. Moreover, they typically neither export their data nor provide a SPARQL Protocol And RDF Query Language (SPARQL) [35] endpoint, thereby hindering reuse of and access to data.
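The multi-level matching idea (exact match down to no match) can be sketched with a tiny subsumption taxonomy. The level names below follow the common EXACT/PLUGIN/SUBSUME convention from the Semantic Web service matchmaking literature; the taxonomy itself is an illustrative assumption, not taken from the cited works:

```python
# Sketch of "degrees of match" between a requested and an advertised
# resource type; the taxonomy and level names are illustrative.

# child -> parent subsumption relations (illustrative taxonomy)
PARENT = {"SmallVM": "VM", "LargeVM": "VM",
          "VM": "ComputeResource", "BareMetal": "ComputeResource"}

def ancestors(t):
    """All supertypes of t, nearest first."""
    out = []
    while t in PARENT:
        t = PARENT[t]
        out.append(t)
    return out

def degree_of_match(requested, advertised):
    if requested == advertised:
        return "EXACT"
    if requested in ancestors(advertised):   # advertisement is more specific
        return "PLUGIN"
    if advertised in ancestors(requested):   # advertisement is more general
        return "SUBSUME"
    return "NO_MATCH"

print(degree_of_match("VM", "SmallVM"))  # PLUGIN: a SmallVM satisfies a VM request
```

In an RDF setting the `ancestors` walk corresponds to following `rdfs:subClassOf`, which a standard reasoner performs automatically.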
Interest has soared recently in the uses and challenges of the Internet of Things, in which many heterogeneous devices from different administrative domains communicate with each other. Semantic models are needed for the IoT. The European Research Cluster on the Internet of Things (IERC) has established Activity Chain 4-Service Openness and Interoperability Issues/Semantic Interoperability (AC4) [36], and semantic models such as the Semantic Sensor Network (SSN) [37] ontology have been developed. Support for semantics in Machine-To-Machine Communication (M2M) [38] has received further attention [39]. The primary applicable standardization activity, from the European Telecommunications Standards Institute (ETSI) M2M Working Group, identified the need for semantic resource descriptions in [40]. Its successor, oneM2M (http://onem2m.org) [41], has already established Working Group 5, Management, Abstraction and Semantics (MAS). With the recent establishment of the World Wide Web Consortium (W3C) Web of Things (WoT) [42] Working Group, semantic vocabularies will be developed to describe data and interaction models.

OMN Background
Development of our approach, the OMN ontology set, started within the Federation for FIRE (Fed4FIRE) [43] project. The aim was to extend and to harmonize related work for the FIRE initiative, which has been developed within the context of federated networks and e-infrastructures. Our main motivation was the state of the art in the Future Internet (FI) experimentation context, which considers only simple schema-based models. The Slice-based Federation Architecture (SFA) [44] is the de facto standard Application Programming Interface (API) for testbed federation. It uses XML-based RSpecs to describe, to discover, and to manage resources in a simple declarative way. However, it cannot support complex queries combining structural and semantic constraints or knowledge analysis. OMN ontology design reuses concepts previously defined in RSpecs, but also leverages significant prior efforts to define ontologies targeting cyber-infrastructure management.
The Open Grid Forum (OGF) Network Mark-Up Language (NML) [45] is a well-established ontology for modeling computer networks. It provides a framework for the definition and description of topologies ranging from simple networks comprising a few nodes and connections to massive, complex networks with hundreds or thousands of nodes and links. The model underwent a thorough review and definition process, finally becoming an OGF standard. While NML lacks concepts and properties required to describe federated infrastructures, OMN adopts NML in order to model the networking aspects of the infrastructure.
In comparison with NML, the INDL addresses virtualization of resources and services. It supports description, discovery, modeling, composition, and monitoring of those resources and services. The INDL actually imports NML to describe attached computing infrastructures in a manner that is independent of technology and vendor. It offers the capacity to extend coverage to emerging network architectures. The INDL, however, does not support infrastructure federation, in which several different testbeds are interconnected for experimentation.
Semantic models developed within the European NOVI and GEYSERS [46] projects have been used to describe federated infrastructures, user requests, policies, and monitoring information. They also support virtualization concepts for computing and networking devices. They have been adopted by OMN where their incorporation is appropriate. In the first project (NOVI), the proposed Information Modeling Framework (IMF) [47] represents resources from the same or from different infrastructure providers.
In parallel to this, within the GENI initiative, the Network Description Language based on the Web Ontology Language (NDL-OWL) [48][49][50][51] model specifies capabilities to control and to manage complex networked testbed infrastructures. Lessons learned from live deployments of NDL-OWL in GENI proved informative in OMN modeling discussions.
Efforts related to describing APIs via OWL-S or DAML-S are not directly applicable to our problem, since they focus on the description of Web-service APIs. The testbeds in our research converged on a set of simple API calls like createSlice (requesting that a desired network topology be created) or listResources (requesting information about available infrastructure resources). The complexity lies in the parameters passed in those calls, not in the diversity of types of parameters serving as inputs or outputs to the APIs. Those parameters often are represented by XML documents describing requested or available testbed resource topologies. Our goal is to replace those syntactic-schema-based representations with Semantic Web-based views. We want them to include enough information to support native querying based on both structural and semantic constraints, either by the users or by testbed management algorithms.
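The lifting from such XML parameters to a semantic view can be sketched with the standard library alone. The XML shape and the omn:* terms below are deliberately simplified stand-ins for the real GENI RSpec schema and OMN vocabulary:

```python
# Sketch: lifting an RSpec-like XML fragment into subject-predicate-object
# triples. The XML shape and the omn:* terms are simplified stand-ins for
# the real GENI RSpec schema and the OMN vocabulary.
import xml.etree.ElementTree as ET

RSPEC = """
<rspec>
  <node client_id="node1">
    <interface client_id="if0"/>
  </node>
  <node client_id="node2"/>
</rspec>
"""

def to_triples(xml_text):
    triples = []
    root = ET.fromstring(xml_text)
    for node in root.findall("node"):
        n = node.get("client_id")
        triples.append((n, "rdf:type", "omn:Node"))
        for iface in node.findall("interface"):
            i = iface.get("client_id")
            triples.append((i, "rdf:type", "omn:Interface"))
            triples.append((n, "omn:hasInterface", i))
    return triples

for t in to_triples(RSPEC):
    print(t)
```

Once lifted, the positional semantics of the XML tree (an interface element nested under a node) become explicit, queryable triples.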

Open-Multinet Ontology Set
Following Noy and McGuinness [52], the first step in defining a formal information model is to determine the specific domain and scope of the proposed ontology. As stated in Sections 1 and 2, the initial objective was to support resource management in federated infrastructures for experimentation. The phases related to this management effort are depicted in Figure 1. Each step embodies a wide range of requirements and challenges. We particularly highlight the first phase in this paper; however, our approach was to provide a hierarchical set of ontologies covering the whole resource life-cycle.

Design
After identifying the scope of the reusable work (cf. Section 2), we defined the significant concepts and properties. Consequently, our ontology bundle consists of nine ontologies, specifically the omn upper ontology and eight descendant ontologies (cf. Figure 2): omn-federation; omn-lifecycle; omn-resource; omn-component; omn-service; omn-monitoring; omn-policy; and domain-specific extensions called omn-domain-xxx. These ontologies can be used to formally describe a federation of e-infrastructures, including the types and attributes of resources as well as the services available within the federation. The ontologies also describe the life-cycle phases of usage. The various ontologies extend the upper OMN ontology (solid lines), which contains common concepts used within the other models. To describe concrete resources within a particular infrastructure, domain-specific ontologies may need to be defined that use and extend selected subsets of the OMN ontologies (dotted lines, see Section 3.1.9).

OMN Upper Ontology
The omn upper ontology defines the abstract terms required to describe federated infrastructures in general.
It includes a set of classes representing concepts that provide general terms to model federated infrastructures, along with their respective components and services. These concepts include the following:
• Resource: a stand-alone component of the infrastructure that can be provisioned, i.e., granted to a user, such as a network node.
• withinEnvironment: defines the Environment in which a Group, Resource, Service, or Component operates. An example of an environment is the operating system under which a resource works.
To support rich querying and inferences, inverse counterparts have been declared for most properties. Figure 3 illustrates the key concepts and properties of the OMN ontology.

OMN Federation
A crucial part of the developed ontology set is the formal description of the relationship between the involved e-infrastructures (see Figure 4). This allows us to describe how resources relate to each other at the highest organizational level and provides the starting point for discovering capabilities and offered services. Federated providers maintain their autonomy in making resource-allocation decisions; however, they inter-operate with some federation-level functions, such as identity management or resource advertisement. To model these aspects, the omn-federation ontology introduces the concepts of a Federation, FederationMember, and Infrastructure, along with the properties hasFederationMember and isAdministeredBy. The first two concepts are subclasses of the schema:Organization class, which allows them to be described using properties of the Schema.org vocabulary. The latter concept, Infrastructure, relates infrastructures to a federation or to its members via these properties, and subclasses the Group concept, which allows infrastructures to expose services with endpoints, such as an SFA Aggregate Manager (AM).
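The core omn-federation relationships just described can be illustrated with a small Turtle fragment; the prefix URI and the instance names are illustrative assumptions, not the canonical OMN namespace declarations:

```turtle
# Illustrative instance data; prefix URIs and instance names are examples.
@prefix omn-fed: <http://open-multinet.info/ontology/omn-federation#> .
@prefix ex:      <http://example.org/> .

ex:fed4fire   a omn-fed:Federation ;
              omn-fed:hasFederationMember ex:memberOrg .
ex:memberOrg  a omn-fed:FederationMember .
ex:testbedA   a omn-fed:Infrastructure ;
              omn-fed:isAdministeredBy ex:memberOrg .
```

Because Federation and FederationMember subclass schema:Organization, the same instances can carry ordinary Schema.org properties such as names and contact points.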

OMN Life Cycle
Another important ontology is omn-lifecycle, which addresses life-cycle management of a collection of resources and services (e.g., a requested network topology) that are grouped together to perform a particular function (e.g., to conduct an experiment or to deploy a service architecture). The life-cycle of the resources is described by a set of allocation and operational state changes such as Allocated, Provisioned, Unallocated, Pending, Ready, Started, and Stopped. The life-cycle of the collection of resources reflects the first four phases of Figure 1:
1. the infrastructure provider advertises an Offering describing the available resources;
2. the user forms a Request defining the required collection of resources to the infrastructure provider;
3. the Confirmation contains an agreement by the provider, termed bound (tied to a specific set of physical resources) or unbound, to provide the requested resources;
4. finally, a Manifest describes the provisioned resources and their properties.
Each of these stages is represented as a subclass of the Group concept.

OMN Monitoring
The omn-monitoring ontology describes the main concepts and relations needed to support monitoring services within federated infrastructures. It includes multiple classes and properties that define measurement Metrics, their Data, Unit, Tool, and further Generic-Concepts. The monitoring ontology thus itself comprises an upper-level ontology: it describes the common, basic concepts and properties, which are then reused and specialized in the subjacent ontologies.

OMN Resource
The OMN Resource ontology deals with the networking aspects of the infrastructure. It supports the creation of complex networks of interconnected resources. It includes concepts and properties, e.g., Node, Link, and IPAddress, which are required for defining complex networks. It also supports defining unidirectional or bidirectional links, which can be utilized to define the direction of packet flow across the link(s).

OMN Component
This ontology describes any entity that is part of a Resource or a Service but is not itself a Resource or a Service. The OMN Component ontology describes concepts that are subclasses of the Component class defined in the OMN upper ontology. It covers several classes describing basic entities found in any Information and Communication Technology (ICT) infrastructure, such as CPU and Memory. Any class or instance of these can be the range of the property hasComponent, which has a Resource, Service, or even another Component as its domain.

OMN Service
This ontology deals with ICT services. Any entity that delivers value to its user is considered a service by the OMN Service ontology. Examples include services that offer APIs or login capabilities such as SSH. This ontology includes a set of classes to describe services used in ICT infrastructures. The current version covers the set of services used and implemented with the OMN ontology within the application area addressed in this paper, namely FI experimentation.

OMN Policy
This ontology will cover policy-related concepts and relations. We consider the NOVI policy ontology as a starting point for its design, as it supports [10]:
• Authorization policies that specify authorization rights of users within the federation.
• Event-condition-action policies that enforce control and management actions upon certain events within the managed environment.
• Role-based access control policies that assign users to roles, with different permissions/usage priorities on resources.
• Mission policies that define inter-platform obligations in a federation.

OMN Domain Specific
OMN provides a way to define domain-specific ontologies, which customize the definition of concepts and relations for a particular ICT application. This allows a set of concepts and relations specific to a particular domain to be grouped, along with some concepts and relations from other OMN ontologies, to form a domain-specific ontology. Examples include the OMN Wireless and OMN Cloud ontologies, used to describe the behavior of wireless networks and of cloud infrastructures, respectively. Another example is the specification of an operating-system (OS) version within a disk image, using omn-domain-pc:hasDiskimageVersion.

Use of Existing Ontologies
As described in Section 2, the OMN ontology set is inspired by, and based on, a number of existing formal information models. As an indicator, Listing 1 shows the vocabularies referenced within the upper OMN ontology. An OMN Service, for example, has relationships to novi:Service, dctype:Service, gr:ProductOrService, service:Service, schema:Service, nml:Service, and owl-s:Service.

Implementation
We selected OWL2 to encode the OMN ontology suite due to its expressiveness, wide acceptance, and available tools. To ensure quality, changes to the ontologies are automatically checked using Apache Jena Eyeball inspectors; other validators, such as the OntOlogy Pitfall Scanner (OOPS) [54], are executed manually.
As part of the design process, we are taking steps to ensure the broadest possible dissemination of the ontologies. As a result, we use the Dublin Core (DC), Vocabulary for Annotating Vocabulary Descriptions (VANN), and Vocabulary of a Friend (VOAF) vocabularies to describe the associated meta-information. We publish the files following the W3C best practices for vocabulary publishing (http://www.w3.org/TR/swbp-vocab-pub/). The URL http://open-multinet.info/ontology/omn provides both human-readable documentation and machine-readable serializations. We have registered the permanent identifier https://w3id.org/omn and published the root ontology to the Linked Open Vocabularies (LOV) repository (http://lov.okfn.org/dataset/lov/vocabs/omn). Additionally, we have registered the omn namespace (http://prefix.cc/omn). The source code of, and an issue tracker for, the ontologies are publicly available (https://github.com/w3c/omn).
In order to make the work recognizable to the international community, we established the Open-Multinet Forum, named after the ontology, and created the W3C OMN Community Group (https://www.w3.org/community/omn).

DBcloud Application For Federated Experimental Infrastructures
Most of the requirements for the development of OMN are rooted in research issues within the life-cycle management of resources across federated experimental infrastructures. In such a distributed environment, resource discovery is highly constrained, as it is based on (multi-)attribute matching. It requires an increased level of coordination between users and infrastructure providers, as well as among infrastructure providers in the federation. For this purpose, we essentially propose a federation-wide knowledge layer over the federated infrastructures to support semantic representation of such information and to facilitate semantic-based resource discovery.
A large amount of semistructured information is available describing the GENI and FIRE testbed federations, including details about the testbeds involved and about the heterogeneous resources offered, reservation information, and monitoring data. This information is encoded mainly as human-readable text on websites, as well as in the form of JSON and XML trees retrieved via secured API calls. To extract this information and to make it semantically accessible on the Web, we previously introduced the OMN extraction framework [4].
In essence, the OMN extraction framework (Figure 5) follows the design of the DBpedia extraction framework [55]. Information is retrieved from the infrastructures by periodically calling methods of the SFA AM API (http://groups.geni.net/geni/wiki/GAPI_AM_API_V3_DETAILS). The downloaded documents are translated into a semantically annotated Resource Description Framework (RDF) [56] graph using the OMN translator and the OMN ontology suite. To extend the knowledge encoded in this graph, the Apache Jena inference engine is used within this process by applying infrastructure-specific rules (Section 4.2).
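The core of this pipeline is the translation of tree-structured XML into RDF triples. A minimal sketch of that step, using plain Python tuples in place of a full RDF library; the element names follow the GENI RSpec schema, while the property URIs shown are illustrative rather than the exact OMN terms:

```python
import xml.etree.ElementTree as ET

# Namespace of GENI RSpec v3 documents; OMN-style URIs below are illustrative.
RSPEC_NS = "{http://www.geni.net/resources/rspec/3}"
OMN = "http://open-multinet.info/ontology/omn#"
OMNR = "http://open-multinet.info/ontology/omn-resource#"

def translate_advertisement(xml_text):
    """Translate an RSpec advertisement into (subject, predicate, object) triples."""
    triples = set()
    root = ET.fromstring(xml_text)
    for node in root.iter(RSPEC_NS + "node"):
        subject = node.get("component_id")
        triples.add((subject, "rdf:type", OMNR + "Node"))
        for hw in node.iter(RSPEC_NS + "hardware_type"):
            triples.add((subject, OMNR + "hasHardwareType", hw.get("name")))
        for sliver in node.iter(RSPEC_NS + "sliver_type"):
            triples.add((subject, OMN + "hasSliverType", sliver.get("name")))
    return triples

rspec = """<rspec xmlns="http://www.geni.net/resources/rspec/3">
  <node component_id="urn:publicid:IDN+example+node+pc1">
    <hardware_type name="pc"/>
    <sliver_type name="plab-vserver"/>
  </node>
</rspec>"""
triples = translate_advertisement(rspec)
```

The real omnlib translator maps XML to Java objects via JAXB and then to RDF via Jena; this sketch only conveys the shape of the tree-to-graph conversion.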
Finally, the resulting knowledge graph is written to an in-memory triplet database (Sesame v.2.8.6) and to a Turtle (TTL) [57] serialized file (DBcloud Dump). A SPARQL endpoint on top of the triplet data store implements a federation-wide lookup service that enables resource discovery by end-users. The result is currently available at http://lod.fed4fire.eu using, among others, the Vocabulary of Interlinked Datasets. The knowledge base currently describes approximately 100 aggregates, 3000 nodes, 30,000 links, and about 25,000 interfaces. This amounts to 4.1 million statements, with the potential to grow substantially as new testbeds join the federation.
The OMN translator is a Java-based extensible translation mechanism introduced in [2], allowing the automated transformation of semi-structured data into an OMN-based knowledge graph. It translates statelessly between GENI Resource Specifications (RSpecs) and OMN; applies inferencing rules for validation and knowledge injection; and has been extended to support the Topology and Orchestration Specification for Cloud Applications (TOSCA) [58] and Yet Another Next Generation (YANG) [59]. The implementation of the translation tool follows a Test-Driven Development (TDD) approach, is included in a Continuous Integration (CI) environment with test-coverage analytics, and is offered as a Java-based open-source library ("omnlib") in a public Maven repository. It uses the Java Architecture for XML Binding (JAXB) and Apache Jena to map between XML, RDF, and Java objects. It supports a number of APIs: (i) a native API to be included in other Java projects; (ii) a CLI to be used within other applications; and (iii) a REST-based API to run as a Web service.
The OMN translator parses the XML tree and converts the tags and attributes to their corresponding classes or properties. To give a better understanding of this translation process, we provide an illustrative example for the conversion of a GENI Advertisement RSpec used to publish available resources within a federation of experimental infrastructures.
The example in Listing 2 shows a single node of type PC that can provision the sliver type PLAB-VSERVER (virtual server for PlanetLab). Traditionally, hardware types and sliver types were simple strings, but unique Uniform Resource Identifiers (URIs) are used here to provide machine-interpretable information.
Listing 2: RSpec Advertisement (excerpt)

Listing 3 shows the converted graph, serialized in Turtle. The overall approach is to define an omn:Topology (the subclass omn-lifecycle:Offering is used in this case) that contains pointers to the offered resources. Each resource is an individual of a specific type that can implement (i.e., can provision) one or more specific sliver types.
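The offering structure just described can be sketched as a small Turtle-emitting helper. The prefixes follow the OMN examples in the text, but the property names used here (omn:hasResource, omn-lifecycle:canImplement) are assumptions for illustration, not a verbatim reproduction of Listing 3:

```python
# Illustrative Turtle emitter for a Listing-3-style offering fragment.
PREFIXES = (
    "@prefix omn: <http://open-multinet.info/ontology/omn#> .\n"
    "@prefix omn-lifecycle: <http://open-multinet.info/ontology/omn-lifecycle#> .\n"
)

def offering_to_turtle(offering_uri, node_uri, sliver_uri):
    """Emit an offering that points at one node, which can implement
    (i.e., can provision) one sliver type."""
    return PREFIXES + (
        f"<{offering_uri}> a omn-lifecycle:Offering ;\n"
        f"    omn:hasResource <{node_uri}> .\n"
        f"<{node_uri}> omn-lifecycle:canImplement <{sliver_uri}> .\n"
    )

ttl = offering_to_turtle(
    "urn:uuid:c9c34c9c-08d6-4dc6-91e2-2e5fac9dd418",
    "urn:publicid:IDN+example+node+pc1",
    "http://example.org/sliver/plab-vserver",
)
```

In practice one would build such graphs with an RDF library (e.g., Jena on the Java side) rather than string templates; the sketch only makes the topology-to-offering-to-sliver chain concrete.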

Knowledge Extension and Information Querying
Having described the framework for semantic-based resource discovery in the context of federated experimental infrastructures, we now focus on the specifics of the discovery process. Given a user request (query) and the aforementioned knowledge base, the resource-discovery problem amounts to automatically finding the resources in the triplet data store that match the query requirements along with the policies set by infrastructure providers; since a request can be expressed at different levels of abstraction, we refer to this as Resource Matching. The adoption of the OMN ontology suite provides the necessary flexibility of expression, as well as tools for querying and inference that simplify the typical problems encountered in the process of resource matching. Rules can capture domain background knowledge or infer resource requirements from the request model; in the latter case, the inferred requirements are added as additional information to the initial request model. In addition, rules can be used to check the request model's validity [49]. These benefits are highlighted in the following text.

Knowledge Extension
Background knowledge captures additional facts about the domain. This information can be used in matching a request with available resources. Knowledge is expressed in terms of rules that use the vocabulary of the ontology to add axioms; the knowledge graph can be extended by applying such rules.
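A minimal sketch of this rule-application loop, with triples as plain tuples and rules as functions that return new triples; rules are applied until the graph reaches a fixed point. The CPU values below are illustrative placeholders, not the actual Table 1 figures:

```python
# Forward-chaining sketch: apply rules until no new triples are produced.
def apply_rules(triples, rules):
    triples = set(triples)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            new = rule(triples) - triples
            if new:
                triples |= new
                changed = True
    return triples

def pcgen3_cpu_rule(triples):
    """If a node's hardware type is pcgen3, attach standard CPU facts
    (values here are illustrative, not the real specifications)."""
    out = set()
    for s, p, o in triples:
        if p == "hasHardwareType" and o == "pcgen3":
            out.add((s, "hasCPUCores", 4))
            out.add((s, "hasCPUFrequencyGHz", 2.4))
    return out

kb = {("node7", "hasHardwareType", "pcgen3")}
kb = apply_rules(kb, [pcgen3_cpu_rule])
```

The actual framework expresses such rules in Jena's rule language and runs them in the Jena inference engine; the loop above only shows the expand-until-stable idea.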
For example, infrastructure providers in the federation do not explicitly advertise the hardware configurations of their resources in the RSpec XML documents they provide, so such data are not translated into RDF. Instead, the information is encoded in each resource's hardware type, set arbitrarily by the infrastructure provider, as highlighted in the advertisement excerpt provided in Listing 2 (i.e., hardware type: PC).
In Table 1, we provide sample hardware specifications for a subset of the federated experimental infrastructures as described by the corresponding infrastructure providers, namely, the NETMODE (http://www.netmode.ntua.gr/testbed) and Virtual Wall 2 (http://doc.ilabt.iminds.be/ilabt-documentation/virtualwallfacility.html) testbeds. Figure 6 depicts a rudimentary offering (advertisement) excerpt from the NETMODE infrastructure provider. For the sake of readability, only a single advertised resource is depicted (omf.netmode.node1). Moreover, the diagram does not show all the details of the resource description, although it identifies, in its upper part, the distinct OMN ontologies used for this purpose. In the excerpt provided, the offered resource omf.netmode.node1 is managedBy the infrastructure provider omf:netmode (AMService) and is part of (isResourceOf) the offering (advertisement) identified by urn:uuid:c9c34c9c-08d6-4dc6-91e2-2e5fac9dd418. The resource is related via the object property hasHardwareType to the HardwareType individual with the label alix3d2. It is associated (hasSliverType) with the SliverType individual labeled miniPC, attributed with specific disk-image properties (e.g., the Voyage OS). As this example shows, infrastructures advertise node capacities by their hardware type name (alix3d2 in this case). A simple example of background knowledge in the context of the "hardware type" is provided in Listing 4. The listing represents a subset of the rules used to expand the knowledge base with CPU-related information regarding the pcgen3 nodes listed in Table 1. Such information can be used in the resource-matchmaking process. In this specific application, it is the responsibility of the federator, which maintains and provides the extraction framework, to apply such rules.
Listing 4: Infrastructure knowledge 1 (excerpt)
For every compute node with a hardware type whose label matches "pcgen0?3.*":
    insert standard information about this node type (CPU type, core count, CPU frequency)
    and link the new information to the compute node.

In our second example, shown in Listing 5, rules 1 to 3 mandate that each node identified by hardware type alix3d2 have the hardware capacity described in Table 1 in terms of CPU, memory, and storage. Rules 4 to 6 link this information to the node.

Information Querying
Having applied the rules in Listing 4, a user may request cloud resources with, for example, specific CPU requirements. In the sample SPARQL query provided in Listing 6, the user requests two virtual machines with a specific number of CPU cores and a specific OS type (e.g., Fedora, 6 cores). The results are shown in Listing 7.
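The matching performed by such a query can be sketched in plain Python over a toy triple set; the knowledge base, property names, and node identifiers below are illustrative stand-ins for the SPARQL query in Listing 6:

```python
# Toy resource matcher: select nodes whose inferred CPU core count and
# OS satisfy the request, mirroring the SPARQL lookup in spirit.
def match_nodes(triples, min_cores, os_name):
    facts = {}
    for s, p, o in triples:
        facts.setdefault(s, {})[p] = o
    return sorted(
        node for node, props in facts.items()
        if props.get("hasCPUCores", 0) >= min_cores
        and props.get("hasOS") == os_name
    )

kb = {
    ("nodeA", "hasCPUCores", 8), ("nodeA", "hasOS", "Fedora"),
    ("nodeB", "hasCPUCores", 4), ("nodeB", "hasOS", "Fedora"),
    ("nodeC", "hasCPUCores", 8), ("nodeC", "hasOS", "Ubuntu"),
}
hits = match_nodes(kb, min_cores=6, os_name="Fedora")
```

Note that the CPU facts consulted here are exactly the kind of information the background-knowledge rules of Listing 4 inject: without rule application, the query would have nothing to match against.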

Listing 7: Query results
RESULTS
urn:publicid:IDN+wall2.ilabt.iminds.be+node+n095-05a
urn:publicid:IDN+wall2.ilabt.iminds.be+node+n096-02
TIME EXECUTION: 0.016 sec

In a more complex example, a user may submit a request for two nodes running a Linux distribution with specific hardware requirements, e.g., 256 MB of RAM and storage capacity greater than 500 MB. The query is described in Listing 8. The resource-matching process is not as straightforward as in the previous case, even if we apply the rules in Listing 5. In most cases, infrastructure providers advertise the exact Linux distribution (e.g., Voyage in Figure 6). Thus, the condition for a Linux OS variant needs either to be incorporated into the request requirements or to be advertised explicitly by the testbeds. We follow the first approach here: additional rules are added to infer the resource characteristics automatically, e.g., the set of acceptable Linux distributions, without requiring explicit statements from the user, as proposed in [60]. The rule set is an appropriately defined set of axioms from which additional implicit information can be derived. A sample rule stating Linux compatibility (Voyage is a Linux-variant OS) is provided in Listing 9.
Listing 8: Initial SPARQL query

Once the rules are applied, OR-AND clauses are built and added to the initial request [60]. Given the additional information injected into the graph, Listing 10 shows the new, expanded SPARQL query, with OR-AND clauses included in lines 22-25. The results are restricted to one feasible matching solution, which is shown in Listing 11.
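The expansion step can be sketched as follows: a rule set maps an abstract requirement ("Linux") to the concrete values that satisfy it, and the result is rendered as an OR-ed SPARQL FILTER clause of the kind added to the expanded query. The distribution list and variable names are illustrative:

```python
# Illustrative rule set: concrete OS labels considered Linux variants.
LINUX_VARIANTS = {"Voyage", "Ubuntu", "Fedora", "Debian"}

def expand_os_requirement(requested_os):
    """Return the set of concrete OS labels satisfying an abstract request."""
    if requested_os == "Linux":
        return set(LINUX_VARIANTS)
    return {requested_os}

def sparql_or_filter(var, values):
    """Render the expansion as a SPARQL FILTER with OR-ed equality tests."""
    tests = " || ".join(f'{var} = "{v}"' for v in sorted(values))
    return f"FILTER ({tests})"

clause = sparql_or_filter("?os", expand_os_requirement("Linux"))
```

In the actual system this expansion is produced by inference rules over the graph rather than by string manipulation, but the resulting disjunction has the same shape as the OR-AND clauses in Listing 10.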

Validation
Documents created using OMN vocabularies can be partially validated using traditional OWL entailments, which verify that the domains and ranges of the properties used in a particular model match those defined in the vocabulary. We found, however, that the expressivity of those mechanisms was not always sufficient to validate the user requests being sent to the testbed. Procedural verification, in turn, is not portable, and it is hard to ensure correctness and consistency across implementations. To supplement traditional OWL mechanisms, we therefore developed Datalog rule sets that trigger inference errors when processing a document that either lacks specific information or is semantically ambiguous. In this section, we explore several examples of such rules.
For instance, if a user attempts to request a network connection that loops back to the same node on which it started, the request may be represented by a valid OMN model; semantically, however, it makes no sense to the resource-matching algorithm that attempts to reproduce the topology. To guard against such cases, we validate the user's request using the Datalog rule in Listing 12.
Listing 12: Validating self-looping links in requests
(?Z rb:violation error('Connection Validation', 'Connection cannot loop on itself', ?Y))
    <- (?X rdf:type pc:PC), (?X nml:hasOutboundPort ?P1), (?X nml:hasInboundPort ?P2),
       (?Y rdf:type nml:Link), (?P1 nml:isSink ?Y), (?P2 nml:isSource ?Y)

In some end-user requests, every Virtual Machine (VM) node must specify an OS image to be booted. At the same time, a VM-server node does not need an image, since it operates using only a pre-determined image. The pc:hasDiskImage property is defined for all PC types, including VM servers and VMs, so a cardinality restriction cannot be used in this case. The request validation rule is expressed in Listing 13.
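For intuition, the self-loop check of Listing 12 can be restated procedurally over a triple set: a link is invalid when its source and sink ports belong to the same node. This is a simplified sketch of the semantics, not the Datalog rule itself, and the example identifiers are illustrative:

```python
# Procedural restatement of the Listing-12 check over (s, p, o) triples.
def find_self_loops(triples):
    """Return links that start and end on the same node (invalid requests)."""
    port_owner = {}                 # port -> node owning it (in- or outbound)
    link_ports = {}                 # link -> ports attached via isSource/isSink
    for s, p, o in triples:
        if p in ("nml:hasOutboundPort", "nml:hasInboundPort"):
            port_owner[o] = s
        elif p in ("nml:isSource", "nml:isSink"):
            link_ports.setdefault(o, set()).add(s)
    return sorted(
        link for link, ports in link_ports.items()
        if len({port_owner.get(pt) for pt in ports}) == 1
    )

request = {
    ("pc1", "nml:hasOutboundPort", "p1"),
    ("pc1", "nml:hasInboundPort", "p2"),
    ("p1", "nml:isSource", "link1"),
    ("p2", "nml:isSink", "link1"),
}
loops = find_self_loops(request)
```

The advantage of the declarative Datalog form over such hand-written checks is exactly the portability argument made above: the rule travels with the vocabulary rather than with any one implementation.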
Listing 13: Validating presence of OS image in VM requests
(?Z rb:violation error("Validating that VM nodes have OS images", ?R))
    <- (?R rdf:type pc:VM), noValue(?R, pc:hasDiskImage, ?I)

It is important to emphasize that the set of rules we use continues to evolve with the schema and with the resource-matching algorithms used to allocate CI resources for users. For example, as the algorithms become more sophisticated, they are able to function without some of the guards protecting them from poorly formed requests, reducing the need for some rules. Nonetheless, the design of resource-matching and embedding algorithms in testbeds is an active field of study. The availability of declarative rule-based semantic validation significantly simplifies the continuing evolution of these algorithms by clearly associating each algorithm with its own set of validation rules, which prevent errant executions and simplify the algorithm code.

Performance Evaluation
By adopting formal information models and semantically annotated graphs, our approach allows operations to link, relate, enhance, query, and conduct logical manipulations of heterogeneous data, all of which would be impossible otherwise.
One of the most important measures of the applicability of our work is the amount of time required to translate and query resources using our ontology; these times must remain practical for the given context. Our initial work [4] looked at the sizes of the advertisements for testbeds in the FIRE and GENI projects and evaluated the performance of translating the respective XML files to RDF. The novel work we present in this paper shows a more comprehensive comparison of query performance; namely, we compare the time needed to translate resource information with the time needed to list resources, and we evaluate the performance of queries of different complexity.
We have analyzed the results of the ListResources method call for the 99 SFA AMs that are monitored (https://flsmonitor.fed4fire.eu/) within the Fed4FIRE project. This list contains 82 valid XML-based GENI RSpec replies with 762,634 XML elements in total, of which 3,043 are nodes, 31,155 are links, and 25,493 are interfaces. Figure 7 shows the size of the RSpec advertisements in the testbeds we considered. To estimate the time needed to translate the advertisements, the actual RSpecs from these testbeds were downloaded. The XML files were then translated to TTL-serialized RDF graphs using the OMN translator. Of great importance to the potential scalability of our approach is the time taken for such translations, particularly with regard to the number of XML elements involved. We extracted 100 advertisement RSpecs, of which six contained errors (e.g., not adhering to the RSpec XML Schema Definition (XSD) file) and could not be translated without manual changes. Tests were run on a MacBook Pro with OS X Yosemite, a 2.8 GHz Intel Core i7 processor, and 8 GB of RAM. Running a translation over all correct RSpecs produced median values of 24 milliseconds from XML to Java Architecture for XML Binding (JAXB) and 20 milliseconds from JAXB to RDF, yielding a total median translation time of 44 milliseconds from XML to RDF. As shown in Figure 8, translation times appear to be roughly linearly correlated with the number of XML elements translated, with a median of 180 elements and a maximum of 159,372. This linear correlation indicates that upward scaling should be possible, although more data are required to confirm this point. At this stage, no major limiting factors have been identified, and, given appropriate processing power, translation should be possible in most foreseeable use cases. To put the duration needed to translate an RSpec advertisement into relation with the duration of the underlying function call in the FI experimentation context, we
quantified the query and translation time for a single testbed. As indicated in Figure 7, about 95% of the testbeds expose fewer than 20,000 XML elements; therefore, we used the CloudLab Wisconsin testbed (https://www.cloudlab.us), which exposes 19,371, for our measurements. The results in Figure 9 show that the average translation time of 583 ms ± 9 ms (95% CI) would add about 10% to the average response time of 5453 ms ± 131 ms (95% CI). This effect, however, could be mitigated by translating in advance or by distributing the workload. The delay of over 5 seconds for listing resources with a single API call is influenced mainly by two factors: first, the available bandwidth to transmit the resulting XML document from the testbed to the caller; and second, the testbed's internal communication architecture for gathering the required information, as CloudLab is itself a distributed infrastructure composed of three different sites. Assuming that a testbed accepts the potentially increased response time in exchange for the added value of merging its information into a global linked data set, its resources can be found by applying the aforementioned resource-matching queries. The translation of all available tree data structures into an RDF-based graph, using our OMN vocabulary and rules, resulted in a set of 2,911,372 statements. This supports our expectation that adding further rules, infrastructures, and other data sources will lead to significant growth.
Figure 10 shows the duration of running the query in Listing 10 against this graph. To assess the performance impact of query complexity, it was compared with a simpler query, shown in Listing 14 together with its result in Listing 15. While finding the three largest aggregates took on average 129 ms ± 3 ms (95% CI), the matching query took on average 168 ms ± 1 ms (95% CI), about 30% longer, yet much less time than a single ListResources call on a single testbed. Finally, we have summarized our findings in Table 2.
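The measurement methodology above pairs each translation with the number of XML elements processed. Its shape can be sketched as follows; the "translate" step here is a placeholder parse-and-count, not the real omnlib translator, and the synthetic RSpec is illustrative:

```python
import time
import xml.etree.ElementTree as ET

def count_elements(xml_text):
    """Stand-in workload: parse the document and count every XML element."""
    return sum(1 for _ in ET.fromstring(xml_text).iter())

def timed(fn, *args):
    """Run fn and return (result, elapsed wall-clock time in milliseconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1000.0

# Synthetic advertisement: 1 root + 100 nodes + 100 interfaces = 201 elements.
rspec = "<rspec>" + "<node><iface/></node>" * 100 + "</rspec>"
n_elements, ms = timed(count_elements, rspec)
```

Plotting such (element count, duration) pairs over many advertisements is what produces the roughly linear relation reported in Figure 8.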

Conclusion and Future Work
The OMN set of ontologies that we presented in this article has been developed to support resource management in federated and distributed computing infrastructures.OMN provides a federation-wide knowledge layer that eases the process of resource selection and matching.
In this article, we described the OMN framework, which allows the extraction of the underlying information from tree-based data structures and exposes it, in the form of OMN triples, to interested parties via the Web. DBcloud is an application developed in support of the federation of experimental cyber-infrastructures; it relies on OMN translators that automatically transform semi-structured data into OMN graphs. An important aspect that we have assessed is the performance of such translations, as this is crucial to OMN usability and adoption. We have shown that translation and querying add overhead (on the order of 10% in our experiment), which we nevertheless expect to be acceptable to resource providers given the added value of merging information.
We have also shown how users can query OMN information that represents the resources available in the underlying infrastructures and match them with their own computational requirements. In this case, we evaluated the time needed to find matching resources and have shown that even more complex queries complete within times that are acceptable to end-users.
In the long run, we expect our contributions to outlive the specific use case of cloud testbed resource management. We believe that OMN will be accepted by the broader community of academic and commercial cloud providers and will help to create an ecosystem of flexible, extensible tools and mechanisms through which the use of cloud platforms becomes even more pervasive. We expect it to open up the marketplace to competing cloud providers, large and small, catering to specific market niches. We are also promoting the adoption of OMN in new domains such as the Internet of Things (IoT). As a specific Industrial Internet of Things (IIoT) [61] example, things, services, and data can be connected between federated manufacturing facilities. As an analog to the federation of testbeds, the involved facilities (the digital factories) and their available APIs and services have to be described formally to enable the matchmaking capabilities required for the envisioned autonomous production within the fourth industrial revolution; our ontology set could act as a basis. Following discussions within the German initiative Plattform Industrie 4.0 (PI4.0), linking information based on the LOD paradigm and using interfaces such as the W3C WoT could build a technological base for implementing this vision. Another focus area for the OMN ontologies is integration with ontologies defining data-access policies among cooperating entities that make use of cloud infrastructures. The support provided by OMN for the definition of complex usage of heterogeneous resources will be the backbone for novel kinds of open data services, in industrial and commercial settings as well as in the scientific community.

Figure 1 .
Figure 1. The experiment life-cycle phases and protocols.

Figure 3 .
Figure 3. The key concepts and properties of the omn upper ontology.

Figure 8 .
Figure 8. JAXB-to-RDF translation times versus number of XML elements [4].
The key classes of the OMN upper ontology include:
• Service: a manageable entity that can be controlled and/or used via either APIs or capabilities that it supports, such as an SSH login.
• Component: constitutes a part of a Resource or a Service, such as a port of a network node.
• Attribute: helps to describe the characteristics and properties of a specific Resource, Service, Group, or Component, such as Quality of Service (QoS).
• Group: a collection of resources and services, for instance, a testbed or a requested network topology logically grouped together to perform a particular function.
• Dependency: describes a unidirectional relationship between two elements such as Resource, Service, Component, or Group. It may define, for example, an order in which particular resources need to be instantiated: first, a network link, and then, the compute nodes attached to it. This class opens up the possibility of adding more properties to a dependency via annotation.
• Environment: the conditions under which a Resource, Group, or Service is operating, as in, e.g., concurrent virtual machines.
• Reservation: a specification of a guarantee for a certain duration; hence, it is a subclass of the "Interval" class of the W3C Time ontology [53].
• Layer: describes a place within a hierarchy to which a specific Group, Resource, Service, or Component can adapt. Infrastructure resources naturally fall into layers, with resources at higher layers requiring the presence of resources at lower layers in order to function.
The OMN upper ontology has 23 properties, of which the following are the most significant:
• hasAttribute: the Attribute associated with a Component, Resource, Service, or Group; e.g., CPU speed or uptime.
• hasComponent: links a Component, Resource, or Service to its subcomponent.
• hasGroup: connects a Group to its subgroup; it is the inverse of isGroupOf.
• hasReservation: relates a Group, Resource, or Service to its Reservation.
• hasResource: declares that a specific Group has a Resource.
• hasService: declares that a Group, Resource, or Service provides a Service.

Find me two hosts, node1 and node2, both with RAM greater than 256 MB, both with disk storage greater than 500 MB, running a Linux variant.
Listing 14: Finding the largest aggregates via query
SELECT (COUNT(?am) AS ?fre) ?am WHERE {
  ?node omn-lifecycle:managedBy ?am .
} GROUP BY (?am) ORDER BY DESC(?fre) LIMIT 3

Table 2 .
Results of the performance evaluation.