Experiment 1: 1134 Thing Descriptions were registered in eWoT without their corresponding WoT-Mappings, restricting the discovery to context-based search only. At the same time, the very same Thing Descriptions were inserted in an external GraphDB triple store. Then, a set of 20 queries was issued to both eWoT and the external GraphDB.
On the one hand, this experiment aimed at validating whether the query answers produced by eWoT and GraphDB were the same, meaning that the queries answered by eWoT are complete and correct. On the other hand, the response times of eWoT and GraphDB were compared to ensure there were no statistically significant differences between them, meaning that eWoT is as efficient as a triple store when answering queries.
Experiment 2: an incremental number of IoT devices were registered in eWoT, up to 1000; for each number of devices, a set of 20 queries was issued. This experiment relied on a simulator that creates a variable number of digital twins for smart houses (i.e., IoT devices) that, on the one hand, register their TDs and WoT-Mappings in eWoT and, on the other hand, expose a REST API that publishes JSON data.
On the one hand, the scalability of eWoT was statistically analysed by simulating 100, 250, 500, 750, and 1000 smart houses. On the other hand, assuming there is no overhead in the transmission between eWoT and the IoT devices, the time required by the distributed access and by the translation of data was analysed in order to quantify the overhead that both introduce.
Experiment 3: an incremental number of real-world IoT devices (i.e., photometers whose data are published by the European project Stars4All (https://stars4all.eu/)) were registered in eWoT, up to 400; for each number of devices, a set of 20 queries was issued. This experiment aimed at comparing the eWoT proposal with a custom centralised approach. The centralised approach consisted of a triple store (i.e., GraphDB) that stored the TDs of the photometers and a custom-developed service that read their values, translated them into RDF, and injected them into GraphDB.
All the experiments were run on an Ubuntu GNU/Linux x86_64 machine with four cores and 34 GB of RAM. On this computer, we also simulated all the RESTful endpoints to reduce the network impact when measuring the time taken by the different data exchanges in our experiments. The simulator was implemented using Java 1.8 and Spring Boot 2.1.5. For the translation of JSON documents into RDF we used the Helio library (https://helio.linkeddata.es/). To implement our Repository we relied on GraphDB 8.7.2.
The implementation of eWoT is publicly available at https://github.com/oeg-upm/eWoT. The simulator used in the experiments, and its manual, can be found under the folder MDPI-experiments in the very same repository, where it is specified how to reproduce Experiment 2. Unfortunately, due to confidentiality, the Thing Descriptions involved in Experiment 1 cannot be disclosed publicly. The results obtained in our experiments are publicly available at https://zenodo.org/record/3634897 as a Zenodo repository.
The query-answering time for both approaches was evaluated by registering 100, 200, 300, and 400 photometers. The scalability in a real-world scenario was analysed for eWoT. In addition, a comparison between both proposals was performed, analysing their pros and cons.
5.1. Experiment 1: Discovery
This experiment aimed at testing the Discovery component of eWoT, which implements the functionality reported by Algorithm 1. The goal of this experiment was to validate that eWoT produces complete and correct query answers, and is as efficient as a triple store when answering a discovery query that only performs a context-based search.
For this purpose, a set of 20 different queries with different shapes [38] was developed, namely: four linear queries, four star queries, four tree queries, two cycle queries, and six complex queries. The queries with the same shape had an incremental size; for example, the first linear query had four triples whereas the fourth had eight. The experiment consisted of running all the queries, ten times each, over GraphDB and over the SPARQL endpoint of eWoT. Every time a query was issued, the response time and the content of the query answer were stored. Bear in mind that eWoT computes a TED of suitable IoT devices before answering the query, and therefore introduces additional operations into the query-answering process.
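As an illustration of this procedure, the sketch below issues one timed query against a SPARQL endpoint using Apache Jena. The endpoint URL and the vocabulary of the linear-shaped query are hypothetical, since the paper does not disclose the actual queries or the client library used; the same code would target either GraphDB or eWoT.

```java
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSet;
import org.apache.jena.query.ResultSetFormatter;

public class TimedQuery {
    // Hypothetical endpoint address; deployment-specific in practice.
    static final String ENDPOINT = "http://localhost:9000/sparql";

    // A linear-shaped query: a chain of four triple patterns
    // (hypothetical vocabulary, for illustration only).
    static final String LINEAR_QUERY =
        "PREFIX ex: <http://example.org/ns#> "
      + "SELECT * WHERE { "
      + "  ?device ex:hosts ?sensor . "
      + "  ?sensor ex:madeObservation ?observation . "
      + "  ?observation ex:hasResult ?result . "
      + "  ?result ex:value ?value . }";

    public static void main(String[] args) {
        long start = System.nanoTime();
        try (QueryExecution qexec =
                 QueryExecutionFactory.sparqlService(ENDPOINT, LINEAR_QUERY)) {
            ResultSet results = qexec.execSelect();
            int rows = ResultSetFormatter.consume(results);     // answer size
            long ms = (System.nanoTime() - start) / 1_000_000;  // response time
            System.out.println(rows + " rows in " + ms + " ms");
        }
    }
}
```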
To carry out the experiment, 1134 Thing Descriptions related to IoT devices from the European project VICINITY were loaded into a GraphDB (previously empty) and eWoT, as depicted by Figure 4
. These Thing Descriptions profile a wide range of real-world heterogeneous IoT devices. Note that the Thing Descriptions contained no WoT-Mappings, as this experiment aimed at testing only the discovery. Due to confidentiality policies, the Thing Descriptions involved cannot be disclosed publicly.
Table 2 displays the results of this experiment. The column Query contains the shapes of the queries issued and the other two columns (i.e., GraphDB and eWoT) recap the results obtained. The sub-column Answer size reports the number of lines contained in the CSV query answer, and the sub-column Avg. Time (s) reports the average query-answering time in seconds.
To ensure the correctness and completeness of the eWoT Discovery, the query answers produced for the same query by eWoT and by GraphDB were compared. If the size (the number of lines in this case) of the two answers was the same, then eWoT could be considered complete. When the content of both answers was the same, then eWoT could be considered correct. Bear in mind that query answers in SPARQL hold no order, and thus two answers for the same query may expose the same data sorted differently. In addition, this claim assumes that GraphDB produces complete and correct query answers.
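This comparison can be mechanised; the sketch below is a minimal illustration with hypothetical file names (the paper states the actual verification was performed manually). It checks completeness via answer sizes and correctness via an order-insensitive multiset comparison of the CSV lines.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class AnswerComparator {

    // Completeness: both CSV answers contain the same number of lines.
    static boolean sameSize(List<String> a, List<String> b) {
        return a.size() == b.size();
    }

    // Correctness: both answers contain the same lines regardless of order
    // (SPARQL answers are unordered), compared as multisets so that
    // duplicated solutions are also taken into account.
    static boolean sameContent(List<String> a, List<String> b) {
        Map<String, Long> countA = a.stream()
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        Map<String, Long> countB = b.stream()
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        return countA.equals(countB);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical file names for the two stored query answers.
        List<String> ewot = Files.readAllLines(Paths.get("ewot-answer.csv"));
        List<String> graphdb = Files.readAllLines(Paths.get("graphdb-answer.csv"));
        System.out.println("complete: " + sameSize(ewot, graphdb));
        System.out.println("correct:  " + sameContent(ewot, graphdb));
    }
}
```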
Discovery Completeness: the completeness was validated by manually comparing the number of lines of all the query answers, and verifying that it was the same for the answers produced by eWoT and those produced by GraphDB. Although this task was verified manually, the sizes of the query answers are included in Table 2 in order to give the reader an idea of the query answer sizes.
Discovery Correctness: the correctness was validated by manually comparing the content of all the query answers, and verifying that it was the same for those generated by eWoT and by GraphDB. Since the correctness is related to the content of the query answers, there was no suitable way to reflect this in Table 2.
Discovery Efficiency: Table 2
reports the average query-answering times (sub-column Avg. Time (s)) for GraphDB and eWoT. The best effort was made to avoid query cache mechanisms in GraphDB. In the light of these results, the answering times seemed to be almost the same. In order to validate that the eWoT query-answering time was equivalent to that of GraphDB, a statistical significance test was performed. The well-known Iman–Davenport test [39
] was applied to check if there was a statistically significant difference between GraphDB and eWoT answering times, using a confidence level of 95%.
The result of the Iman–Davenport test was a p-value of approximately 0.07. Since this value is above the established significance threshold (i.e., 0.05), no statistically significant difference was found between the query-answering times of eWoT and GraphDB. Therefore, from an efficiency point of view, the eWoT Discovery task can be considered as efficient as a generic triple store such as GraphDB.
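For reference, a standard formulation of the Iman–Davenport statistic (not reproduced from the paper) corrects the conservativeness of Friedman's statistic. With k compared systems (here k = 2) and N problems (here N = 20 queries):

```latex
F_{ID} = \frac{(N-1)\,\chi^{2}_{F}}{N(k-1) - \chi^{2}_{F}}
```

where $\chi^{2}_{F}$ is the Friedman statistic computed over the per-query ranks of the systems. $F_{ID}$ follows an F-distribution with $k-1$ and $(k-1)(N-1)$ degrees of freedom, and the null hypothesis of equal answering times is rejected only when the associated p-value falls below 0.05.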
5.2. Experiment 2: Discovery, Distributed Access, and Translation
This experiment was aimed at testing the whole eWoT architecture, namely: Discovery, Distributed Access, and Translation. The goal of the experiment was to analyse the scalability of eWoT for an increasing number of IoT devices, and to analyse the overhead that the distributed access and translation introduce in the query-answering process.
For this purpose, a new set of 20 queries with different shapes was defined, namely: four linear queries, four star queries, four tree queries, and eight complex queries. The experiment consisted of running all the queries, ten times each, over the SPARQL endpoint of eWoT. Every time a query was issued, three different times were stored: the discovery time, the access plus translation time, and the total answering time.
To carry out the experiment, a simulator of IoT devices was used. The simulator received as input the number of devices to be simulated and the registration endpoint of eWoT. It then deployed that number of REST API endpoints, publishing different data at different HTTP endpoints, and registered in eWoT a Thing Description and its WoT-Mappings for each deployed REST API. The Thing Descriptions registered were complete and complex, in contrast to those of Experiment 1, which were heterogeneous. The numbers of endpoints deployed for this experiment were: 100, 250, 500, 750, and 1000.
IoT devices simulator: we implemented a simulator to simulate the different IoT devices that are discoverable and accessible in our ecosystem. The simulator took as input a CSV file containing the data of a smart house, extracted from the Machine Learning Repository of UC Irvine (https://archive.ics.uci.edu/ml/datasets/Individual+household+electric+power+consumption), and the endpoint address of a description repository in which to register its TDs. When the simulator was started, it registered itself into the repository (so our proposal could discover and access it) and published a RESTful API whose data were updated each minute (following the CSV rows); in this way, the published data were the historical data of a smart house. We then developed a script that takes a number as input and starts as many simulators as specified, each on a different port.
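A minimal sketch of one such simulated device is shown below, using Spring Boot (which the paper states the simulator was built with). The CSV file name, the /data path, and the published JSON fields are illustrative assumptions, and the registration of the TD and WoT-Mappings into eWoT is omitted.

```java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// One simulated smart house: it replays the rows of the UCI household
// power-consumption CSV, advancing to the next row each minute, so the
// published data follows the historical data of a real house.
@SpringBootApplication
@EnableScheduling
@RestController
public class SmartHouseSimulator {

    private final List<String> rows;  // historical CSV rows (semicolon-separated)
    private int current = 1;          // skip the header row

    public SmartHouseSimulator() throws Exception {
        // Hypothetical local copy of the UCI dataset file.
        rows = Files.readAllLines(Paths.get("household_power_consumption.txt"));
    }

    public static void main(String[] args) {
        // Registration of the TD and WoT-Mappings into eWoT is omitted here.
        SpringApplication.run(SmartHouseSimulator.class, args);
    }

    @Scheduled(fixedRate = 60_000)  // advance to the next CSV row every minute
    public void tick() {
        if (current < rows.size() - 1) current++;
    }

    @GetMapping("/data")  // RESTful API publishing the current row as JSON
    public String data() {
        String[] f = rows.get(current).split(";");
        return "{\"date\":\"" + f[0] + "\",\"time\":\"" + f[1] + "\","
             + "\"globalActivePower\":\"" + f[2] + "\"}";
    }
}
```

A launcher script then starts as many instances of this application as requested, each bound to a different port.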
Figure 5 shows, relying on a whisker plot, the time taken to answer the different queries when different numbers of endpoints were available.
In light of the results depicted in Figure 5, several things can be concluded: (a) eWoT had a stable query-answering time, that is, the response times were very similar (the boxes are very narrow); (b) eWoT took on average less than 20 s to solve queries that required accessing 250 endpoints, about 35 s for queries that required accessing 500, and 75 s for queries requiring access to 1000 endpoints.
Scalability: the whisker plots of Figure 5 are not conclusive for establishing how eWoT scales. Therefore, a linear regression model [40] was applied with a confidence level of 95%. All the query-answering times shown in Figure 5 were used to feed the linear regression model. As a result, the query-answering times fit the model with a p-value below 0.05, the threshold required for a confidence level of 95%. Therefore, it can be concluded that eWoT scaled linearly, since its results fit a linear regression model as the number of endpoints grew. This linear behaviour can be observed thanks to the blue line depicted in Figure 5
, which corresponds to the regression line adjusted to the eWoT results.
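For clarity, the fitted model has the standard simple-linear-regression form (the actual coefficient estimates are not reported above, so none are given here):

```latex
t(n) = \beta_{0} + \beta_{1}\,n + \varepsilon
```

where $n$ is the number of endpoints, $t(n)$ the query-answering time, $\beta_{0}$ and $\beta_{1}$ the fitted intercept and slope, and $\varepsilon$ the error term; linear scalability corresponds to the data being well explained by a statistically significant slope $\beta_{1}$.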
Distributed Access and Translation overhead: Finally, as a last experiment, we aimed at establishing how much overhead the access introduced relative to the discovery. Bear in mind that our experimental environment was located on a single machine, and thus the remote accesses were on the local host, meaning that the time to retrieve data tended to zero and the access time was only the time that our proposal took to fetch the data, translate them into RDF, combine them with the discovery RDF data, and finally answer the query.
Table 3 reports the average percentage of time spent answering the different queries; the table distinguishes the discovery and the access time percentages. Two main points should be noticed. The first is that the discovery and access percentages for all the different numbers of endpoints were almost the same (i.e., 96% for discovery and 4% for access), meaning that the linearity of our approach was preserved in both discovery and access. The second is that the discovery took most of the time, whereas the access was nearly instant. This behaviour has two reasons: first, the network time in our experiment was almost zero, and second, the access is a task that can be parallelised whereas the discovery cannot.
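The parallelisability of the access phase can be illustrated with a sketch (hypothetical endpoint URLs and a placeholder fetch-and-translate step; eWoT's actual implementation is not reproduced here): each relevant endpoint is fetched concurrently, so the total access time is bounded by the slowest endpoint rather than by the sum of all accesses.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

public class ParallelAccess {

    // Placeholder for fetching a device's JSON and translating it into RDF
    // (in eWoT this translation is driven by the WoT-Mappings, e.g. via Helio).
    static String fetchAndTranslate(String endpoint) {
        return "# RDF triples obtained from " + endpoint;
    }

    public static void main(String[] args) {
        List<String> endpoints = Arrays.asList(
            "http://localhost:8081/data",  // hypothetical device endpoints
            "http://localhost:8082/data",
            "http://localhost:8083/data");

        // One asynchronous task per endpoint: the accesses are independent,
        // whereas the discovery is a single query over the repository.
        List<CompletableFuture<String>> tasks = endpoints.stream()
            .map(e -> CompletableFuture.supplyAsync(() -> fetchAndTranslate(e)))
            .collect(Collectors.toList());

        // Waiting for all tasks costs roughly the time of the slowest
        // endpoint, not the sum of all of them.
        String rdf = tasks.stream()
            .map(CompletableFuture::join)
            .collect(Collectors.joining("\n"));
        System.out.println(rdf);
    }
}
```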
In light of the results shown in Table 3
, we can conclude that the distributed access brings almost no overhead to the whole process of query answering, considering that the time to retrieve data through the network tended to zero.
5.3. Experiment 3: eWoT vs. Centralised Approach
This experiment was aimed at comparing the query-answering time of eWoT and a centralised proposal, the integration efforts required by both, and how they behave. The goal of the experiment was to analyse the pros and cons of the eWoT approach and the well-known centralised approach from the literature.
For this purpose, a new set of 20 queries with different shapes was defined, namely: four linear queries, four star queries, four tree queries, and eight complex queries. The experiment consisted of issuing all the queries, ten times each, over the centralised proposal and eWoT. Every time a query was issued, the query-answering time was kept for further analysis.
Due to the lack of availability of centralised proposals in the literature, or due to hard restrictions that made the use of some proposals unfeasible, we developed a custom proposal for the purposes of this experiment. This proposal was developed following the centralised approach reported by Zhou et al. [9].
The centralised proposal consisted of a GraphDB that stores the TDs of the IoT devices, and several services that monitor the APIs of the IoT devices being integrated. These services periodically pull data from the IoT devices, transform such data into RDF, and inject the values into the corresponding TDs. In order to carry out this experiment, the IoT devices involved were the ones published by the Stars4ALL European project. This project publishes in a REST API the real-time data of a large number of photometers distributed across the world (https://github.com/STARS4ALL).
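A minimal sketch of one such monitoring service follows. The device URL, property IRIs, polling period, and GraphDB repository path are illustrative assumptions, and Apache Jena is used here for the remote SPARQL update, although the paper does not state which library the custom services used.

```java
import org.apache.jena.update.UpdateExecutionFactory;
import org.apache.jena.update.UpdateFactory;
import org.apache.jena.update.UpdateRequest;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class PhotometerPoller {
    // Hypothetical URLs: one photometer API and the GraphDB update endpoint.
    static final String DEVICE_API = "http://example.org/photometers/1/reading";
    static final String GRAPHDB = "http://localhost:7200/repositories/wot/statements";

    public static void main(String[] args) throws Exception {
        while (true) {
            String value = fetch(DEVICE_API);  // pull the latest reading
            // Translate the reading into RDF and inject it, replacing the old value.
            String sparql =
                "PREFIX ex: <http://example.org/ns#> " +
                "DELETE { ex:photometer1 ex:value ?old } " +
                "INSERT { ex:photometer1 ex:value " + value + " } " +
                "WHERE  { OPTIONAL { ex:photometer1 ex:value ?old } }";
            UpdateRequest update = UpdateFactory.create(sparql);
            UpdateExecutionFactory.createRemote(update, GRAPHDB).execute();
            Thread.sleep(60_000);  // hypothetical polling period of one minute
        }
    }

    static String fetch(String url) throws Exception {
        HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(con.getInputStream()))) {
            return in.readLine();  // assume the API returns a plain numeric value
        }
    }
}
```

Note that the triple store only sees a new value once per polling period, so a query issued between two pulls is answered with stale data; this is the freshness limitation discussed below.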
Figure 6 shows, using a whisker plot, the time taken to answer the different queries: Figure 6a reports the time required by eWoT, whereas Figure 6b reports the time required by GraphDB. Bear in mind that the charts have different scales on their y-axes.
Observing the results reported in Figure 6, the centralised proposal based on GraphDB clearly outperformed eWoT in query-answering time. This is because, when a query was issued to this proposal, the IoT data were already present in the triple store, so the query was directly answered using the GraphDB SPARQL engine. In contrast, when a query was issued to eWoT, first a discovery task was performed; then all the IoT devices relevant to answering the query were accessed and their data retrieved and translated into RDF; and finally the query answer was computed.
Data freshness: the values reported by the IoT devices change constantly. Since the queries issued in this experiment involved such values, answering them with the latest reported values is paramount. eWoT retrieves the devices' values every time a query is issued, and therefore the freshness of the data is guaranteed. In contrast, the centralised proposal relies on several services that monitor the IoT devices and periodically inject their values into GraphDB. As a result, the freshness of the IoT devices' data cannot be guaranteed.
Integration effort: overcoming IoT devices' heterogeneity and integrating new ones is a challenging task. eWoT addresses this task by requiring the registration of a TD containing WoT-Mappings any time a new IoT device must be integrated. This entails no code development, providing users with a plug-and-play system in which an IoT device becomes automatically interoperable just by providing its complete TD. The centralised proposal addresses this task by requiring a practitioner to develop a service that monitors the IoT device and periodically uploads its values into the triple store.
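By way of illustration, the eWoT side of this integration reduces to a single HTTP request; the registration path and the TD skeleton below are hypothetical.

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class RegisterDevice {
    public static void main(String[] args) throws Exception {
        // Hypothetical TD skeleton; a real one carries the WoT-Mappings
        // needed to translate the device's JSON into RDF.
        String td = "{\"@context\":\"https://www.w3.org/2019/wot/td/v1\","
                  + "\"title\":\"photometer-1\",\"properties\":{}}";
        // Hypothetical registration endpoint of an eWoT deployment.
        URL url = new URL("http://localhost:9000/api/things");
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        con.setRequestMethod("POST");
        con.setRequestProperty("Content-Type", "application/json");
        con.setDoOutput(true);
        try (OutputStream os = con.getOutputStream()) {
            os.write(td.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("registration status: " + con.getResponseCode());
    }
}
```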
In light of the results reported in Figure 6
and the previous analysis, several conclusions can be reached: (a) the centralised proposal was faster than eWoT at answering the queries, but (b) the centralised proposal cannot guarantee that such queries are answered using the latest values of the IoT devices, whereas eWoT always guarantees that the queries are answered with the latest values. (c) In addition, integrating new IoT devices in the centralised proposal requires the development or modification of code, whereas eWoT only requires the registration of a TD.