1. Introduction
Sensor networks are gradually becoming ubiquitous in the industrial world [1], and they are expected to drive the formation of the future Internet by 2015 [2]. The value of the Sensor Web is related to the capacity to aggregate, analyse and interpret this new source of knowledge. Currently, there is a lack of systems designed to manage rapidly changing information at the semantic level [3]. The solution offered by data-stream management systems (DSMS) is limited mainly by their inability to perform complex reasoning tasks, and semantic technologies are seen as the main technical instrumentation available to address the current problems [4].
Stream reasoning is defined as real-time logical reasoning on huge, possibly infinite, noisy data streams, aiming to support the decision process of large numbers of concurrent querying agents. In order to handle blocking operators on infinite streams (such as min, max, average, or sort), the reasoning process is restricted to a certain window of concern within the stream, whilst the previous information is discarded [5]. This strategy is applicable only in situations where recent data have higher relevance (e.g., the average water flow in the last 10 minutes). In some reasoning tasks, however, tuples in different streams that are far apart need to be joined arbitrarily.
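The window-of-concern strategy can be sketched in a few lines of Haskell. The names below are illustrative, not taken from the paper's implementation: a blocking aggregate (the average) is restricted to the n most recent readings, so it can run over an infinite stream.

```haskell
-- A minimal sketch (illustrative names): restrict a blocking aggregate
-- to a sliding window so it can be evaluated over an infinite stream.
windows :: Int -> [a] -> [[a]]
windows n xs = take n xs : windows n (tail xs)

avg :: [Double] -> Double
avg xs = sum xs / fromIntegral (length xs)

-- At every step, the average of the n most recent readings:
windowedAvg :: Int -> [Double] -> [Double]
windowedAvg n = map avg . windows n
```

For instance, `take 3 (windowedAvg 3 [1..])` yields `[2.0, 3.0, 4.0]`: older readings fall out of the window and are discarded, exactly as the strategy requires.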
Stream reasoning adopts the continuous processing model, where reasoning goals are continuously evaluated against a dynamic knowledge base. This leads to the concept of persistent, continuously evaluated queries, as opposed to the transient, one-shot queries run against a database.
Typical applications of stream reasoning are traffic monitoring, urban computing, patient monitoring, weather monitoring from satellite data, and the monitoring of financial transactions [3] or of the stock market. Real-time event analysis is conducted in domains such as seismic incidents, flu outbreaks, or tsunami alerts, based on a wide range of sensor networks, from RFID (Radio-Frequency Identification) technology to the Twitter data flow [6]. Decisions should be taken based on plausible events: waiting for complete confirmation of an event might be too risky.
To summarise, the problem is given by the inability of current stream reasoning systems (i) to perform semantic reasoning on changing information from heterogeneous sensors and (ii) to support decisions in the case of incomplete information.
Our solution is based on three technical components: description logic (DL), plausible logic (PL) and functional programming. Description logic is used to deal with heterogeneity and to bridge the gap between low-level sensor data and the high-level knowledge requested by decision makers. Plausible logic, being non-monotonic, is used for handling incomplete information, and its explicit priorities are used to resolve inconsistencies. Haskell data structures are used to model infinite streams based on Haskell's lazy evaluation mechanism. Functional programming is also used to process streams on the fly according to given aggregation policies.
Note that in the current approach, the rule-based system is not on top of the ontology level, as envisaged by the Semantic Web stack. Instead, the two are at the same level, with part of the ontology being translated into strict rules. Consequently, after the translation, all the reasoning on streams is performed within the plausible logic framework. The main advantage rests in the time needed, given the superior efficiency of plausible logic [7] compared to description logic [8].
This paper is an extended version of [9]. The first contribution regards the integration of plausible rules with description logic. As a second contribution, we describe the use of Haskell data structures and the lazy evaluation mechanism for continuous reasoning on infinite streams. We exploit non-monotonic rule-based systems for handling inconsistent or incomplete information, and ontologies to deal with heterogeneity. Data is aggregated from distributed streams in real time, and plausible rules fire when new data become available. This study also investigates the advantages of lazy evaluation on data streams.
Section 2 highlights the distinctive features of our approach compared to related work. Section 3 introduces the technical instrumentation, which is based on plausible logic and description logic. Section 4 presents the developed stream management system. A running scenario is illustrated in Section 5. Finally, Section 6 summarises the advantages and possible improvements of our solution.
2. Related Work
Stream integration is considered an ongoing challenge for stream management systems [2,3,10,11], with several tools available to perform stream reasoning.
DyKnow [12] introduces the knowledge processing language KPL to specify knowledge processing applications on streams. We instead exploit the Haskell stream operators to handle streams and list comprehensions to query them. The SPARQL algebra is extended in [13] with time windows and pattern matching for stream processing. In our approach, we exploit the list comprehension and pattern matching already available in Haskell, aiming at the same goal of RDF stream processing. Compared with C-SPARQL, Haskell provides capabilities to aggregate streams before performing queries against them. The ETALIS tool performs reasoning tasks over streaming events with respect to background knowledge encapsulated as Prolog rules [14]. In our case, the background knowledge is obtained from ontologies, translated into strict rules in order to reason over a unified space.
The need to consider revision in stream processing is also addressed in [15], where a rule-based algorithm is developed to handle the different situations in which event revision should be activated. In our case, the defeasible semantics of plausible logic is enacted to block the derivation of complex events when new contradictory information arrives. Consequently, the knowledge engineer is responsible for defining what a plausible event is, and which consequences cannot be retracted even if their premises are proved false after revision, by modelling them as strict rules. One possible line of research, which would harmonise these two complementary approaches, is to develop a dedicated logic for stream processing.
The strength of plausibility of the consequents is given by the superiority relation among rules. One idea for computing the degree of plausibility is to exploit specific plausible reasoning patterns such as epagoge: “If A is true, then B is true. B is true. Therefore, A becomes more plausible”; “If A is true, then B is true. A is false. Therefore, B becomes less plausible”; or “If A is true, then B becomes more plausible. B is true. Therefore, A becomes more plausible”.
The research conducted here can be integrated into the larger context of the Semantic Sensor Web, where challenges such as the abstraction level, data fusion, and application development [16] are addressed by several research projects such as ASPIRE [17] or SENSEI [18]. By encapsulating domain knowledge as description logic programs, the level of abstraction can be adapted to the current application by importing a more refined ontology into the DLP. From a different perspective, the ontology used for stream processing acts as a summariser [19]: instead of storing all the input data, the incoming items are classified and only this abstract knowledge is stored as instances. One advantage is that the system facilitates business intelligence through the set of semantic queries that can be addressed against the classified knowledge. Examples of such queries are “how many dairy products have been sold since yesterday” or “what quantity of men’s wear was sold in the last month”, where the level of refinement is given by the exploited ontology.
4. Data Stream Management System in Haskell
This section details the architecture of the PSRH (Plausible Stream Reasoning in Haskell) system (Figure 2). For each problem, a domain expert is responsible for defining the priorities and the plausible rules in order to handle contradictory data. The mapping module translates the available ontologies into facts and strict rules. The stream module provides a collection of functions for manipulating the input data streams. The relevant sensor-based measurements are stored as facts in the plausible theory. The decisive plausible tool is continuously queried in order to support plausible decisions in real time. Each module described in the following paragraphs is built on top of the Haskell platform.
Figure 2. The architecture of the PSRH (Plausible Stream Reasoning in Haskell).
4.2. Streams Module
Table 2 illustrates the operators provided by Haskell to manipulate infinite streams. The basic operators allow constructing infinite streams or extracting elements from the input stream. The higher-order functions map, inter, scan, and transp allow applying different transformations to the input streams.
Table 2. Stream operators in Haskell (![Futureinternet 04 00865 i110]() stands for the ![Futureinternet 04 00865 i111]() datatype).
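Since the operator listing in Table 2 is not recoverable here, a sketch using the standard Prelude counterparts of the basic operators may help: infinite streams are constructed with generators such as `iterate` and `repeat`, and elements are extracted lazily with `take` and `head`.

```haskell
-- Standard Prelude counterparts of the basic stream operators:
-- construct infinite streams, then lazily extract finite prefixes.
naturals :: [Integer]
naturals = iterate (+1) 0      -- the infinite stream 0, 1, 2, 3, ...

ticks :: [String]
ticks = repeat "tick"          -- an infinite constant stream

firstFive :: [Integer]
firstFive = take 5 naturals    -- extraction forces only five elements
```

Laziness is what makes this safe: `naturals` and `ticks` are never fully evaluated, only the prefixes actually demanded by consumers such as `take`.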
For instance, the map function converts each element within a stream according to a given conversion function. The conversion takes place only when the converted values are required by another function. Based on a given operation, the scan function aggregates the elements within a stream into a single value. For computing at each step the running sum of a stream of transactional data, the following expression can be used:
It provides as output the infinite stream ![Futureinternet 04 00865 i115](), where the current value sums all the previous ones.
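The running-sum behaviour described above can be sketched with the Prelude's scanl1, which corresponds, up to naming, to the paper's `scan` operator; the stream contents below are illustrative.

```haskell
-- A sketch of the running sum over an infinite stream of transactional
-- data; `scan` in the paper corresponds, up to naming, to scanl1.
runningSum :: Num a => [a] -> [a]
runningSum = scanl1 (+)

transactions :: [Int]
transactions = cycle [5, 2, 7]   -- an infinite stream of amounts
```

Here `take 5 (runningSum transactions)` yields `[5, 7, 14, 19, 21]`: each output value sums all the previous inputs, yet the infinite input is never fully evaluated.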
The aggregation functions combine elements from two input streams in order to generate a continuous output stream. Consider that one wants to add the corresponding values from two financial data streams ![Futureinternet 04 00865 i086]() and ![Futureinternet 04 00865 i116](), expressed in two different currencies:
where the conversion function is applied to each element from ![Futureinternet 04 00865 i116]().
The aggregation of two streams takes place according to an aggregation policy, depending on the time or the configuration of the new tuples. The policy is a function provided as the first input argument of the higher-order function zipWith, which has three input parameters and one output:
The elements of the input streams are combined according to the policy. The zipWith function continuously produces the output stream outStream, whose elements come from the input streams inStream1 and inStream2, aggregated based on the given policy. Similarly, new streams can be generated based on a policy f, as in: ![Futureinternet 04 00865 i119](). An incoming stream can also be dynamically split into two streams, based on a predicate p.
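The currency aggregation and the predicate-based split can both be sketched as follows; the conversion rate, function names, and stream contents are invented for illustration and are not the paper's actual code.

```haskell
import Data.List (partition)

-- Hedged sketch: aggregate two financial streams by converting the
-- second currency (the 0.9 rate is invented) and adding pairwise
-- values with zipWith; the policy here is simply (+).
toEUR :: Double -> Double
toEUR usd = usd * 0.9

combine :: [Double] -> [Double] -> [Double]
combine eurs usds = zipWith (+) eurs (map toEUR usds)

-- Splitting a stream on a predicate p. Data.List.partition is lazy
-- enough for infinite streams, provided each half is consumed lazily
-- and elements matching (and failing) p keep occurring.
splitStream :: (a -> Bool) -> [a] -> ([a], [a])
splitStream = partition
```

For example, `splitStream even [1..10]` yields `([2,4,6,8,10], [1,3,5,7,9])`, and the same call on the infinite `[1..]` still allows lazy consumption of either half.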
4.3. The Mapping Module
Two sources of knowledge are exploited to reason on the data collected by the sensors. On the one hand, one needs detailed information about the sensors, their measurement domains and units, or their accuracy. On the other hand, domain-specific axioms are exploited when reasoning on a specific scenario.
A partial view of the sensor ontology is formalised in DL in Figure 3. Figure 4 graphically illustrates the TBox and the ABox of the ontology. The sensor ![Futureinternet 04 00865 i086]() is an instance of the class ActiveRDF, and it measures temperature with an accuracy of 0.5 °C. The current temperature is 6 °C, and the measurement frequency is six observations per minute (Figure 4). Noting that Temperature is a PhysicalQuality (axiom 9 in Figure 3), there is a role measure between the sensor ![Futureinternet 04 00865 i086]() and the temperature value ![Futureinternet 04 00865 i120](), as axiom 1 defines. The corresponding RDF stream for the sensor ![Futureinternet 04 00865 i086]() looks like:
where every 10 seconds the measured value increases by one degree Celsius, from 4 °C to 6 °C.
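Since the stream listing itself is rendered as an image, a hedged sketch of one plausible encoding may help: the RDF stream becomes a lazy list of timestamped triples. The sensor name "s1" and the tuple layout are assumptions, not the paper's actual representation.

```haskell
-- Hedged sketch: the sensor's RDF stream as a lazy list of timestamped
-- triples. The sensor name "s1" and the encoding are assumptions.
type Triple      = (String, String, String)   -- (subject, predicate, object)
type Timestamped = (Triple, Int)              -- paired with seconds

sensorStream :: [Timestamped]
sensorStream =
  [ (("s1", "measure", show (celsius :: Int)), t)
  | (celsius, t) <- zip [4, 5, 6] [0, 10, 20] ]  -- 4 °C to 6 °C, every 10 s
```

A real stream would of course be produced incrementally by the sensor rather than listed, but the lazy-list type is the same.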
The ontology is translated into strict rules based on the conceptual instrumentation introduced in Section 3.3, in order to reason only within plausible logic. The resulting strict rules and facts appear in Figure 5. Observe in Figure 5 that the terminology is considered static, whilst the assertions may vary in time.
The rapid development of sensor technology raises the problem of continuously updating the sensor ontology. The system is able to handle this situation by treating the ontology as a stream of description logic axioms. Applying the higher-order function ![Futureinternet 04 00865 i122]() to the transformation function ![Futureinternet 04 00865 i089](), so that each axiom in the description logic is converted to strict rules as soon as it appears:
outputs the infinite list:
The main advantage consists in the possibility to dynamically include new background knowledge in the system.
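The on-the-fly translation can be sketched as follows; the datatypes drastically simplify the actual DLP translation, and the constructor names are illustrative.

```haskell
-- Hedged sketch: the ontology as a lazy stream of axioms, each mapped
-- to a strict rule as soon as it appears. The datatypes are a drastic
-- simplification of the real DLP translation.
data Axiom = SubClassOf String String deriving (Eq, Show)
data Rule  = StrictRule String String deriving (Eq, Show)  -- body => head

toRule :: Axiom -> Rule
toRule (SubClassOf sub sup) = StrictRule sub sup

ruleStream :: [Axiom] -> [Rule]
ruleStream = map toRule   -- lazy: works on infinite axiom streams too
```

Because `map` is lazy, each axiom is translated only when the reasoner demands the corresponding rule, which is precisely what allows new background knowledge to be included dynamically.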
Figure 3. Partial view of the sensor ontology.
Figure 4. Graphical view of the sensor ontology.
Figure 5. Translating the sensor ontology.
5. Running Scenario
The scenario regards supporting real-time supply chain decisions based on RFID streams. Consider the stock management of an online shop. RFID sensors are used to count the items entering the warehouse from two locations. The items leave the warehouse through exit points, corresponding to three output streams. Monitoring an item such as Milk implies monitoring several subcategories, such as WholeMilk and LowFatMilk. The retailer sells a specific item ![Futureinternet 04 00865 i126]() of whole milk, and two types of low fat milk, ![Futureinternet 04 00865 i127]() and ![Futureinternet 04 00865 i128](). Some peak periods are associated with each commercialised item.
This background knowledge is formalised in Figure 6. The corresponding strict rules are depicted in the upper part of Figure 7. During peak periods for an item, the usual supply action is blocked by the defeater ![Futureinternet 04 00865 i129](). The plausible rule ![Futureinternet 04 00865 i130]() says that if the milk stock ![Futureinternet 04 00865 i093]() is below the alert threshold ![Futureinternet 04 00865 i131](), the NormalSupply action should be executed. NormalSupply ensures a stock value of ![Futureinternet 04 00865 i132](). During peak periods, the PeakSupply action is instead derived by the rule ![Futureinternet 04 00865 i129]().
Figure 6. Domain knowledge sample for milk monitoring.
Figure 7. Plausible knowledge base.
If there is an alternative item Z for the Milk product and the stock of the alternative is larger than the threshold ![Futureinternet 04 00865 i133](), the higher quantity ![Futureinternet 04 00865 i132]() should not be supplied (the rule ![Futureinternet 04 00865 i134]()). Whether the action is executed or not depends on the priority relation between the rules ![Futureinternet 04 00865 i134]() and ![Futureinternet 04 00865 i135]().
Sensor-related information can also be integrated when reasoning. If the sensor S appears not to function according to the specifications in the ontology, it is plausibly broken (the rule ![Futureinternet 04 00865 i136]()). A broken sensor defeats the stock information asserted in the knowledge base about the measured item (the defeater ![Futureinternet 04 00865 i137]()).
The merchandise flow is simulated by generating infinite input and output streams. Assume that the function ![Futureinternet 04 00865 i138](), given the list of available items, returns a random item. The infinite output stream for the payment point ![Futureinternet 04 00865 i139]() would then be:
where l is a list with the available items in the simulation. Consider the following RDF stream ![Futureinternet 04 00865 i086]() of sold items (item sold price) and the associated time of measurement, where the predicate sold and the price value are omitted for clarity:
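Since the generator and stream listings are rendered as images, the simulation idea can be sketched as follows. A dependency-free linear congruential generator stands in for the random source; the item names, seed, and constants are all invented for illustration.

```haskell
-- Hedged sketch of the merchandise-flow simulation: an infinite stream
-- of random items drawn from the list l of available items. A simple
-- linear congruential generator keeps the sketch dependency-free; all
-- names and constants are illustrative.
lcg :: Int -> [Int]
lcg = tail . iterate (\x -> (1103515245 * x + 12345) `mod` 2147483648)

randItems :: [a] -> Int -> [a]
randItems l seed = map ((l !!) . (`mod` length l)) (lcg seed)

outStream1 :: [String]
outStream1 = randItems ["m1", "lf1", "lf2"] 42   -- one payment point
```

Laziness again does the work: `outStream1` is an infinite stream of sold items, but only the prefix demanded downstream is ever generated.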
The updateStock function continuously computes the current stocks based on the ![Futureinternet 04 00865 i086]() stream. Based on the fact ![Futureinternet 04 00865 i143]() and the rule ![Futureinternet 04 00865 i038](), one can conclude that ![Futureinternet 04 00865 i126]() is a milk item. Similarly, based on the facts ![Futureinternet 04 00865 i144]() and ![Futureinternet 04 00865 i145](), the rule ![Futureinternet 04 00865 i039]() categorises the instances ![Futureinternet 04 00865 i127]() and ![Futureinternet 04 00865 i128]() as milk items. The filter function is used to monitor each milk item, whether low fat or not:
Here, the predicate milk returns true if the input is of type Milk according to the rules ![Futureinternet 04 00865 i038]() or ![Futureinternet 04 00865 i039](). The map function is used to select only the element item from the tuples ![Futureinternet 04 00865 i147]() of the stream ![Futureinternet 04 00865 i086](): the composition ![Futureinternet 04 00865 i148]() is used to extract the first element of the first tuple. The stream milkItems collects all the items of type milk every time an item occurs. Based on ![Futureinternet 04 00865 i086](), the stream milkItems is:
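As the listings above are rendered as images, a hedged sketch of the filtering pipeline may help; the milk predicate is stubbed with a fixed item list here, whereas in the system it is answered by the translated ontology rules, and the item names are invented.

```haskell
-- Hedged sketch of the monitoring pipeline: filter the sold-items
-- stream with the milk predicate (stubbed here; in the system it is
-- answered by the translated ontology rules), then keep only the item.
type SoldItem = (String, Int)   -- (item, time of measurement), simplified

milk :: String -> Bool
milk i = i `elem` ["m1", "lf1", "lf2"]   -- stub for the ontology check

milkItems :: [SoldItem] -> [String]
milkItems = map fst . filter (milk . fst)
```

Both `filter` and `map` are lazy, so the pipeline emits each milk item as soon as it occurs on the input stream.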
The ![Futureinternet 04 00865 i150]() function is activated to compute the available stock for a specific category of products. Consider that the current stock for milk is 102 and that the threshold ![Futureinternet 04 00865 i131]() for triggering the alarm is 100. Assume the function updateStock is called with the first input parameter milk and the second parameter milkItems. At time 1, ![Futureinternet 04 00865 i127]() being low fat milk, identified as a subtype of milk, the stock is updated to the value 101. At time 3, ![Futureinternet 04 00865 i126]() being whole milk, also identified as a subtype of milk, the stock is updated to the value 100. At time step 6, on ![Futureinternet 04 00865 i128](), the stock value reaches ![Futureinternet 04 00865 i151](). At this moment, the predicate ![Futureinternet 04 00865 i152]() becomes valid.
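The stock walk-through above can be sketched with scanl; the decrement-by-one policy, the names, and the threshold check are illustrative simplifications of the system's updateStock.

```haskell
-- Hedged sketch of updateStock: each incoming milk item decrements the
-- stock by one; scanl exposes every intermediate stock value, so the
-- alert predicate can be checked continuously. Names are illustrative.
updateStock :: Int -> [a] -> [Int]
updateStock initial = scanl (\stock _ -> stock - 1) initial

belowThreshold :: Int -> [Int] -> [Bool]
belowThreshold t = map (< t)
```

With an initial stock of 102 and three sold milk items, `updateStock 102 items` yields `[102, 101, 100, 99]`; for the threshold 100, only the last value satisfies `belowThreshold`, which is the moment the alert predicate becomes valid.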
Consequently, the rule ![Futureinternet 04 00865 i130]() in Figure 7 is plausibly activated. The algorithm checks whether any defeater or stronger rule can block the derivation of the conclusion of the rule ![Futureinternet 04 00865 i130](). If nothing blocks it, the action normalSupply for milk with the value ![Futureinternet 04 00865 i132]() is executed. If instead, as in the case of a peak period, the defeater ![Futureinternet 04 00865 i129]() is active, then, since ![Futureinternet 04 00865 i129]() is stronger than the rule ![Futureinternet 04 00865 i130](), it successfully blocks the derivation of the action normalSupply; the consequent of the rule ![Futureinternet 04 00865 i134]() is executed instead.
Thus, by combining ontological knowledge with plausible rules, one can reason about generic products (Milk), even if the streams report data regarding instances of specific products (WholeMilk and LowFatMilk). The benefit consists in minimising the number of business rules that must be added to the system.
The current implementation was tested only on the above proof-of-concept scenario. The scenario involves a small number of axioms from the sensor ontology and from the milk-domain ontology, together with a few plausible rules manually constructed to test the plausible reasoning mechanism. For the simulated ten streams of items, with a time delay of 1 s for each item, the decision was taken in real time.
6. Conclusions
The proposed semantic-based stream management system is characterised by: (i) continuous situation awareness and the capability to handle theoretically infinite data streams, due to the lazy evaluation mechanism; (ii) the aggregation of heterogeneous sensors based on ontologies translated into strict rules; (iii) the handling of noisy and contradictory information, inherent in the context of many sensors, due to the plausible reasoning mechanism. The system represents a step towards building real-time stream processors for knowledge-rich applications.
With streams being approximate, omniscient rationality is not assumed when performing reasoning tasks on streams. Consequently, we argue that plausible reasoning is adequate for real-time decision making. One particularity of our system consists in applying an efficient non-monotonic rule-based system [7] when reasoning on gradually occurring stream data. The inference is based on several algorithms, which is in line with the proof layers defined in the Semantic Web stack. Moreover, the whole Haskell language is available to extend or adapt the existing code. The efficiency of data-driven computation in functional reactive programming is supported by the lazy evaluation mechanism, which allows values to be referred to before they are computed.
In order to apply our PSRH system to a different domain, three tasks are necessary:
to translate the domain-specific ontologies into strict rules, which is performed automatically by the mapping module;
to design the plausible rules and priorities, a task carried out by a domain expert;
to import the most adequate sensor ontology for the problem at hand (for instance, SWEET or the W3C SSN ontology).
During the translation, some knowledge from the ontology may be lost. Axioms stating that a class is subsumed by a complex class expressed as a disjunction cannot be translated into PL, as in ![Futureinternet 04 00865 i153](). Likewise, axioms stating that a class is subsumed by an existentially quantified class expression cannot be translated, such as ![Futureinternet 04 00865 i154](). Our solution accepts this limitation of expressivity in order to perform reasoning in real time within the efficient plausible logic framework.
The current implementation was tested only on the proof-of-concept scenario described in Section 5. More extensive experiments are needed to test the large-scale efficiency and scalability of the proposed system. For the moment, we support our solution based on the results reported in [7] and on the reduced complexity of description logic programs [8].