
Reinforcement Learning for Predictive Analytics in Smart Cities

Kostas Kolomvatsos 1,* and Christos Anagnostopoulos 2
1 Department of Computer Science, University of Thessaly, Papasiopoulou 2-4, Lamia 35100, Greece
2 School of Computing Science, University of Glasgow, 17 Lilybank Gardens, Glasgow G12 8QQ, UK
* Author to whom correspondence should be addressed.
Informatics 2017, 4(3), 16;
Submission received: 30 March 2017 / Revised: 26 May 2017 / Accepted: 22 June 2017 / Published: 24 June 2017
(This article belongs to the Special Issue Smart Government in Smart Cities)


The digitization of our lives causes a shift in data production as well as in the required data management. Numerous nodes are capable of producing huge volumes of data in our everyday activities. Sensors, personal smart devices, and the Internet of Things (IoT) paradigm lead to a vast infrastructure that covers all aspects of activities in modern societies. In most cases, the critical issue for public authorities (usually local ones, such as municipalities) is the efficient management of data towards the support of novel services. The reason is that analytics provided on top of the collected data could help in the delivery of new applications that will facilitate citizens’ lives. However, the provision of analytics demands intelligent techniques for the underlying data management. The most common technique is the separation of huge volumes of data into a number of parts and their parallel management to limit the time required for the delivery of analytics. Afterwards, analytics requests in the form of queries can be realized to derive the knowledge necessary for supporting intelligent applications. In this paper, we define the concept of a Query Controller (QC) that receives queries for analytics and assigns each of them to a processor placed in front of each data partition. We discuss an intelligent process for query assignments that adopts Machine Learning (ML). We adopt two learning schemes, i.e., Reinforcement Learning (RL) and clustering, report on their comparison, and elaborate on their combination. Our aim is to provide an efficient framework to support the decision making of the QC, which should swiftly select the appropriate processor for each query. We provide mathematical formulations for the discussed problem and present simulation results. Through a comprehensive experimental evaluation, we reveal the advantages of the proposed models and describe the outcomes while comparing them with a deterministic framework.

1. Introduction

1.1. Smart Governance and Smart Cities

Smart public governance [1] has been proposed to emphasize the application of information and communication technologies in the public sector and to enhance every technological component realized in Smart Cities (SCs) [2]. SCs should focus on how to reduce the perceived complexity and on how citizens will be involved and participate in the SC management process. Accordingly, public authorities should aim to simplify any interaction in a SC and offer automated services in real time. SCs become smart not only in the way they automate routine functions serving individual persons, buildings, and traffic systems but also in ways that enable monitoring, understanding, analyzing, and planning the city to improve the efficiency, equity, and quality of life for its citizens in real time [3]. Such initiatives are based on the collection and the efficient management of the appropriate data to improve the effectiveness of public authorities’ interventions. Governments and public authorities should decide what data to retrieve, how to retrieve them, and which management mechanisms to apply.
In any case, cities should be based on a network of multiple systems, all of them closely connected with citizens’ needs. This perspective requires an integrated vision of a city and its infrastructures [3]. Local authorities should change the way they manage the available infrastructures and rely on automated solutions that will facilitate novel services supporting the so-called vision and leadership [3]. They should also adopt technologies that ensure that efforts in SCs are coordinated rather than isolated. Smart government, hence, has to cope with (a) complexity and (b) uncertainty and, thus, has to (c) build competencies and (d) achieve resilience [1]. For many officials, SCs are essentially networks of sensors strewn across the city, connected to computers managing vast flows of data and optimizing urban flows like mobility, waste, crime, and money [4]. However, in this respect, citizens could also participate in the SC infrastructure by adopting the provided services and offering their view of the city through the collection and sharing of ambient data. A system considering citizens as part of the provided infrastructure will transform them from passive entities into active ones, facilitating their lives. Citizens can play an important role not just in the design but also in the delivery of public services. This approach will alleviate the burden on local authorities and reduce the effects of the traditional bureaucratic model of governance, as the provided services will be offered in an automatic, decentralized way. The bureaucratic model is based on government monitoring; thus, citizens have less control over smart initiatives and play a more passive role [3]. In such scenarios, there is a need for building efficient, automated mechanisms that will manage the available data, the services, and the interaction between citizens and local authorities.
Taking into consideration the huge volumes of data collected by the provided infrastructure and by the citizens themselves, it is imperative to propose technologies for managing these data, which are usually collected in a ‘distributed’ manner. Data adopted to support innovations can come from a wide range of sources (citizens among them) and be stored in different locations (according to the storage management adopted by SCs). Usually, such data are unstructured (they may be the result of services), imposing complexity on the management mechanisms. Efficient decision making on top of the collected data will facilitate any innovation and have significant impact, financial or otherwise, on citizens. The adoption of the appropriate models will allow public authorities to formulate policies, apply strategies, and engage citizens in a more efficient way. The envisioned decision making should be automated and provided in real time, and cannot be achieved through the traditional bureaucratic model. The quality and the speed of the decisions play an important role in the success of the envisioned models. The basic components of decision-making models are analytics services built on top of the collected data. Analytics will give insights into the data and be the basis for supporting high-level services offered to citizens and public authorities.

1.2. Motivation and Research Challenges

The research community has paid increased attention to large scale data analytics due to its importance in many application domains (SCs among them). Many cities have demonstrated their intention to become SCs through the adoption of numerous digital devices into their infrastructure. The aim is to have the devices record ambient information and, afterwards, to manage these data and support intelligent applications that will facilitate citizens’ lives. This data-driven aspect of the management of the SC digital infrastructure aims to make cities knowledgeable and controllable. The aforementioned numerous resources (e.g., sensors, citizens’ devices) generate huge streams of data. Data can be either processed on-line or stored for future management. In any case, efficient techniques should be applied to support intelligent applications, as analytics will be the key to providing the necessary means for the creation of novel and efficient decision making mechanisms. Public authorities (e.g., municipalities) will be supported by these mechanisms to derive the optimal decisions related to the services offered to citizens. The most important problem in such scenarios is the efficient management of the huge volumes of collected data to support decisions in SCs. Large scale data should be ‘integrated’ into the SC architecture to provide a realistic framework that offers efficient services to citizens and public authorities. The provided services should be concluded in (near) real time on top of automated mechanisms for data collection. As studied (e.g., in [5]), decision and control management is the most influential component for the realization of a SC. This motivates us to propose a model for the efficient allocation of queries to multiple data partitions. We gain from the separation of data and the parallel execution of queries, saving time and supporting real time applications.
As the volumes of the collected data are huge, one efficient approach is to separate them into a number of partitions to facilitate their management and the delivery of analytics. For instance, in a SC, a number of sinks (e.g., servers) could be present in different locations, each responsible for collecting the data originating in a different geographical area. In addition, for storage or performance purposes, data could be separated or replicated into a number of servers. Data separation facilitates parallel processing and, thus, saves time in the generation of responses to the defined queries. An information system should provide immediate and efficient responses to the incoming queries over the huge volumes of data. In this paper, we focus on the support of continuous queries defined by public authorities, citizens, or applications. Research efforts in stream query processing [6,7,8,9,10,11,12] focus on the ability to handle incoming data online against a set of continuous queries [13]. We envision that, in front of each data partition, a Query Processor (QP) is placed to execute queries over the specific part of the data. We do not have any insight into how the data are distributed, and we consider a dynamic SC environment where the partitions’ contents change over time due to the underlying streams.
The idea of having multiple query processors is motivated by two requirements [14]: (a) the query volume may be too high for a single machine; (b) continuity of service must be ensured in case of a query processor failure. Companies like Microsoft [14] and Oracle [15] already support query execution on top of multiple data partitions, revealing the need for and the practicability of the approach. Hence, for supporting real time applications on top of huge volumes of data, we can rely on the idea of multiple query processors. However, when relying on multiple processors, we need a component that is responsible for receiving queries and assigning them to a (sub)set of processors and that, afterwards, aggregates the results to return the final response to the application. In this paper, we focus on the former issue, i.e., the conclusion of query assignments.
We define the concept of the Query Controller (QC), a module that receives queries and ensures their efficient execution on the available QPs. The efficient execution of the incoming queries is related to the best possible assignments, i.e., the allocation of each query to the specific QP where it will be executed. The main research challenge is to determine the most appropriate QP to maximize the Quality of Result (QoR). The QoR is an indicator that a query response is retrieved in the minimum time and best answers the query. Many methods have been proposed for the QoR evaluation [16,17,18]. The description of the relevant techniques for calculating the QoR is beyond the scope of this paper. Without loss of generality, we take QoR ∈ [0, 1]. A QoR → 1 means that the result is optimal according to the needs of the corresponding query, and the opposite holds when QoR → 0.
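Since the calculation of the QoR is beyond the scope of the paper, the sketch below only illustrates how such a normalized indicator could be produced in practice; the weighted mix of response time and answer relevance, and all parameter names, are hypothetical choices rather than the methods of [16,17,18].

```python
def qor(response_time, max_time, relevance, w_time=0.5, w_rel=0.5):
    """Hypothetical QoR in [0, 1]: faster responses and more relevant
    answers both push the indicator towards 1."""
    time_score = max(0.0, 1.0 - response_time / max_time)  # 1 = instant
    return w_time * time_score + w_rel * relevance         # weights sum to 1

print(qor(0.2, 1.0, 0.9))  # fast, relevant answer -> 0.85, close to optimal
```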

1.3. Contribution, Research Outcome & Organization

Each assignment should be concluded by taking into consideration not only the response time and the QoR but also the load of each QP. We propose the use of Machine Learning (ML) and, more specifically, the adoption of Reinforcement Learning (RL) [19] and clustering. We focus on the well-known Q-learning algorithm and on subtractive clustering. We provide an intelligent mechanism that learns the QP that best ‘matches’ each query based on historical data and a set of query families (every query belongs to a query family having specific characteristics). The subsequent step is to create a pool of learners and apply an ensemble learning process, e.g., bagging and boosting [20], to build on the learners’ advantages and eliminate their disadvantages. Each learning scheme dictates a certain model that is built on top of a set of assumptions [21]. This means that the possible error for each learner comes from different sources according to the adopted assumptions and the defined parameters. The adopted schemes are extended to apply smoothly in the SC domain and become the basis of the QC decision making mechanism. In this paper, we extend the model described in [22]. In [22], we propose a similar approach; however, we adopt only the Q-learning algorithm to derive the envisioned assignments, and the proposed scheme neither takes the load balancing aspect of the QC behaviour into consideration nor involves it in the definition of the rewards. In the current effort, we provide two learning schemes (one based on reinforcement learning and one unsupervised), setting up the basis for defining an ensemble learning scheme. We also ‘inject’ into the learners’ behavior the adoption of multiple parameters (compared to [22]), among them the load balancing aspect of the QC’s behavior. We aim to keep the load of each QP at low levels and, thus, to secure immediate responses to the incoming queries.
The current effort deals with a more complex behavior of the QC (adoption of multiple parameters), leading to more efficient assignments and, thus, to more efficient analytics. The following list reports the contributions of our current work:
  • we propose two learners responsible for delivering the assignment of queries to a set of QPs; the first comes from reinforcement learning and the second from clustering;
  • we involve in the learning process a set of parameters related to the behaviour of the QPs (e.g., response time, load, QoR) to be fully aligned with their performance;
  • we propose a multiple Q-tables scheme as the knowledge base of the QC (in the RL case) and a technique for deriving the compactness of the generated clusters (in the clustering case), adopted to deliver the best QP for each assignment;
  • we build on top of an incremental clustering scheme for updating the available clusters;
  • we provide a comprehensive performance evaluation of the proposed learning schemes and a comparative assessment against a baseline solution;
  • we set up the basis for our next research step, where we will rely on a pool of learners and deliver an intelligent scheme for their combination.
The paper is organized as follows. Section 2 discusses research efforts on big data analytics, while Section 3 presents our scenario and gives an insight into the adopted architecture. In Section 4, we present our learning schemes and describe the decision process for each one. Model realization, performance metrics, simulation set-up, and experimental evaluation are presented in Section 5. Finally, in Section 6, we conclude our paper by giving future extensions.

2. Related Work

In the research community, the efficient management of data streams is significant for many application domains, e.g., financial services, life sciences, and mobile services. Numerous devices and users create huge volumes of data such as environmental data, tweets, social networking interactions, or photos [23]. Large scale analytics involves the process of collecting, organizing, and analyzing huge volumes of data. Research efforts aim to discover patterns, especially in the case of unstructured data, and, thus, to extract knowledge and support decision making mechanisms. New functionalities are offered on top of already proposed systems like Hadoop to increase performance. For instance, the authors in [24] propose Starfish, a self-tuning tool for large scale analytics. It can be adapted to user requirements and to the system workload in order to provide an efficient solution.
Retrieving a response in near real-time can be very difficult due to limitations imposed by the amount of data and the underlying hardware performance. Querying data samples and progressive analytics are efficient solutions to the described problem [25]. Samples should be defined taking into consideration the domain, the underlying infrastructure, and so on. Specific sampling techniques have already been proposed [26,27,28]. In progressive analytics, the Approximate Query Processing (AQP) technique is the key means for handling the accuracy of early results. AQP aims to provide confidence intervals for early results [26,29,30]. Users are not involved in the process; however, based on the confidence intervals, one could develop an intelligent mechanism for handling the discussed information. When accuracy reaches acceptable levels (according to the specific application domain), the process can be stopped.
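As a rough illustration of the AQP idea above, the sketch below maintains a running mean with a normal-approximation confidence interval over an incoming sample stream and stops once the interval is tight enough; the z value, the target half-width, and the minimum sample size are illustrative assumptions, not a particular system's settings.

```python
import math

def progressive_mean(stream, z=1.96, target_halfwidth=0.5, min_n=30):
    """Progressively aggregate a stream: return an early, approximate mean
    once the confidence interval half-width drops below the target."""
    n, total, sq_total = 0, 0.0, 0.0
    for x in stream:
        n += 1
        total += x
        sq_total += x * x
        if n >= min_n:  # the normal approximation needs enough samples
            mean = total / n
            var = max(sq_total - n * mean * mean, 0.0) / (n - 1)
            halfwidth = z * math.sqrt(var / n)
            if halfwidth <= target_halfwidth:
                return mean, halfwidth, n  # stop early: accuracy acceptable
    return total / n, float("inf"), n     # stream exhausted first
```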
An analytics system is presented in [31]. The system is based on a framework called Prism and allows users to communicate samples to the system; queries are processed over the defined samples. The authors propose Now!, a progressive data-parallel computation framework for Windows Azure, where progress is understood as a first-class citizen. Now! mainly works with streaming engines to support progressive SQL over big data. In [32], the authors present an on-line MapReduce scheme that supports on-line aggregation and continuous queries. To decrease the latency of the system, the authors propose having the Map tasks send early results to the Reduce tasks. This mechanism enables the generation of approximate results, which is particularly useful for interactive analytics. In [33], the authors present a continuous MapReduce model. The execution of the Map and Reduce functions is coordinated by a data stream processing platform. Latency is improved through a model where mappers are continually fed with data instead of files and the retrieved results are transferred to reducers. CONTROL [27] is an AQP system adopted to support progressive analytics. Users have the opportunity to refine answers and have online control of processing; hence, users are actively involved in the data analysis process. DBO [34] is another AQP system able to calculate the exact answer to queries over a large relational database in a scalable fashion. DBO can offer an insight into the final response together with specific bounds on the accuracy of early results. As more information is processed, DBO has the opportunity to provide more accurate results. Users can stop the process at any time if the accuracy level meets their preference.
Large scale data play an important role in the decision making models adopted by governments and enterprises. The growth of the volume of real-time data requires new techniques and technologies to discover its insight value [35]. Large scale data require real-time data-intensive processing that runs on high-performance clusters [36]. Apart from ‘static’ data, researchers also focus on the management of large scale data streams [37]. A technique widely adopted for extracting knowledge from data streams is clustering. A special case is represented by so-called graph-shaped (big) data streams, which are produced by graph sources providing both structure- and content-oriented knowledge [37]. The parallel management of multiple data partitions gains interest as it offers many advantages: parallel processing saves time and increases efficiency. In general, distributed data processing and the aggregation of the results in a centralized authority will increase efficiency, especially in the integration of massive multivariate data [38]. The role of large scale data and analytics in SCs has already been revealed in various research efforts [39,40,41,42]. Some of the potential uses of analytics are the Smart Grid, waste management, traffic management, environmental monitoring, security, planning, and so on. The authors in [40] discuss and compare different definitions of the smart city and big data and explore the opportunities, challenges, and benefits of incorporating big data applications in smart cities. In addition, they attempt to identify the requirements that support the implementation of big data applications for smart city services. The vision of a SC is to improve the livability, preservation, revitalization, and attainability of a community [42]. In [40], the authors identify the benefits of the adoption and management of analytics in SCs: (i) efficient resource utilization; (ii) better quality of life; (iii) higher levels of transparency and openness.
In general, the services built on top of the provided analytics facilitate easier management and control of the different smart city aspects and applications. The processing becomes automated, setting up the basis for defining intelligent applications that will facilitate citizens’ lives. Many sectors of society can gain from large scale data and their management. This information needs to be properly stored and be readily available [43]. Two of the proposed models that adopt large scale data processing in SCs are as follows. In [44], an approach for enabling the ‘easier’ composition of real-time data processing pipelines in SCs is presented, exploiting a block-based design approach similar to the one adopted in the Scratch programming language for elementary school students. In [5], the authors propose a framework that operates on three levels: a data generation and acquisition level collecting heterogeneous data related to SC operations; a data management and processing level that filters, analyzes, and stores data to make decisions and trigger events autonomously; and an application level initiating the execution of the events corresponding to the received decisions.
The concept of an entity executing continuous queries over large scale data has already been explored. In [45], the focus is on recurring queries, repeatedly executed for long periods of time over evolving high-volume data. The proposed Redoop system is an extension of Hadoop that pushes the support and optimization of recurring queries into Hadoop’s core functionality. Other systems are HaLoop [46], Twister [47], and Nova [48]. HaLoop is a modified version of the Hadoop MapReduce framework designed to serve iterative applications; it extends MapReduce with programming support and improves efficiency by making the task scheduler loop-aware and by adding various caching mechanisms. Twister is a programming model that enhances MapReduce and supports iterative query computations. Nova deals with data in large batches using disk-based processing and fits a large fraction of data processing use-cases on top of continually-arriving data. Finally, ML is widely used for supporting ‘big data’ tools. The interested reader can refer to [49,50] for a survey on the ML techniques adopted in large scale analytics.
In this paper, we focus on the first step of the QC decision process, i.e., the assignment of each query to a (sub-)set of the available QPs. We propose two ML schemes for assigning queries under the rationale that a QP is not the appropriate processor for any arbitrary query. The QC can have access to historical data related to the performance of the QPs. The response time, the QoR, or the load of each QP could be some of the parameters that affect the assignment process. The aim is to gain from QPs exhibiting high performance and to avoid congestion in order to efficiently support large scale analytics. We propose the adoption of two ML-based learning schemes that derive the QP to which each query should be assigned, compare their performance, and present their advantages and disadvantages. The two schemes could easily be combined to support an ensemble learning scheme, e.g., a weighted scheme. First on our research agenda is the definition of an efficient combination scheme that will build on top of a pool of different types of learners and smoothly aggregate their advantages. The difference of our work compared to other efforts in the field is that we do not focus on the management of the collected data but on the selection of the entities that will manage the data. We do not deal with data generation and acquisition or with data filtering and analysis; these issues are managed by the QPs that are selected by the QC. Hence, our model can be considered complementary to the efforts present in the domain of data management in SCs.

3. Rationale and Preliminary

In Figure 1, we depict the envisioned architecture. Data retrieved in a SC arrive at high rates through a number of streams. We consider that the data are stored (either partially or in their entirety) and processed at a later stage. A Real Time Processing Engine (RTPE) is the module that undertakes the responsibility of managing the incoming data; its description is beyond the scope of this paper. As the data are collected, they are separated or replicated into a number of partitions. Data parallelism can offer advantages in ‘efficient’ query execution by splitting the data into the partitions. We consider that no specific algorithm is adopted for splitting the data and, thus, we cannot be aware of the contents of each partition. Without loss of generality, we assume that n partitions are available. In a large scale data setting, n does not affect the volumes of data stored in the partitions.
A set of QPs Π = {π_1, π_2, …, π_n} is available for executing queries. For every partition, a QP (π_i, i = 1, 2, …, n) is responsible for managing the underlying data and returning responses to the QC. The QC stands in the middle between the QPs and the public authorities/users/applications; in the remainder, we adopt the term ‘users’ to denote public authorities, users, or applications issuing queries to the QC. Queries Q = {q_1, q_2, …} belong to specific families. A query family is a set of queries having common characteristics and needs in terms of execution. There are |F| families, F = {f_1, f_2, …, f_|F|}, each having specific characteristics. For each family, a specific execution workflow can be assigned, maximizing the QoR and minimizing the required execution time. Every time a query q_i arrives, the QC should decide the best QP π_j to which the q_i should be assigned. An assignment A(q_i, π_j) is the allocation of the query q_i to the QP π_j. Each q_i has specific characteristics defined by the family f_k to which the q_i belongs. These characteristics are represented by C = {c_1, c_2, …, c_|C|} and affect the execution of the q_i. According to the QPs’ characteristics, every q_i ∈ f_k should be assigned to the ‘best’ π_j, i.e., the QP that will efficiently execute the q_i (in the minimum time and with the best performance). The QoR is the reward that the QC gains from the assignment A(q_i, π_j).
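The entities introduced above can be summarised in a short sketch; the class fields and the load update below are illustrative assumptions rather than the paper's exact data model.

```python
from dataclasses import dataclass, field

@dataclass
class QueryProcessor:
    """A QP pi_j placed in front of one data partition."""
    pid: int
    load: float = 0.0  # a dynamically updated characteristic in C_pi_j

@dataclass
class Query:
    """A query q_i; its family f_k determines its characteristics C."""
    qid: int
    family: int
    characteristics: dict = field(default_factory=dict)

def assign(query: Query, processor: QueryProcessor):
    """An assignment A(q_i, pi_j): allocate q_i to pi_j for execution."""
    processor.load += 1.0  # the QC tracks load to avoid congesting a QP
    return (query.qid, processor.pid)

processors = [QueryProcessor(pid=j) for j in range(4)]  # n = 4 partitions
q = Query(qid=1, family=0)
print(assign(q, processors[2]))  # -> (1, 2)
```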
Recall that exact answers to queries may not be feasible. Hence, AQP could yield the QoR for each partial response during the execution of a query. Let C_{q_i} represent the characteristics of q_i and C_{π_j} represent the characteristics of π_j. For instance, some of the characteristics of a QP could be the load, the response time, the consumption of resources, the adopted optimization techniques, and so on. At this time, the final selection is mainly affected by C_{π_j}, which is dynamically updated according to, e.g., the current load of the π_j or the amount of data in the jth partition. C_{π_j} affects the performance of the π_j, which is also induced by the q_i’s family; for instance, π_j may not be the appropriate QP for executing a specific query family. We apply two learning schemes that are based on historical performance values realized by the combination of C_{π_j} and C_{q_i}. Such historical values are processed by the QC in order for it to always be capable of deriving the best assignment. We adopt two ML algorithms: the Q-learning [19] and the subtractive clustering [51] algorithms. Concerning the Q-learning algorithm, the learning process generates a set of tables (i.e., the Q-tables), one for each f_k; thus, the allocation is aligned with each family’s characteristics. Concerning the clustering algorithm, our mechanism produces clusters based on c_i ∈ C_{π_j}, ∀i. Based on these two schemes, the QC derives the QP proposed by each scheme, and this QP will take part in the final assignment A(q_i, π_j).
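For the clustering side, a minimal sketch of subtractive clustering [51] follows: every point's potential measures the density of its neighbourhood, the highest-potential point becomes a cluster centre, and potentials around each new centre are suppressed; the radii and the stopping threshold used here are illustrative parameter choices.

```python
import math

def subtractive_clustering(points, ra=1.0, rb=1.5, eps=0.15):
    """Return cluster centres: pick the densest point as a centre, then
    subtract its influence from nearby potentials and repeat until the
    remaining peak falls below eps times the first peak."""
    alpha, beta = 4.0 / ra ** 2, 4.0 / rb ** 2
    dist2 = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b))
    pot = [sum(math.exp(-alpha * dist2(p, q)) for q in points) for p in points]
    centres, first_peak = [], max(pot)
    while True:
        c = max(range(len(points)), key=lambda i: pot[i])
        if pot[c] < eps * first_peak:
            break
        centres.append(points[c])
        peak = pot[c]
        pot = [p - peak * math.exp(-beta * dist2(points[i], points[c]))
               for i, p in enumerate(pot)]
    return centres

# Two well-separated groups should each contribute one centre.
data = [(0.0, 0.0), (0.1, 0.1), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
centres = subtractive_clustering(data)
```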

4. The Query Assignment Learning Schemes

4.1. Reinforcement Learning

RL [19] deals with the behaviour of an agent (i.e., the QC) that takes actions in an environment E while being at some state (defined later) s ∈ S, where S = {s_1, s_2, …, s_|S|} is the set of states. Through actions A = {a_1, a_2, …, a_|A|}, the agent tries to maximize the long-term reward; hence, it tries to find the appropriate policy that maps states to actions. Q-learning is an RL variant whose main advantage is its capability to define the expected reward. The algorithm adopts tables to store information related to the rewards that the agent will gain by following a policy. The QC adopts Q-learning to learn how to assign every q_i to Π. In RL, the focus is on online performance, which involves balancing exploration and exploitation. The basic RL problem, as applied to Markov Decision Processes (MDPs), consists of: (a) a set of environment states S; (b) a set of actions A; and (c) a set of scalar rewards defined in R. In our case: (i) S is depicted by the available QPs, thus |S| = n; (ii) A is depicted by the selection of a specific QP; (iii) rewards are depicted by the set R.
We consider a discrete time domain where each time instance t refers to the step of the RL algorithm at which the QC takes a decision and transits from one state to another, i.e., the assignment of the specific query to a specific QP. At t, the QC perceives being in state (assignment) s_t with the set of possible actions (assignments to other QPs) A_{s_t}. It chooses an action a ∈ A_{s_t} and receives from the environment a new state s_{t+1} and a reward r_t ∈ R. It should be noted that the discussed scenario involves a very dynamic environment, as the QC should interact with the available QPs that serve a set of QCs and whose characteristics are, thus, continuously updated. Based on these interactions, the ‘learning enhanced’ QC should develop a policy p : S → A which maximizes the realized final reward retrieved by the assignment A(q_i, π_j): R = r_1 + r_2 + … + r_N, where N is the number of parameters considered to define the final reward. It should be noted that the rewards r_1, r_2, …, r_N are normalized in [0, 1]. In addition, A(q_i, π_j) is concluded after the classification of q_i to a specific family. For instance, such parameters could include the time required by π_j to execute a query (the lower the time, the higher the reward), the final QoR retrieved by the assignment A(q_i, π_j), and so on. For MDPs without terminal states, the following equation holds true: R = Σ_t γ^t r_t, where γ ∈ (0, 1] is the future reward discount factor. The QC acts towards a specific goal, which involves the selection of the appropriate QP. The appropriate QP has specific characteristics involving:
  • the time required for deriving the final result (response time, RT), as realized by historical values. We can adopt a simple process (e.g., average performance values) or a more complex one over these values. It should be noted that RT could be expressed in physical time, e.g., in milliseconds;
  • the QoR retrieved by A(q_i, π_j), also based on historical values. This parameter is affected by C_{q_i} and C_{π_j}. We can adopt any desired algorithm for the realization of the QoR;
  • the minimum number of steps (in the discrete time domain) required for the selection of the appropriate QP. As we focus on a query streaming scenario, the QC serves a huge number of queries and, thus, the assignment process should be concluded in the minimum time;
  • the lowest load L, in order to minimize the load of the selected QPs.
The decision process of the QC consists of the following parts: (i) the training phase, where training is performed over historical values of performance and aims to create the Q-tables; and (ii) the assignment phase, where the assignment process is based on the Q-tables derived by the training process. In the beginning, the QC classifies q_i to a family f_k and, accordingly, 'checks' the corresponding Q-table. Based on the values of this Q-table, the QC derives the final assignment.
The advantages of the Q-learning algorithm are: (i) it is a model-free algorithm; (ii) it does not assume knowledge of the environment; (iii) it balances exploration and exploitation. In general, the algorithm is appropriate for 'static' environments. In our scenario, we adopt the algorithm to immediately produce the assignment for each query. Once the Q-tables are ready (i.e., trained), the decision mechanism devotes a limited number of steps to derive the assignment (refer to [52]), thus reducing the required time. In addition, we do not want to design a model from scratch, as the environment is very dynamic and limited efforts deal with modeling queries and processors. However, we can execute the training process of the Q-tables at pre-defined intervals, based on the collected historical values, to keep the quality of the decisions at high levels. For instance, we could adopt weekly data for training the available tables. Finally, as described in [52], we can reduce the time required for training and, thus, update the Q-tables more frequently.

4.2. The Training Phase

In the training process, the QC should build a table for each f_k. The reason is that query families have different requirements concerning the characteristics of the QPs. Figure 2 presents the discussed setting. The QC maintains |F| Q-tables trained at pre-defined intervals. For instance, the creation of the Q-tables could be realized every week, when weekly performance data are retrieved for each QP and for each query family. The creation of the tables is performed offline.
Let us focus on a single Q-table. Each Q-table has two dimensions. Rows represent the states of the world (S = Π) and the n + 1 columns represent the actions (transitions to QPs plus the assignment A(q_i, π_j)) that the QC could take at each state. This means that, when the QC examines the possibility of assigning the query to a QP, it decides what action to take according to the values of the specific row of the Q-table. This decision could be the assignment A(q_i, π_j) (action n + 1) or the transition to another QP; the action chosen constitutes the best choice at the current state.
The number of Q-table elements is given by the following equation: E = |F| · n · (n + 1). The information that the QC takes into consideration to build the discussed Q-tables is:
  • the response time (RT). We actually use the result of a function g(RT_i), i = 1, 2, …, applied to the RT historical values;
  • the QoR, as defined by a function h(QoR_i), i = 1, 2, …, applied to the QoR historical values. QoR historical values are recorded by the QC after concluding past transactions with each QP;
  • the number of transitions (T). T shows how many transitions the QC needs to conclude a decision (the selection of a QP);
  • the predicted load (L). L is defined as the result of a predictive scheme that indicates the future load of each QP.
For each parameter, we apply a technique to derive its final value. For instance, the smaller T is, the greater the reward becomes. This is because the QC wants to conclude the assignment in the lowest number of steps. In addition, when g(RT_i) and h(QoR_i) are high and L → 0, the QC receives a high reward as well.
Each Q-table is created adopting the following equation:
Q(s_t, a_t) ← Q(s_t, a_t) + l · [ r + γ · max_{a′ ∈ A} Q(s′, a′) − Q(s_t, a_t) ]
where l is the learning rate, r is the reward, γ is the future reward discount factor, Q(·) gives values taken from the Q-table (s′ and a′ are state-action combinations already present in the Q-table), and s_t and a_t are the state and the action at t, respectively. l indicates when the learning process has reached its final point (i.e., an acceptable level of learning); it is calculated based on the number of episodes (episodes define the training time) and decreases over time. The reward r is based on four (4) partial rewards: (a) the reward for the RT; (b) the reward for the QoR; (c) the reward for T; and (d) the reward for L. It should be noted that the greater the g(·) and h(·) results are, the greater r becomes. We also get a high r for low T and L. A specific number of episodes is adopted in the training phase. The training phase aims at the effective creation of the Q-tables. We need a high enough number of episodes to build 'efficient' Q-tables that accurately depict the environment of the agent. The number of episodes could be constant, as indicated in the respective literature.
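The update rule above can be sketched in a few lines of Python. The following is a minimal, hypothetical illustration of the training loop, not the exact simulator used in the paper: it assumes uniform exploration and a caller-provided `partial_rewards` function returning the four partial rewards (for RT, QoR, T and L), each normalized in [0, 1].

```python
import random

def train_q_table(n, episodes, partial_rewards, lr=0.8, gamma=0.8):
    """Train one Q-table: n rows (states = QPs) and n + 1 columns
    (n transition actions plus the final 'assign here' action n).
    partial_rewards(state, action) -> (r_RT, r_QoR, r_T, r_L) in [0, 1]^4."""
    q = [[0.0] * (n + 1) for _ in range(n)]
    for _ in range(episodes):
        s = random.randrange(n)                # random initial QP
        for _ in range(n):                     # bounded episode length
            a = random.randrange(n + 1)        # uniform exploration (sketch)
            r = sum(partial_rewards(s, a))     # r combines the four partial rewards
            if a == n:                         # terminal action: conclude assignment
                q[s][a] += lr * (r - q[s][a])
                break
            q[s][a] += lr * (r + gamma * max(q[a]) - q[s][a])
            s = a                              # transit to the chosen QP
    return q
```

A QC would maintain |F| such tables, one per query family, refreshed at pre-defined intervals (e.g., weekly).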
Paying attention to the load L (i.e., the number of queries executed by a specific QP), we adopt a simple technique to derive the future load estimation for each QP. In the training process, we compare the load of the QP that represents the current state of the environment to the load of the QP indicated by the next action. For each QP, we apply a set of predictors (see Section 4.4) and get the future estimation of the load. Predictors are applied to the historical load values as realized through past transactions. From the retrieved results, we eliminate the maximum and the minimum values in order to avoid optimistic or pessimistic estimations. The remaining values are fed to a function z(·) which derives the final future load estimation. The simplest form of z(·) is the mean function. Accordingly, we assign a high reward when the QC takes an action that leads to the QP with the minimum future load, rather than rewarding the action that corresponds to the QP with the highest future load. Through a high number of predictors, we aim to secure the incorporation of the heterogeneity of multiple predictors instead of relying on a single method. In addition, by assigning a high reward to the action that leads to a QP with low load, we actually incorporate a 'load balancing' mechanism into the training process.
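The described load-estimation step can be illustrated as follows; the three predictors shown (last value, overall mean, windowed mean) are hypothetical stand-ins for the predictor pool of Section 4.4.

```python
def predict_load(history, predictors, z=None):
    """Combine several load predictors: drop the maximum and minimum
    estimates to avoid overly optimistic/pessimistic values, then
    aggregate the rest with z() (the mean, in the simplest form)."""
    estimates = [p(history) for p in predictors]
    if len(estimates) > 2:
        estimates.remove(max(estimates))       # drop the most pessimistic
        estimates.remove(min(estimates))       # drop the most optimistic
    z = z or (lambda xs: sum(xs) / len(xs))    # default z(): the mean
    return z(estimates)

# Illustrative predictors (hypothetical stand-ins):
predictors = [
    lambda h: h[-1],                           # last observed load
    lambda h: sum(h) / len(h),                 # overall mean
    lambda h: sum(h[-4:]) / len(h[-4:]),       # window mean (size 4)
]
```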

4.3. The Assignment Process

At pre-defined intervals, the QC executes the RL training process and, thus, maintains up-to-date knowledge of the performance of each QP. The important issue is that the QC, adopting the Q-learning algorithm, is able to incorporate the knowledge about the performance of all the QPs as experienced for each query family. Hence, the QC is able to decide the best solution for every incoming query and to adapt, immediately, to changes in query characteristics. If the training process is executed at shorter intervals, the QC will have a more efficient decision making mechanism; however, in such cases, there is an increased cost for training. After creating the Q-tables, the QC is able to conclude assignments. At first, the QC classifies the query to a family f_k and selects the corresponding kth Q-table. The QC randomly selects a QP π_j and tries to conclude the assignment. π_j represents the initial state of the QC. If the initial state is the jth entity, the QC looks at the jth row of the Q-table. Based on the values retrieved from the jth row, the QC chooses the maximum value and, thus, selects the action that it should follow. This action could be the final assignment of the query to π_j or the transition to another QP (i.e., row). The column of the maximum reward represents the action taken in the upcoming step. If the action is represented by the (n + 1)th column, the QC should assign q_i to π_j (the current QP). If the best action is, e.g., the yth column, the QC should take the yth action and 'check' the yth row. Every transition to the next QP constitutes the best choice at the current state. The process is repeated until the QC selects the 'best' assignment (action n + 1) or no available solution is present. If the best indicated action is to return to a previously visited QP, the best assignment is not feasible.
In such cases, the QC allocates q_i to a randomly selected QP that exhibits fair behaviour for the specific query family (e.g., performance over a threshold). Algorithm 1 presents the aforementioned steps.
As far as the second learning scheme is concerned, the selection process is simple. The proposed model selects the QP exhibiting the highest score. If multiple QPs exhibit the highest score, more complex techniques could be applied. For instance, the system could return the top-k QPs according to the defined score. A probabilistic analysis could also be adopted to derive the best QP selected for the specific query. However, we leave the discussed analysis and the definition of a more complex mechanism as future work.
Algorithm 1 The assignment process

while true do
  getQuery(q_i);
  f = classify(q_i);
  a = selectTable(f);
  j = getRandomRow(a); // random selection of a row
  end = false;
  while !end do
    action = getColumnWithMaximumValue(j);
    if action == n + 1 then
      end = true;
      concludeAssignment(A(q_i, π_j));
    else
      j = getRow(action);
      if isAlreadyAccessed(j) then
        end = true;
        j = getAppropriateRow(a); // we seek a QP with performance over the pre-defined thresholds
        concludeAssignment(A(q_i, π_j));
      end if
    end if
  end while
end while

4.4. The Predictive Phase

The proposed model adopts a predictive scheme to incorporate the load historical data of each QP into the training phase. To address this problem, we adopt multiple load predictors to predict the future load. Different predictors could be used for different QPs to attain adequate heterogeneity in their predictions. The use of different predictors by different QPs introduces the 'different thinking' in our scenario. For each QP, we randomly choose a set of predictors from the pool P, which depicts the entire set of available predictors. We consider |P| = 17 predictors, where P = {pr_1, pr_2, …, pr_|P|}. In Table 1, we see the entire list of predictors. The provided list is not exhaustive, meaning that, in many cases, we adopt the same predictor with different parameter values. For instance, in P, we adopt the cycle predictor with window size equal to 2, 4, 5, and 7. Every pr_i takes into consideration the historical data H of each QP and produces the future estimation. Hence, pr_i is a mapping pr_i : H_{π_j} → ℝ, where H_{π_j} represents the historical data of π_j. The final result is the mean of the selected predictors' outputs.

4.5. The Clustering Scheme

The RL scheme can exhibit high performance when data are 'static'. A very dynamic environment will result in huge amounts of data and queries, leading the RL scheme to assign queries of the same family to the same QPs. This requires updating the Q-tables. However, this could be costly when the number of families and QPs is high. The second proposed scheme involves the adoption of clustering techniques and does not require any training process (unsupervised learning); thus, it can be executed more frequently. We apply multi-dimensional clustering on top of historical data related to the performance of the QPs (e.g., response time, QoR, load) to derive the possible clusters and their members. Accordingly, as we cannot be sure about the exact performance of each QP, we focus on the cluster centres and aggregate them based on the compactness of each cluster and its cardinality. The compactness of a cluster is the average distance between its members and the (virtual) centroid of the cluster. Hence, a cluster that exhibits high compactness and a high number of members will affect more the final result, which depicts the final performance of a QP over the parameters. The adopted parameters are already described in Section 4.2. From the available clustering techniques, we adopt subtractive clustering [51], as it does not require knowledge of the number of clusters in advance. It should be noted that the proposed clustering scheme is adopted for each query family. The performance of a QP is depicted by various parameters (as defined in the reinforcement learning model); some of them depend on the family type while others do not. For instance, the load of a processor does not depend on the query family, while the response time could depend on it. Due to the dynamic nature of the proposed scenario (i.e., the load, the QoR, the response time, etc. change over time), queries belonging to the same families will not necessarily be assigned to the same processors.
In subtractive clustering, every data point consists of multiple dimensions and a potential degree P_i is defined according to its location relative to all other points. This potential depends on the Euclidean distance between the examined point and all other points. The point with the highest potential becomes the first cluster centroid, and the potentials of all other points are then recalculated. The point with the highest potential becomes the next cluster centroid. The distance of the new candidate cluster centre from all the previously defined cluster centroids should fulfill a specific distance condition defined by the algorithm, which ensures that cluster centroids keep a minimum distance between them. If this condition holds, the point becomes the next cluster centre; otherwise, it is rejected and its potential is set to 0. The potential degree of each point is calculated by:
P_j = Σ_{i=1}^{M} e^{−a ||x_i − x_j||²}
where a = ψ / r_a², x is the data point, M is the number of points, ψ is a constant (usually set to 4) and r_a is the cluster radius. Subtractive clustering is initially applied (e.g., when the QC starts performing decisions) over the available historical data and derives a set of clusters. For deriving the appropriate QP for each query, we calculate the score of each QP based on the provided clusters. For this, we perform an aggregation process over the clusters in order to derive the final value of each parameter under consideration (e.g., response time, QoR, load). The result of the aggregation process is fed to a model to get the final values and, thus, the score of each QP. The QP with the highest score is selected to conclude the assignment.
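Assuming a = ψ / r_a² and squared Euclidean distances, the potential computation can be sketched as follows (the radius value r_a = 0.5 is an illustrative choice):

```python
import math

def potentials(points, r_a=0.5, psi=4.0):
    """Potential of each point: P_j = sum_i exp(-a * ||x_i - x_j||^2),
    with a = psi / r_a^2 (psi = 4, as usually set)."""
    a = psi / (r_a ** 2)
    def sqdist(x, y):
        return sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return [sum(math.exp(-a * sqdist(xi, xj)) for xi in points)
            for xj in points]
```

Points lying in dense regions accumulate high potential and become centroid candidates.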
We aggregate the defined clusters based on data fusion and consensus techniques. We focus on the centroids weighted by the compactness and the cardinality of each cluster. Our mechanism, through data fusion, reaches a consensus based on the linear opinion pool [53] over the cluster centroids. The linear opinion pool is a standard approach adopted to combine experts' opinions (where, in our case, an expert refers to a cluster/centroid) through a weighted linear average of these opinions, i.e., the measurements. It is applied when there is a high need for combining conflicting experts' opinions. The linear opinion pool actually provides a belief aggregation methodology. Our aim is to combine single experts' opinions (i.e., cluster centroids) in order to produce the opinion of the set of clusters. We define specific weights for each cluster to 'pay more attention' to its 'opinion' and, thus, let it affect more the final aggregated result. Since multiple clusters are available, belief aggregation methods are used to derive the final result. Formally, F(x_1, x_2, …) is the aggregation operator which, in our case, is the weighted linear average of cluster centroids: y = F(x_1, x_2, …) = w_1 x_1 + w_2 x_2 + …, where w_i is the weight associated with the centroid of cluster C_i, such that w_i ∈ [0, 1] and Σ w_i = 1. Weights w_i are calculated based on specific characteristics of each cluster. Concerning the compactness of a cluster, the following equation holds true:
x_k = (1 / |C_i|) Σ_{u ∈ C_i} ||p_u − c_i||
where u is a point belonging to cluster C_i, p_u depicts the coordinates of that point and c_i is the centroid. The lower x_k is, the more compact the cluster. We also consider the number of points (i.e., the cardinality |C_i|) of each cluster. The higher |C_i| is, the better for the cluster, as it involves many points, possibly close to the centroid (especially when the compactness is high, i.e., x_k is low). Based on x_k and |C_i|, we define the validity of each cluster. The validity of a cluster indicates its 'importance' for the aggregation process, based on the compactness and the cardinality. For instance, a cluster with a high number of points and 'high' compactness (low x_k) should affect more the aggregated result for a QP. The validity is defined as follows:
v_{C_i} = (1/2) (α + β)

α = { 1 − x_k / τ, if x_k ≤ τ; 0, if x_k > τ }

β = { |C_i| / λ, if |C_i| < λ; 1, if |C_i| ≥ λ }
where τ is the threshold above which the compactness contribution becomes negligible, and λ is the cardinality at which the contribution saturates. The proposed mechanism pays equal attention to both parameters (i.e., the compactness and the cardinality) to derive the final validity of each cluster.
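A minimal sketch of the validity computation, with hypothetical threshold values τ = 0.5 and λ = 10:

```python
def cluster_validity(compactness, cardinality, tau=0.5, lam=10):
    """Validity v = (alpha + beta) / 2 of a cluster, per the equations
    above; tau and lam are illustrative threshold choices."""
    # alpha rewards compact clusters (low compactness), zero beyond tau
    alpha = 1 - compactness / tau if compactness <= tau else 0.0
    # beta rewards populous clusters, saturating at lam points
    beta = cardinality / lam if cardinality < lam else 1.0
    return 0.5 * (alpha + beta)
```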
For each QP, based on the set of available validities (i.e., one for each cluster), we derive the final score depicting the performance of the QP in serving queries belonging to a specific family. At first, we calculate the final value of every parameter taken into consideration (e.g., RT, QoR, L) based on the derived weights and the cluster centroids. The aim is to derive a final 'point' for each QP that consists of specific values for each parameter. Hence, the final QP 'point' is defined by:
p_k = Σ_{i=1}^{|C|} w_i · u_i
with w_i = v_{C_i} / Σ_{i=1}^{|C|} v_{C_i}, where u_i is the centroid of each cluster. p_k is a unique value derived by incorporating all the cluster centres (recall that we apply multidimensional clustering on top of multiple parameters), implicitly affected by the query family. From Equation (5), we see that a cluster having high validity affects more the final value of a QP's performance. It should be noted that we consider all the discussed parameters to be in the interval [0, 1]. Accordingly, we define the final score of each QP based on the aforementioned results. The final score is calculated by comparing the parameters with a set of thresholds. We define P_T as the threshold for each parameter (i.e., RT, QoR, L). When the result derived by the above described process for a parameter is above/below the corresponding one of the three (3) pre-defined thresholds, the score is increased by a constant value (e.g., 100); otherwise, it remains at the previous level.
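The weighting and scoring steps can be illustrated as follows; the thresholds and the direction of each comparison (below for RT and L, above for QoR) are assumptions for the sketch.

```python
def qp_point(centroids, validities):
    """Weighted average of cluster centroids, w_i = v_i / sum(v)."""
    total = sum(validities)
    weights = [v / total for v in validities]
    dims = len(centroids[0])
    return [sum(w * c[d] for w, c in zip(weights, centroids))
            for d in range(dims)]

def qp_score(point, thresholds, bonus=100):
    """Add a constant bonus for each parameter meeting its threshold.
    point = (RT, QoR, L); comparison directions are assumptions."""
    rt, qor, load = point
    rt_t, qor_t, load_t = thresholds
    score = 0
    if rt <= rt_t: score += bonus       # low response time is good
    if qor >= qor_t: score += bonus     # high QoR is good
    if load <= load_t: score += bonus   # low load is good
    return score
```

The QP with the highest resulting score concludes the assignment.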

4.6. The Incremental Clusters Update Process

The clustering process is realized for every QP and for every query family, resulting in a set of clusters. In order to keep the available clusters up-to-date, we adopt the second step of the Doubling algorithm [54]. When new historical data are recorded for the examined parameters, our algorithm checks whether the new 'point' could be incorporated into the available clusters. Our mechanism calculates the minimum distance between the new 'point' and the set of available clusters. If the discussed distance satisfies specific pre-defined conditions, the new historical data are included in an already present cluster. Otherwise, a new cluster is created. Algorithm 2 presents the pseudo code of the proposed incremental process.
Algorithm 2 The incremental clustering update

while true do
  if new historical data are present then
    point ← getNewPoint();
    min ← ∞;
    for cluster ∈ Clusters do
      if distance(cluster, point) < min then
        min ← distance(cluster, point);
        c ← cluster;
      end if
    end for
    if min < threshold then
      c.add(point);
    else
      c ← new Cluster();
      c.add(point);
      Clusters.add(c);
    end if
  end if
end while
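A compact Python rendering of Algorithm 2's inner step, under the assumption that the distance between a cluster and a point is measured against the cluster's centroid:

```python
def update_clusters(clusters, point, threshold, dist):
    """Incremental update: attach the new point to the nearest cluster
    if it is close enough, otherwise open a new cluster.
    `clusters` is a list of lists of points; `dist(cluster, point)`
    returns the distance between a cluster and a point."""
    if clusters:
        nearest = min(clusters, key=lambda c: dist(c, point))
        if dist(nearest, point) < threshold:
            nearest.append(point)
            return clusters
    clusters.append([point])    # no cluster close enough: create one
    return clusters
```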

5. Experimental Evaluation

We elaborate on the performance of the proposed learning schemes, the Reinforcement Learning Scheme (RLS) and the Clustering Scheme (CS). We adopt a set of performance metrics and simulate a query streaming environment. The QC does not know anything about the execution process followed by each QP; however, it maintains historical values related to the response time (RT), the QoR for each query family and the load (L) of the QPs.

5.1. Performance Metrics & Simulation Set-up

We report on the performance of the proposed schemes along two axes: (i) the training phase and (ii) the assignment process. We define T_TR as the time required to derive the final Q-tables or the initial clusters. The following equation holds true: T_TR = Σ_{k=1}^{|F|} T_{f_k}, where T_{f_k} depicts the time required for training the Q-table corresponding to each query family or the time required to provide the initial set of clusters. As T_TR → 0, the scheme exhibits the best performance, and the opposite stands when T_TR → ∞.
Concerning the assignment process, we focus on metrics related to the throughput, the minimum RT, the maximum QoR and the minimum L. We consider the throughput as the number of successful queries executed in a specific time interval. We define G, which depicts the throughput of the QC, as follows: G = |Q| / Σ_{k=1}^{|Q|} T_{q_k}, where Q is the set of queries executed by the QC and T_{q_k} is the assignment time for each q_i, i.e., the time required to conclude A(q_i, π_j). G depicts the number of queries executed by the QC in a time unit. The greater G is, the greater the performance of the proposed models becomes.
The ω and ϵ metrics aim to evaluate the performance of the proposed schemes concerning the RT. The following equations hold true: ω = (1/|Q|) Σ_{k=1}^{|Q|} RT_k and ϵ = (1/|Q|) Σ_{k=1}^{|Q|} D_k, with

D_k = { 0, if RT_k = min RT_i, i = 1, 2, …, n; RT_k − min RT_i, otherwise }
and RT_k being the response time of a specific QP. The ω metric depicts the average response time as realized by the assignments. It stands that RT ∈ [0, 1] (RT represents the normalized response time of each QP, derived from historical values) and, thus, when ω → 0, the proposed scheme exhibits the best performance (the worst performance is realized when ω → 1). The ϵ metric reveals whether the proposed scheme assigns each query to the QP, among all, having the best performance related to the response time. As ϵ → 0, the scheme exhibits the best performance, while ϵ → ∞ depicts the worst performance. It should be noted that, in our experiments, the RT is expressed in seconds.
The QoR is evaluated by the ϕ metric, which is defined by the following equation: ϕ = (1/|Q|) Σ_{k=1}^{|Q|} QoR_m. QoR_m depicts the maximum QoR value that corresponds to the specific QP to which q_i will be assigned. The higher ϕ is, the higher the QoR the QC enjoys. We also define the maximum QoR coverage metric δ, which depicts the coverage of the maximum possible QoR in the set of available QPs. The higher δ is, the better the performance the model exhibits. When δ → 1.0 (QoR ∈ [0, 1]), the model concludes an assignment that is the best among the entire set of available QPs. The following equation holds true: δ = (1/|Q|) Σ_{k=1}^{|Q|} [ QoR_k / max QoR_i ], i = 1, 2, …, n.
We define the θ metric as depicting the steps required to derive the final assignment A(q_i, π_j). In our experiments, we present the average θ value over the entire set of simulations. The lower θ is (θ → 1), the better the performance becomes. The following equation holds true: θ = (1/|Q|) Σ_{k=1}^{|Q|} ST_k, with ST_k depicting the steps required for delivering each assignment.
We also propose the adoption of the λ metric, which depicts the load of the selected QPs. The following equation holds true: λ = (1/|Q|) Σ_{k=1}^{|Q|} Λ_k, with Λ_k depicting the load of the finally selected QP. The lower λ is, the better for the mechanism. The reason is that, in such cases, the mechanism selects QPs exhibiting low load, which virtually acts as a load balancing model in the assignment process.
Finally, we define the ζ metric as follows: ζ = Σ_{i=1}^{4} w_i ξ_i, where ξ_i ∈ {ω, ϕ, θ, λ} and w_i represents the weight of each metric. The ζ metric aims to reveal the performance of the model taking into consideration the entire set of examined metrics. The reason is that the QC needs a mechanism that derives the final result in the minimum time by selecting the best possible QP for each query. Hence, the QC should pay attention to the entire set of metrics at the same time and not to just one of them. The lower ζ is, the better the performance becomes.
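The metrics above can be computed from per-assignment logs as sketched below; the equal weights w_i = 0.25 in ζ are an assumption for the sketch, as the paper leaves them configurable.

```python
def assignment_metrics(rts, min_rts, qors, max_qors, steps, loads,
                       weights=(0.25, 0.25, 0.25, 0.25)):
    """Compute omega, epsilon, phi, delta, theta, lambda and the
    combined zeta over |Q| concluded assignments. All inputs are
    parallel lists; min_rts/max_qors hold, per assignment, the best
    achievable RT/QoR among the n available QPs."""
    q = len(rts)
    omega = sum(rts) / q                                  # average RT
    eps = sum(rt - m for rt, m in zip(rts, min_rts)) / q  # distance from best RT
    phi = sum(qors) / q                                   # average QoR
    delta = sum(r / m for r, m in zip(qors, max_qors)) / q  # QoR coverage
    theta = sum(steps) / q                                # average steps
    lam = sum(loads) / q                                  # average load
    zeta = sum(w * x for w, x in zip(weights, (omega, phi, theta, lam)))
    return omega, eps, phi, delta, theta, lam, zeta
```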
We compare the proposed learning schemes, i.e., the RLS and the CS, with a deterministic model DM. The DM examines the entire set of available QPs and, accordingly, selects the one with the maximum performance. The DM is the theoretical limit for our models; thus, we consider three cases: (i) the DM examines the entire set of available QPs and assigns the query to the one with the minimum RT; (ii) the DM examines the entire set of available QPs and assigns the query to the one with the maximum QoR; (iii) the DM examines the entire set of available QPs and assigns the query to the one with the minimum L. The DM exhibits the worst performance concerning the G metric and, actually, constitutes the lower limit of performance for any model that aims to define assignments between queries and QPs. On the other hand, the DM exhibits the best performance related to the ω, ϵ, δ and λ metrics. For these metrics, the DM constitutes the upper limit of performance for any model aiming to define assignments between queries and QPs.
We run a set of simulations for the discussed scenario. Simulations run on an Intel Pentium 3.2 GHz processor with 4 GB RAM running Windows 7. Our simulator is written in Java and involves classes for queries, processors and clusters. We simulate the reception of multiple queries. When a query arrives, we select its family and, accordingly, adopt the defined Q-tables to derive the appropriate processors. The family of each query is randomly selected when the processing of a query starts (in this set of experiments, we do not focus on the detailed characteristics of each query family). For creating the tables, we focus on three parameters, i.e., the response time, the QoR and the current load. For each processor, we produce historical values by drawing random values for each of the parameters from the Uniform distribution. We evaluate our model for diverse realizations of n and |F|. For each experiment, we consider the execution of 1000 queries and take results for the RLS, the CS and the DM. We consider n ∈ {2, 5, 20, 50, 100, 500} and |F| ∈ {5, 20, 50, 100, 500}. In our experiments, we set γ = 0.8 and α = 0.8. In addition, we create a list of random historical values related to the RT, the QoR and L at the beginning of each experiment. The initial list's length is 100, while, during the experiments, we add historical values to a random QP belonging to a random family. Finally, g(·), h(·) and z(·) are the average function, the number of episodes is 1000 (for the RLS) and, for statistical processing, we consider the RT, QoR and L as the last (current) value depicted in the list of historical values (i.e., when calculating the values of each metric).

5.2. Performance Assessment

In Figure 3, we present our results for the T_TR metric. We see that the training time of the RLS is heavily affected by n. As n increases, T_TR increases as well. When n → 500, the training time of the RLS is approximately 11.0 min. This means that, when n is high, the RLS pays a high cost for the training process, making the scheme inappropriate for frequent initiation. However, if n is stable, the RLS is not affected by |F|. Consequently, the discussed behaviour causes problems in highly dynamic environments where the characteristics of queries and QPs change continually. On the other hand, in Figure 4, we see that the CS requires at most 10.5 s (approximately) to derive the initial set of clusters over the historical values. It should be noted that this process is executed only once, before the QC starts to take assignment decisions. After that, the QC adopts the incremental clustering update process. In general, the CS performance is affected neither by n nor by |F|. These results are very significant, indicating that the clustering process does not materially burden the performance of the scheme. Hence, the result of the CS could be combined with the result of the RLS, in the sense that the CS always has up-to-date information while the RLS exhibits optimality in the assignment process (this conclusion is also derived from the upcoming discussion).
The CS exhibits lower throughput than the RLS, as presented in Figure 5. The performance of the CS is affected by the number of steps required to conclude an assignment. Recall that the CS derives/aggregates the cluster information for every QP and, accordingly, selects the QP with the highest score. Hence, in the worst case, it requires n + 1 steps to derive an assignment. This is alleviated by performing the incremental clustering process only when new historical data are present. Re-clustering could be scheduled at pre-defined intervals (e.g., every week). In addition, |F| affects the performance of both schemes, as a low |F| leads to a high G: the higher |F| is, the lower G becomes. The same observation stands for n: the lower n is, the higher G becomes.
In Figure 6 and Figure 7, we present our results related to the ω metric. In general, the CS yields a higher ω than the RLS. The difference is approximately 14% (on average). No safe conclusion can be derived when |F| = 50 (Figure 7). In this case, the two schemes exhibit similar behaviour concerning the ω results, apart from the scenarios where n ∈ {50, 100}. The RLS is favoured at high n, as it can efficiently select the best QP for concluding an assignment. It should be noted that the proposed schemes are evaluated against, e.g., the minimum RT value in the set of available QPs; however, both models pay attention equally to the entire set of parameters and not to a single one. It is obvious that paying attention to a single parameter forces us to adopt the DM and not a learning scheme that 'combines' multiple QP and query characteristics.
Following the performance related to the ω metric, the CS exhibits similar performance as far as the ϵ metric is concerned (Figure 8 and Figure 9). The CS behaves worse than the RLS, especially when |F| = 10. The average difference is 14% (approximately), while, in the case of |F| = 50, the CS performs better than the RLS when n ∈ {5, 20}. When n = 20, the difference is −14% (approximately) in favour of the CS. In general, an average ϵ equal to 0.39 for the RLS and 0.40 for the CS is considered low, keeping in mind that the proposed schemes select the QP performing well over the entire set of parameters.
In Figure 10, we present our results for the ϕ metric. Recall that ϕ depicts the average QoR achieved by the proposed schemes. We observe that the RLS performs better than the CS; however, the difference is small. The average ϕ is kept close to or over 0.6. In addition, as n increases, ϕ increases as well, which means that a high number of QPs leads to a high QoR. If we focus on Figure 11 and Figure 12, we observe that both models cover the maximum available QoR when n is low. When n increases, the average QoR increases as well; however, it is no longer the highest possible value among the entire set of QPs. The RLS keeps the coverage above 50% in all the experimental scenarios. The average δ over all the experimental scenarios is 62% for the RLS and 55% for the CS.
In Figure 13 and Figure 14, we present our results for the λ metric. We observe that the difference between the two schemes is significant, with the RLS exhibiting an average λ close to 0.10. In the CS case, the load of the selected QPs decreases as n increases. For high n, the CS relies on an extensive set of data and can thus conclude an efficient QP selection. The aforementioned results are not affected by the number of families |F|.
The strength of the RLS is depicted by the θ metric (Figure 15), while its weakness is related to the training time (as already discussed). In Figure 15, we see a comparison between the DM and the RLS (the CS has similar performance to the DM). As n approaches 100, the DM requires more steps to conclude an assignment, while the RLS requires only 1.11 steps on average. This adds a significant burden on the DM and the CS; however, both schemes always deal with 'fresh' data related to the query and QP characteristics.
We perform an additional comparison between the proposed schemes and the DM with respect to each adopted metric. In Figure 16, we observe that the proposed schemes differ from the DM; however, if we take the entire set of metrics into consideration at the same time, our schemes exceed the average for each one. For instance, it is better to select a QP that exhibits ω = 0.4, ϕ = 0.6, and λ = 0.1 than a QP that exhibits ω = 0.9, ϕ = 0.9, and λ = 0.9. In other words, our schemes aim at QPs exhibiting acceptable results for the entire set of metrics at the same time, rather than QPs exhibiting the maximum performance in a single metric.
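The numerical example above can be illustrated with a minimal balanced-scoring sketch. The equal-weight score below is our own illustration, not the paper's exact decision function; for the sketch we assume ω and λ are lower-is-better and ϕ (QoR) is higher-is-better.

```python
def balanced_score(omega, phi, lam):
    """Hypothetical equal-weight score: reward high QoR (phi), penalise high omega and load (lam)."""
    return phi + (1 - omega) + (1 - lam)

qp_a = balanced_score(0.4, 0.6, 0.1)  # acceptable on every metric
qp_b = balanced_score(0.9, 0.9, 0.9)  # best QoR, but poor on the rest
assert qp_a > qp_b                    # the balanced QP wins under this score
```

Under any such combined score, the QP that is merely acceptable on every metric beats the QP that maximizes a single one, which is the behaviour targeted by both schemes.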
In Table 2, we report the results for the ζ metric. In these results, we pay attention to all the examined parameters at the same time. We observe that the RLS outperforms the CS regardless of the number of available QPs. These results are mainly affected by the steps required to conclude the assignment. There is a trade-off between the up-to-date data considered by the CS and the low assignment time and slightly better performance of the RLS; the advantages of the one become the disadvantages of the other. For instance, the RLS exhibits a high training time and, thus, in scenarios where n is high, the training process cannot be realized frequently. This runs the risk of deriving decisions over stale data in highly dynamic environments. However, the RLS leads to the best assignment from the perspective of the entire set of parameters.

6. Conclusions and Future Work

Smart public governance of Smart Cities should focus on the provision of novel ICT solutions that enhance the adopted technologies. Current Smart City initiatives involve the distribution of numerous devices in various locations to create an infrastructure that interacts with its environment. One of the most significant decisions for local authorities is how to manage, in real time, the huge volumes of data produced by citizens and the underlying infrastructure. Our model is built to support public authorities in creating novel applications. We provide the basis for building intelligent applications in Smart Cities and, thus, for enhancing the smart governance of modern Smart Cities. We define the notion of a Query Controller (QC), a module responsible for handling incoming queries and deciding their assignment to a number of underlying Query Processors (QPs). We propose two schemes: one based on reinforcement learning and another based on clustering. We build on top of the QPs' characteristics and provide schemes based on the Q-learning algorithm and on subtractive clustering. The former scheme requires a training process to derive a set of Q-tables used to select the appropriate processor for each incoming query. The proposed schemes derive assignments (i.e., of queries to processors) taking a number of parameters into consideration. For instance, these parameters can include the response time of each processor for each query family, as well as the quality of result (QoR) and the load of each QP. A large set of experiments reveals the efficiency of the proposed model. We focus on the throughput of the QC when working in a continuous query scenario and on the quality of the final outcome. The training time of the reinforcement learning scheme affects the performance of our model, especially when we consider a high number of processors.
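As a rough illustration of the RLS core, a Q-table update and an ε-greedy assignment can be sketched as follows. The state encoding (query family), reward, and parameter values are simplifying assumptions for the sketch, not the paper's exact formulation.

```python
import random

def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[next_state].values()) if Q.get(next_state) else 0.0
    Q.setdefault(state, {}).setdefault(action, 0.0)
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])

def assign(Q, family, processors, epsilon=0.1):
    """Epsilon-greedy assignment: usually pick the QP with the highest Q-value for the family."""
    if family not in Q or random.random() < epsilon:
        return random.choice(processors)   # explore, or fall back for unseen families
    return max(Q[family], key=Q[family].get)

# Toy training: the QC observes rewards for past assignments of family "f1".
Q = {}
q_update(Q, "f1", "QP2", reward=1.0, next_state="f1")
q_update(Q, "f1", "QP1", reward=0.2, next_state="f1")
chosen = assign(Q, "f1", ["QP1", "QP2"], epsilon=0.0)
# chosen == "QP2" (the higher-reward processor)
```

After training, an assignment is a single table lookup, which matches the low per-query step count reported for the RLS.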
Hence, training cannot be executed frequently; however, it can rely on techniques that reduce the training time. In addition, our results show that the proposed models can efficiently select the appropriate processor for each query and reduce the time required to return the final response to the application. We compare our models with a deterministic scheme and show that they outperform it. Our models take the entire set of parameters into consideration and, thus, efficiently manage the trade-off in the decision making (their decisions are based not on single parameters but on the entire set).
First on our future research agenda is the definition of an adaptive learning model that minimizes the required training time. This offers the opportunity of multiple initiations of the proposed learning scheme without burdening performance, thus alleviating the effort required of local authorities. Another necessity is a model that efficiently combines the results of the proposed learning schemes; such a combination could increase the performance of the QC, since each scheme derives decisions from a specific perspective. Finally, the definition of a more complex function for deriving the future estimate of the QPs' load is another research issue, as a typical average cannot capture the hidden properties needed to predict the QPs' load.
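One candidate for such a load estimator, sketched here purely as an assumption and not as the paper's proposal, is double exponential smoothing (Holt's method), which tracks both a level and a trend and therefore reacts to load drift that a plain average misses.

```python
def holt_forecast(loads, alpha=0.5, beta=0.3):
    """Double exponential smoothing (Holt): maintains a level and a trend component."""
    level, trend = loads[0], loads[1] - loads[0]
    for x in loads[1:]:
        prev_level = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + trend  # one-step-ahead load estimate

loads = [0.1, 0.2, 0.3, 0.4, 0.5]   # steadily increasing QP load
print(holt_forecast(loads))          # follows the upward trend
print(sum(loads) / len(loads))       # the plain average (≈0.3) lags behind it
```

On a steadily increasing load, the trend-aware forecast sits above every observed value, while the plain average stays in the middle of the history.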

Author Contributions

The authors contributed equally to the Cognitive Task Analysis, the representation design, and the writing of the paper.

Conflicts of Interest

The authors declare no conflict of interest.


Figure 1. The architecture of our model.
Figure 2. The proposed Q-tables (the 'assign' column depicts the 'assignment' action).
Figure 3. The RLS training time.
Figure 4. The CS initial set of clusters creation time.
Figure 5. Our results for the throughput of the RLS and the CS.
Figure 6. RLS and CS results for the ω metric (|F| = 10).
Figure 7. RLS and CS results for the ω metric (|F| = 50).
Figure 8. RLS and CS results for the ϵ metric (|F| = 10).
Figure 9. RLS and CS results for the ϵ metric (|F| = 50).
Figure 10. RLS and CS results for the ϕ metric.
Figure 11. RLS and CS results for the δ metric (|F| = 10).
Figure 12. RLS and CS results for the δ metric (|F| = 50).
Figure 13. RLS and CS results for the λ metric (|F| = 10).
Figure 14. RLS and CS results for the λ metric (|F| = 50).
Figure 15. θ results for the RLS.
Figure 16. Comparison results with the DM.
Table 1. Predictors List.

Predictor | Attributes | Short Description
Cycle | Window size | It returns the same data in a specific window
Single Exponential | Smoothing factor | It consists of a weighted average of past values
Double Exponential | Smoothing factor | It uses a simple linear regression equation
Seasonal Naive | Seasonal period | Each forecast is equal to the last observed value from the same season in the past
Drift | Time window | Each forecast equals the last observed value plus the average change in the series
Extrapolation | - | It estimates the value of a variable on the basis of its relation with another variable
Additive Holt Winters | Level, Slope, Season | It deals with time series containing both trend and seasonal variations
Multiplicative Holt Winters | Level, Slope, Season | It deals with time series containing both trend and seasonal variations
Geometric Moving Average | Window size | It smooths past data by geometrically averaging over a specified period and projects forward in time
Moving Average | Window size | It smooths past data by arithmetically averaging over a specified period and projects forward in time
Parabolic Moving Average | Trend value | It is a weighted moving average with weights that form a parabolic shape
Triangular Moving Average | Window size | It is a weighted moving average with weights that form a triangular shape
Neural Network | Network nodes | It uses a neural network for estimating future values
Linear | Coefficients | It uses the Levinson-Durbin algorithm for linear prediction
Linear Regression | Coefficients | It fits the time series to a straight line and projects forward in time
Polynomial | Coefficients | It fits a polynomial equation to the data and projects forward in time
Rounded Average | Window size | It returns the rounded average of past values defined in a specific window
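Two of the listed predictors can be sketched directly from their short descriptions; the function and parameter names here are our own illustration, not the paper's implementation.

```python
def moving_average(series, window):
    """Moving Average: arithmetic mean of the last `window` values, projected forward."""
    recent = series[-window:]
    return sum(recent) / len(recent)

def single_exponential(series, alpha):
    """Single Exponential Smoothing: weighted average of past values, recent ones weighted more."""
    estimate = series[0]
    for x in series[1:]:
        estimate = alpha * x + (1 - alpha) * estimate
    return estimate

data = [2.0, 4.0, 6.0, 8.0]
print(moving_average(data, 2))        # 7.0
print(single_exponential(data, 0.5))  # 6.25
```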
Table 2. ζ results for different n.

Kolomvatsos, K.; Anagnostopoulos, C. Reinforcement Learning for Predictive Analytics in Smart Cities. Informatics 2017, 4, 16.
