Electronics | Article | Open Access | 1 January 2021

A Cartesian Genetic Programming Based Parallel Neuroevolutionary Model for Cloud Server’s CPU Usage Prediction

1 Department of Computer Engineering, Bahria University Islamabad, Islamabad 44000, Pakistan
2 Electrical and Computer Engineering Department, COMSATS University Islamabad Attock Campus, Punjab 43600, Pakistan
3 Intelligent System Design Group, National Centre of AI-UETP, University of Engineering and Technology, Peshawar 25120, Pakistan
4 Department of Information and Communication Engineering, Inha University, Incheon 22212, Korea
This article belongs to the Special Issue Applications for Smart Cyber Physical Systems

Abstract

Cloud computing use is increasing exponentially with the advent of Industry 4.0 technologies such as the Internet of Things, artificial intelligence, and digital transformation. These technologies require cloud data centers to process massive volumes of workloads. As a result, data centers consume enormous amounts of electrical energy, a large portion of which comes from fossil fuels, causing greenhouse gas emissions and contributing to global warming. An adaptive mechanism for utilizing cloud data center resources is vital to mitigating this problem: the adaptive system estimates resource utilization and then adjusts the resources accordingly. Estimating cloud resource utilization is challenging for two reasons: first, cloud workloads are diverse, and second, clients’ requests arrive unevenly. In the literature, several machine learning models have been used to estimate cloud resources, of which artificial neural networks (ANNs) have shown the best performance. Conventional ANNs have a fixed topology and allow only their weights to be trained, either by back-propagation or by neuroevolution such as a genetic algorithm. In this paper, we propose the Cartesian genetic programming (CGP) neural network (CGPNN). The CGPNN enhances the performance of conventional ANNs by allowing both its parameters and its topology to be trained, and it uses a built-in sliding window. We trained the CGPNN with parallel neuroevolution, which searches for the global optimum along numerous directions. Resource utilization traces of the Bitbrains data center are used to validate the proposed CGPNN, and the results are compared with machine learning models from the literature on the same data set. The proposed method has outperformed the machine learning models from the literature, achieving 97% prediction accuracy.

1. Introduction

Demand for cloud computing applications is surging with the advent of new technologies such as the Internet of Things and smart cities. According to Gartner’s survey, demand for IaaS services will remain strong in the future (https://www.gartner.com/doc/3849464/survey-analysis-impact-iaas-paas). The projected per-second Internet traffic generated by data centers in 2021 is 655,864 GB (https://www.cisco.com/c/en/us/solutions/collateral/service-provider/global-cloud). Such heavily loaded data centers consume enormous amounts of energy. In a data center, information technology (IT) equipment such as communication links, switching and aggregation elements, and servers consumes 40% of the total data center energy [1,2], and servers alone consume 27% of it [3]. A cloud data center server remains idle 70% to 80% of the time [4]. To save energy during a server’s idle time, the investigators in [5] and [6,7] proposed dynamic voltage and frequency scaling (DVFS) and hot plugging of CPU cores, respectively. The DVFS and hot plugging mechanisms scale reactively, without using any intelligence or knowledge about resource usage or workload patterns and trends. Their energy conservation capability can be boosted by estimating/predicting CPU usage before scaling the frequency or cores [7,8].
Cloud servers execute diverse workloads with diverse CPU (central processing unit) requirements. Moreover, clients’ requests reach cloud servers at uneven intervals. Consequently, the diverse nature of workloads and the irregular arrival of clients’ requests make CPU usage prediction a challenging task [9]. We propose a Cartesian genetic programming (CGP) based parallel neuroevolutionary prediction model to solve the CPU usage prediction problem.
Neuroevolution is a technique for evolving the parameters of a neural network using evolutionary operators such as mutation and crossover. A neural network is encoded into strings called chromosomes, and each substring is called a gene. The encoded genes of a chromosome are evolved with mutation/crossover to form new chromosomes (offspring). These chromosomes are evaluated for fitness, and the best one is selected as the parent for producing the next generation’s offspring. This process of generation, evaluation, and selection continues until either the desired solution is produced or the maximum number of generations/iterations is reached [10].
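As a minimal illustration of this generate-evaluate-select cycle, consider the following Python sketch; the `fitness` and `mutate` callables, the chromosome representation, and all parameter defaults are hypothetical placeholders, not the paper’s implementation.

```python
def neuroevolve(init_chromosome, fitness, mutate,
                n_offspring=4, max_generations=1000, target=0.0):
    """Generic generate-evaluate-select loop (illustrative sketch)."""
    parent = init_chromosome
    parent_fitness = fitness(parent)
    for _ in range(max_generations):
        # Generate: mutate the parent to produce offspring chromosomes.
        offspring = [mutate(parent) for _ in range(n_offspring)]
        # Evaluate: score every offspring (lower is better here).
        scored = [(fitness(c), c) for c in offspring]
        # Select: the best individual becomes the next parent.
        best_fitness, best = min(scored, key=lambda t: t[0])
        if best_fitness <= parent_fitness:
            parent, parent_fitness = best, best_fitness
        if parent_fitness <= target:
            break  # desired solution reached
    return parent, parent_fitness
```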
In conventional neuroevolution, the topology of the neural network is fixed and only the weights are evolved. In the proposed technique, we use parallel neuroevolution that varies the weights, the topology, and the number of neurons in the network, which boosts the learnability of the network. Moreover, we use diverse initialization seeds to avoid local optima and to enhance prediction accuracy. The architecture uses a sliding window that averages the predicted outputs. Our technical contributions lie in resource usage estimation accuracy, improved learnability, and escape from local optima. We highlight them as follows:
  • We present the mathematical model of CGP based neural network with parameters and hyperparameters;
  • We evolve synaptic weights, topology, and the number of neurons for boosting learnability;
  • We conduct multiple search path optimization to avoid local optima;
  • We use a sliding window-based parallel architecture that makes several parallel predictions; these predictions are averaged to improve accuracy (a minimal sketch follows this list).
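The averaging idea can be sketched as follows; this is only an interpretation of the description above, and the predictor callables and window size are illustrative assumptions rather than the paper’s code.

```python
from collections import deque

def sliding_window_prediction(predictors, history, window_size=6):
    """Average the forecasts of several independently evolved predictors,
    each fed the same sliding window of recent CPU-usage samples.
    `predictors` is a list of callables; all names are illustrative."""
    window = deque(history, maxlen=window_size)  # keeps only the last samples
    forecasts = [p(list(window)) for p in predictors]
    return sum(forecasts) / len(forecasts)
```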

3. CGP-Based Neuroevolutionary Neural Network (CGPNN)

Cartesian genetic programming (CGP) is a genetic programming method in which the genetic code of a program is denoted by integers arranged in the form of a directed graph. Programs represented as graphs can be applied to many problems, such as electronic circuits, scheduling, and neural networks [55].
The Cartesian genetic programming-based neuroevolutionary neural network (CGPNN) is an ANN that uses the basic principles of CGP. A CGPNN neuron, modeled in (1), has three types of parameters. The first is $\Psi$, which triggers/disables a CGPNN neuron (i.e., defines neural plasticity). The second is $\Phi$, which defines the plasticity of a neuron’s synaptic connections. The third is the weight $\theta$ of a neuron’s synaptic connection. In (2), we present the mathematical model of the linear combination of inputs, synaptic connection plasticities, and synaptic weights used by the sigmoid hypothesis $h$.
$$h_{\Psi\Phi\theta}(x) = \Psi\left(\frac{1}{1+e^{-\theta^{T}(\Phi x)}}\right),\qquad \Phi=\begin{bmatrix}\Phi_0 & 0 & 0\\ 0 & \Phi_1 & 0\\ 0 & 0 & \Phi_2\end{bmatrix}_{3\times 3} \tag{1}$$

$$x=\begin{bmatrix}x_0\\ x_1\\ x_2\end{bmatrix}_{3\times 1},\qquad \theta=\begin{bmatrix}\theta_0\\ \theta_1\\ \theta_2\end{bmatrix}_{3\times 1}$$

where $\Psi\in\{0,1\}$ (a scalar for a single neuron), $\Phi$ is a $3\times 3$ matrix, and $\theta$ is a $3\times 1$ matrix for $x\in\mathbb{R}^{3\times 1}$. Expanding the linear combination gives

$$h_{\Psi\Phi\theta}(x)=\Psi\times h\left(\theta_0\,\Phi_0\,x_0+\theta_1\,\Phi_1\,x_1+\theta_2\,\Phi_2\,x_2\right) \tag{2}$$

where $h$ is the sigmoid hypothesis defined in (1).
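To make the three parameter types concrete, here is a small Python rendering of the single-neuron model in (1) and (2); it is a sketch whose variable names mirror the symbols above, not the authors’ implementation.

```python
import math

def cgpnn_neuron(x, theta, phi, psi):
    """One CGPNN neuron, Eqs. (1)-(2): psi gates the whole neuron
    (neural plasticity), each phi gates one synaptic input (synaptic
    plasticity), and each theta weighs one input."""
    z = sum(t * f * xi for t, f, xi in zip(theta, phi, x))
    return psi * (1.0 / (1.0 + math.exp(-z)))

# Bias x0 = 1, two system inputs; the second synapse is disabled (phi = 0).
y = cgpnn_neuron(x=[1.0, 0.4, 0.7], theta=[0.2, -0.5, 0.9],
                 phi=[1, 0, 1], psi=1)
```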
In CGP, the processing units (nodes) are placed in a Cartesian plane. The processing units may use any primitive function, such as AND, OR, XOR, or a multiplexer. In CGPNN, by contrast, the processing units are artificial neurons instead of the primitive functions of CGP. Apart from the neurons, CGPNN follows all principles and conventions of CGP.
We present a four-layer ANN in Figure 1 and then transform this network into a CGPNN, as shown in Figure 2. The ANN model displayed in Figure 1 has four layers, of which the two middle layers are hidden layers and the remaining two are the input and output layers. The parameters ($\theta$s) and activation function(s) are not shown, to keep the illustration simple. The input layer consists of two system inputs $x_1$ and $x_2$ and a bias input $x_0$ (where $x_0 = 1$). Layers 2 and 3 are hidden layers; each has two processing neurons and a bias input. In layer 2, $c_0$ is the bias input while $c_1$ and $c_2$ are processing neurons. In layer 3, $d_0$ is the bias input while $d_1$ and $d_2$ are processing neurons. Layer 4 has one processing neuron $o_1$ that outputs the hypothesis result $h_\theta(x)$, where the hypothesis $h$ maps the given inputs $x_0$, $x_1$, and $x_2$ onto the given output $y$ by fine-tuning the parameters/weights ($\theta$s).
Figure 1. A four-layer artificial neural network: layer 1 consists of the bias input $x_0$ and two system inputs $x_1$ and $x_2$; layer 2 consists of a bias input $c_0$ and two processing neurons $c_1$ and $c_2$; layer 3 consists of a bias input $d_0$ and two processing neurons $d_1$ and $d_2$; layer 4 has one processing neuron $o_1$ that generates the output $h_\theta(x)$.
Figure 2. The four-layer ANN of Figure 1 transformed into a CGPNN.
A general architecture of CGPNN is shown in Figure 2, transformed from the ANN of Figure 1. The processing neurons (i.e., $c_1$, $c_2$, $d_1$, $d_2$, and $o_1$) are placed in the Cartesian plane with one vertical coordinate (i.e., one row) and five horizontal coordinates (i.e., five columns: the first four columns hold hidden-layer neurons and the fifth holds the output neuron). The number of neurons in the hidden-layer columns and the output-layer column may vary. All bias inputs are placed in the input layer along with the inputs, according to the CGP convention.
In CGPNN, the number of hidden-layer neurons contributing to the hypothesis estimate is not fixed, in contrast to the ANN. In Figure 2, there are four hidden-layer neurons (i.e., $c_1$, $c_2$, $d_1$, and $d_2$), so the total number of possible combinations is fifteen, and fifteen different networks may be spawned by varying the number and location of neurons. The possible networks of neurons are $[c_1, c_2, d_1, d_2]$, $[c_1, c_2, d_1]$, $[c_1, c_2, d_2]$, $[c_2, d_1, d_2]$, $[c_1, d_1, d_2]$, $[c_1, c_2]$, $[c_1, d_1]$, $[c_1, d_2]$, $[c_2, d_1]$, $[c_2, d_2]$, $[d_1, d_2]$, $[c_1]$, $[c_2]$, $[d_1]$, and $[d_2]$.
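The fifteen configurations are exactly the non-empty subsets of the four hidden-layer neurons, which a short enumeration confirms (illustrative Python):

```python
from itertools import combinations

neurons = ["c1", "c2", "d1", "d2"]
networks = [list(c)
            for r in range(1, len(neurons) + 1)
            for c in combinations(neurons, r)]
print(len(networks))  # 15 non-empty subsets, i.e., 15 candidate networks
```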
Figure 3. The genotype of a generic CGPNN neuron.
The matrix $N$ lists the neurons of the hidden layers, as given in (3). In (3), the parameter matrix $\Psi$ describes the neural plasticity of the network by triggering or disabling neurons in the hidden layers, where $\Psi_j$ triggers/disables a neuron in the $j$th layer. The neural plasticity matrix $\Psi$ is thus multiplied with $N$ to select the neurons for a new configuration of the network.

$$N=\begin{bmatrix}c_1\\ c_2\\ d_1\\ d_2\end{bmatrix},\qquad \Psi=\begin{bmatrix}\Psi_2 & 0 & 0 & 0\\ 0 & \Psi_3 & 0 & 0\\ 0 & 0 & \Psi_4 & 0\\ 0 & 0 & 0 & \Psi_5\end{bmatrix} \tag{3}$$

where $\Psi_2,\Psi_3,\Psi_4,\Psi_5\in\{0,1\}$.
Similarly, the synaptic inputs of the hidden-layer and output-layer neurons of CGPNN are not fixed, in contrast to the ANN. After the neurons are selected, as mentioned earlier, the synaptic inputs of each neuron are selected by evolution. Layer 2 can take inputs from layer 1 (except $c_0$ and $d_0$). Layer 3 can take inputs from layers 1 and 2 (except $c_0$ and $d_0$). Layer 4 can take inputs from all previous layers (i.e., layers 1, 2, and 3), except $x_0$ and $d_0$. Layer 5 can take inputs from layers 1 through 4 (except $x_0$ and $d_0$). Similarly, the output layer can take inputs from layers 1 through 5 (except $x_0$ and $c_0$). The matrices $I_2, I_3, I_4, I_5$, and $I_6$ given in (4) list the possible inputs of the 2nd, 3rd, 4th, 5th, and 6th layers, respectively, where $I_j$ is the input matrix of the $j$th-layer neurons and $\alpha_i^j$ in (4) denotes the output of the $i$th neuron in the $j$th layer.
In an ANN, the number of inputs of each processing neuron is fixed; in CGPNN, a neuron need not take all of its candidate synaptic inputs simultaneously. In the ANN of Figure 1, each processing neuron takes three inputs, but in CGPNN each processing neuron may take any combination of inputs from its list of candidates. We therefore define synaptic input plasticity parameters in (5): the matrices $\Phi_2, \Phi_3, \Phi_4, \Phi_5$, and $\Phi_6$ define the plasticity of the synaptic inputs listed in $I_2, I_3, I_4, I_5$, and $I_6$, respectively, where $\Phi_j$ is the synaptic plasticity matrix of the $j$th-layer neurons’ inputs and $\Phi_k^j$ defines the plasticity of the $k$th input in the $j$th layer.
$$I_2=\begin{bmatrix}x_0\\ x_1\\ x_2\end{bmatrix},\quad I_3=\begin{bmatrix}x_0\\ x_1\\ x_2\\ \alpha_1^2\end{bmatrix},\quad I_4=\begin{bmatrix}x_1\\ x_2\\ c_0\\ \alpha_1^2\\ \alpha_1^3\end{bmatrix},\quad I_5=\begin{bmatrix}x_1\\ x_2\\ c_0\\ \alpha_1^2\\ \alpha_1^3\\ \alpha_1^4\end{bmatrix},\quad I_6=\begin{bmatrix}x_1\\ x_2\\ \alpha_1^2\\ \alpha_1^3\\ d_0\\ \alpha_1^4\\ \alpha_1^5\end{bmatrix} \tag{4}$$

$$\Phi_2=\mathrm{diag}\!\left(\Phi_1^2,\Phi_2^2,\Phi_3^2\right),\quad \Phi_3=\mathrm{diag}\!\left(\Phi_1^3,\ldots,\Phi_4^3\right),\quad \Phi_4=\mathrm{diag}\!\left(\Phi_1^4,\ldots,\Phi_5^4\right),\quad \Phi_5=\mathrm{diag}\!\left(\Phi_1^5,\ldots,\Phi_6^5\right),\quad \Phi_6=\mathrm{diag}\!\left(\Phi_1^6,\ldots,\Phi_7^6\right) \tag{5}$$

where each $\Phi_k^j\in\{0,1\}$.
The parameters $\Psi$ and $\Phi_j$ define the plasticity of CGPNN in two ways: first by triggering/disabling neurons (neural plasticity) and second by triggering/disabling synaptic inputs (synaptic plasticity). We now define the third type of parameter, common to both ANNs and CGPNN: the weight matrix of neuron inputs. The weight matrix of the $j$th-layer neurons’ inputs is denoted $\theta_j$, where $\theta_k^j$ is the weight of the $k$th input in the $j$th layer. The weight matrices are given in (6).
$$\theta_2=\begin{bmatrix}\theta_1^2\\ \theta_2^2\\ \theta_3^2\end{bmatrix},\quad \theta_3=\begin{bmatrix}\theta_1^3\\ \vdots\\ \theta_4^3\end{bmatrix},\quad \theta_4=\begin{bmatrix}\theta_1^4\\ \vdots\\ \theta_5^4\end{bmatrix},\quad \theta_5=\begin{bmatrix}\theta_1^5\\ \vdots\\ \theta_6^5\end{bmatrix},\quad \theta_6=\begin{bmatrix}\theta_1^6\\ \vdots\\ \theta_7^6\end{bmatrix} \tag{6}$$
After defining all three types of CGPNN parameters, we now formulate the layer-wise outputs. Here $h_i^j$ is the sigmoid hypothesis of the $i$th neuron in the $j$th layer, and $\alpha_i^j$ is the output of the $i$th neuron in the $j$th layer.
The output of the 2nd layer of the CGPNN shown in Figure 2 is $\alpha_1^2$, given in (7). The output of the 3rd layer, $\alpha_1^3$, is given in (8). The 4th layer’s output is defined in (9), and the 5th layer’s output is given in (10). The CGPNN output $h_{\Psi\Phi\theta}(x)$ is given in (11), where the neural plasticity parameter of the output-layer neuron is fixed at $\Psi_6 = 1$. The hypothesis that maps the inputs of the output layer (i.e., the 6th layer) onto the desired output $y$ is $h_1^6$. This hypothesis $h_1^6$ is a linear combination of its inputs when CGPNN solves a regression problem; for a classification problem, $h_1^6$ is the sigmoid function of its inputs, as defined following (11).
$$\alpha_1^2=\Psi_2\times h_1^2\!\left(\theta_1^2\Phi_1^2 x_0+\theta_2^2\Phi_2^2 x_1+\theta_3^2\Phi_3^2 x_2\right) \tag{7}$$

$$\alpha_1^3=\Psi_3\times h_1^3\!\left(\theta_1^3\Phi_1^3 x_0+\theta_2^3\Phi_2^3 x_1+\theta_3^3\Phi_3^3 x_2+\theta_4^3\Phi_4^3\,\alpha_1^2\right) \tag{8}$$

$$\alpha_1^4=\Psi_4\times h_1^4\!\left(\theta_1^4\Phi_1^4 x_1+\theta_2^4\Phi_2^4 x_2+\theta_3^4\Phi_3^4 c_0+\theta_4^4\Phi_4^4\,\alpha_1^2+\theta_5^4\Phi_5^4\,\alpha_1^3\right) \tag{9}$$

$$\alpha_1^5=\Psi_5\times h_1^5\!\left(\theta_1^5\Phi_1^5 x_1+\theta_2^5\Phi_2^5 x_2+\theta_3^5\Phi_3^5 c_0+\theta_4^5\Phi_4^5\,\alpha_1^2+\theta_5^5\Phi_5^5\,\alpha_1^3+\theta_6^5\Phi_6^5\,\alpha_1^4\right) \tag{10}$$

$$h_{\Psi\Phi\theta}(x)=\Psi_6\times h_1^6\!\left(\theta_1^6\Phi_1^6 x_1+\theta_2^6\Phi_2^6 x_2+\theta_3^6\Phi_3^6\,\alpha_1^2+\theta_4^6\Phi_4^6\,\alpha_1^3+\theta_5^6\Phi_5^6 d_0+\theta_6^6\Phi_6^6\,\alpha_1^4+\theta_7^6\Phi_7^6\,\alpha_1^5\right) \tag{11}$$

where $\Psi_6 = 1$ for the output-layer neuron, and

$$h_1^6(z)=\begin{cases} z & \text{for regression problems}\\[4pt] \dfrac{1}{1+e^{-z}} & \text{for classification problems.}\end{cases}$$
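A compact sketch of the layer-wise forward pass of (7)–(11) may help; the parameter container `p` and all names are illustrative assumptions, and every hidden neuron uses the sigmoid hypothesis as stated above.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gated_sum(theta, phi, inputs):
    """Weighted, plasticity-gated linear combination used in (7)-(11)."""
    return sum(t * f * v for t, f, v in zip(theta, phi, inputs))

def cgpnn_forward(x0, x1, x2, c0, d0, p, regression=True):
    """Layer-wise forward pass of the Figure 2 CGPNN; p[j] holds
    (psi_j, theta_j, phi_j) for layer j. Names are illustrative."""
    a2 = p[2][0] * sigmoid(gated_sum(p[2][1], p[2][2], [x0, x1, x2]))         # (7)
    a3 = p[3][0] * sigmoid(gated_sum(p[3][1], p[3][2], [x0, x1, x2, a2]))     # (8)
    a4 = p[4][0] * sigmoid(gated_sum(p[4][1], p[4][2], [x1, x2, c0, a2, a3]))  # (9)
    a5 = p[5][0] * sigmoid(gated_sum(p[5][1], p[5][2],
                                     [x1, x2, c0, a2, a3, a4]))               # (10)
    z = gated_sum(p[6][1], p[6][2], [x1, x2, a2, a3, d0, a4, a5])             # (11)
    return z if regression else sigmoid(z)  # psi_6 = 1 for the output neuron
```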
Substituting (7)–(10) into (11) gives the mathematical model of CGPNN in terms of $\Psi$, $\Phi$, $\theta$, and the network inputs $x_0$, $x_1$, $x_2$, $c_0$, and $d_0$, as presented in (12).
$$
\begin{aligned}
h_{\Psi\Phi\theta}(x)=\Psi_6\times h_1^6\Big[\;
& \theta_1^6\Phi_1^6 x_1+\theta_2^6\Phi_2^6 x_2\\
& +\theta_3^6\Phi_3^6\left\{\Psi_2\times h_1^2\!\left(\theta_1^2\Phi_1^2 x_0+\theta_2^2\Phi_2^2 x_1+\theta_3^2\Phi_3^2 x_2\right)\right\}\\
& +\theta_4^6\Phi_4^6\left\{\Psi_3\times h_1^3\!\left(\theta_1^3\Phi_1^3 x_0+\theta_2^3\Phi_2^3 x_1+\theta_3^3\Phi_3^3 x_2+\theta_4^3\Phi_4^3\,\alpha_1^2\right)\right\}\\
& +\theta_5^6\Phi_5^6 d_0\\
& +\theta_6^6\Phi_6^6\left\{\Psi_4\times h_1^4\!\left(\theta_1^4\Phi_1^4 x_1+\theta_2^4\Phi_2^4 x_2+\theta_3^4\Phi_3^4 c_0+\theta_4^4\Phi_4^4\,\alpha_1^2+\theta_5^4\Phi_5^4\,\alpha_1^3\right)\right\}\\
& +\theta_7^6\Phi_7^6\left\{\Psi_5\times h_1^5\!\left(\theta_1^5\Phi_1^5 x_1+\theta_2^5\Phi_2^5 x_2+\theta_3^5\Phi_3^5 c_0+\theta_4^5\Phi_4^5\,\alpha_1^2+\theta_5^5\Phi_5^5\,\alpha_1^3+\theta_6^5\Phi_6^5\,\alpha_1^4\right)\right\}\;\Big]
\end{aligned} \tag{12}
$$

where $\alpha_1^2$, $\alpha_1^3$, and $\alpha_1^4$ are expanded recursively as in (7)–(9).
The parameters $\Psi$, $\Phi$, and $\theta$ in (12) are optimized to minimize the difference between the estimates made by the hypothesis $h_{\Psi\Phi\theta}(x)$ and the actual output of the system, $y$, as measured by a loss function $J(\Psi,\Phi,\theta)$. The loss function is the mean absolute percentage error (MAPE), given in (13). We use MAPE because it measures residual errors, giving a global picture of the difference between the hypothesis estimates and the actual output $y$. In (13), $m$ is the number of training examples and $s$ indexes the training examples from 1 to $m$. The loss function $J(\Psi,\Phi,\theta)$ is minimized by the evolutionary optimization method given in Figure 4.
$$J(\Psi,\Phi,\theta)=\frac{1}{m}\sum_{s=1}^{m}\left|\frac{y_s-h_{\Psi\Phi\theta}(x_s)}{y_s}\right|\times 100 \tag{13}$$
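In code, (13) is a one-liner; this sketch assumes, as (13) implicitly does, that no target value $y_s$ is zero.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, Eq. (13); assumes no y_s is zero."""
    return sum(abs((y - h) / y)
               for y, h in zip(y_true, y_pred)) / len(y_true) * 100
```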

4. CGPNN Optimization Method

CGPNN follows the basic principles of CGP for optimization. During optimization, each neuron is denoted by its genotype; the genotype of a generic CGPNN neuron is shown in Figure 3. Inputs range from input 0 to input $q$, and each input carries its respective synaptic plasticity parameter $\Phi$ and weight $\theta$.
Now, the genotype of the CGPNN of Figure 2 is placed in an array. The inputs $x_0, x_1, x_2, c_0$, and $d_0$ are represented by the integers 0, 1, 2, 3, and 4, respectively. The hidden-layer neurons $c_1, c_2, d_1$, and $d_2$ are represented by the integers 5, 6, 7, and 8, respectively. The first five locations of the array are dedicated to the CGPNN input layer. Each neuron of the hidden and output layers occupies eleven consecutive locations of the array: the layer 2 neuron $c_1$ is placed at indices 5 to 15, $c_2$ at indices 16 to 26, $d_1$ at indices 27 to 37, $d_2$ at indices 38 to 48, and $o_1$ at indices 49 to 59. The input indices of the array for $c_1$ may take the values 0, 1, and 2; for $c_2$, the values 0, 1, 2, and 5; for $d_1$, the values 1, 2, 3, 5, and 6; for $d_2$, the values 1, 2, 3, 5, 6, and 7; and for $o_1$, the values 1, 2, 4, 5, 6, 7, and 8. All array locations holding $\Psi$ and $\Phi$ take the values 0 or 1. Since we use two types of hypothesis functions, a neuron with a logistic sigmoid function has genetic code 0 and a neuron with a linear hypothesis function has code 1. Thus, the function code of every neuron except $o_1$ is 0; the code of $o_1$ may be 0 for a classification problem and 1 for a regression problem. All parameters and bias inputs are initialized as presented in the algorithm shown in Figure 4. The genes of the initialized array are mutated (e.g., 10% mutation) to produce $\lambda$ mutants (e.g., $\lambda = 9$; a mutant is an offspring generated by the mutation operator). The loss $J(\Psi,\Phi,\theta)$ of each mutant is calculated according to (13), and the mutant with the lowest loss is selected as the parent of the next generation. In the next generation, the parent is again mutated to generate $\lambda$ offspring mutants, $J(\Psi,\Phi,\theta)$ is calculated again, the mutant with the lowest loss is selected, and the process continues. The optimization stops when either the lower limit of $J(\Psi,\Phi,\theta)$ is reached or the maximum number of iterations is completed. The final mutant is the genotype of the CGPNN with optimized parameters $\Psi$, $\Phi$, and $\theta$.
Figure 4. A single thread of parallel neuroevolution.
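The point mutation over the integer genotype can be sketched as follows; the per-gene generators of legal values are an illustrative device, since the actual array packs eleven fields per neuron as described above.

```python
import random

def mutate(genotype, legal_values, rate=0.10):
    """Point-mutate ~10% of the genes. legal_values[i] is a callable
    returning a fresh legal value for gene i (an allowed input index,
    0/1 for a psi/phi gene, or a random float for a weight gene)."""
    child = list(genotype)
    n_mutations = max(1, int(rate * len(child)))
    for i in random.sample(range(len(child)), k=n_mutations):
        child[i] = legal_values[i]()
    return child

# Each generation, lambda = 9 mutants are produced from the parent and
# the one with the lowest loss J survives, as in the loop of Section 1.
```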

5. Experimental Platform and Methodology

We executed the training and testing experiments on an HP ProDesk 400 G3 MT Business PC (HP, Palo Alto, CA, USA) with an Intel® Core™ i7-6700 processor at 3.40 GHz (Intel, Santa Clara, CA, USA) and 8 GB RAM. We used real CPU traces of the Bitbrains data center (http://gwa.ewi.tudelft.nl/datasets/gwa-t-12-bitbrains) in our experiments. The data set is composed of performance-metric records of 1250 VMs from the distributed Bitbrains data center. Bitbrains is a service provider specialized in managed hosting and business computation for enterprises; its customers include many major banks, credit card operators, and insurers. In this study, we use resource usage traces of 120 VMs running on a single cloud server of the Bitbrains data center (from the fastStorage data set). For each VM, the data set holds monthly usage records for CPU, memory, network, and storage. We selected CPU as the candidate for estimation due to its heavy use by CPU-intensive workloads. We divided the CPU data set into two halves, one for training and one for testing. We conducted the experiments according to the methodology pictured in Figure 5. In the experiments, we used six different prediction points/instances, five different seeds, and three different initial chromosome sizes (numbers of neurons). Before running an experiment, we select the number of prediction points (1 to 6), then the seed number (1 to 5), and then the chromosome size from the three options of 50, 100, and 500. After these initial settings, we feed the training data to the neuroevolutionary algorithm for training the CGPNN.
Figure 5. Experimental methodology of parallel neuroevolution.
Then we feed the testing data set to the trained CGPNN for testing. We extract the CGPNN model’s accuracy on the test data and its space and time complexities and store them in a buffer. We repeat the above process until all possible initial conditions for prediction points, seed numbers, and chromosome sizes have been executed. When all possible conditions/options have been checked, we compare all the CGPNN results stored in the buffer and select the CGPNN model with the best prediction accuracy and, where possible, the least space and time complexities.
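Schematically, the methodology of Figure 5 is an exhaustive sweep over the 6 × 5 × 3 grid of initial conditions; the training and evaluation callables below are hypothetical stand-ins for Section 4’s optimizer and this section’s test-set measurements.

```python
import itertools

def exhaustive_search(train_data, test_data, train_cgpnn, evaluate):
    """Sweep the 6 x 5 x 3 grid of initial conditions; train_cgpnn and
    evaluate are placeholders returning a model and (MAE, space, time)."""
    buffer = []
    for points, seed, size in itertools.product(
            range(1, 7),      # prediction points/instances 1..6
            range(1, 6),      # seed numbers 1..5
            (50, 100, 500)):  # initial chromosome sizes
        model = train_cgpnn(train_data, points, seed, size)
        mae, space, time_ = evaluate(model, test_data)
        buffer.append((mae, space, time_, model))
    # Best accuracy first, breaking ties on space and time complexity.
    return min(buffer, key=lambda r: (r[0], r[1], r[2]))
```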

6. Results and Discussion

Table 2 presents the results of the experiments for each prediction instance. One-instance prediction has the least mean absolute error (MAE), and two-instance prediction has the least space complexity, while one-, three-, and six-instance prediction have smaller time complexities than the two-, four-, and five-instance prediction models. To avoid scaling errors/overheads, space complexity can be traded for MAE. Thus, one-instance prediction can be chosen: it has the least MAE and time complexity at the cost of space complexity.
Table 2. Summary of prediction results for different numbers of prediction samples.
Table 3 presents the results of one-point prediction with five different seeds and three different chromosome sizes. In the table, time complexity is expressed as the number of critical-path multipliers and logistic sigmoid functions. Seeds one and five have the fewest critical-path multipliers and logistic sigmoid functions for the optimal networks, with initial chromosome sizes of 100 and 50 neurons, respectively. Moreover, the optimal network of seed five with an initial chromosome size of 50 neurons has the least MAE (i.e., 0.046356). Space complexity is represented by the number of active neurons in Table 3: in seeds two, three, and five, the networks with chromosome sizes of 50 and 100, and in seed four, the network with a chromosome size of 500, have the fewest active neurons. Thus, the network with a chromosome size of 50 in seed five can be chosen as the final optimal network due to its least MAE, fewest active neurons, and fewest critical-path multipliers and logistic sigmoid functions.
Table 3. Summary of results of optimal networks for one sample/instance prediction.
Figure 6 presents the prediction results of the seed-five network with an initial chromosome size of 50 neurons, showing one-day CPU usage prediction results for the Bitbrains data center server. It is clear from the results that our CGP-based neuroevolutionary model predicts CPU usage with an MAE of 0.046356. In the next subsection, we compare our CGP-based neuroevolutionary model with models from the literature for accuracy and for space and time complexities.
Figure 6. Cloud server CPU usage prediction result of proposed model.

Comparison with Related Work

Table 4 presents the model comparison for testing accuracy (MAE) and for space and time complexities. For comparison, we trained all models from the literature on the training half of the Bitbrains CPU data set; the tabulated results are based on the test data.
Table 4. Comparison of cloud server prediction models: proposed vs. related work.
AR-NN has an MAE of 0.16874602 and constant space and time complexities. ARIMA, MLP, and ANN also have constant space and time complexities; the MAE of ARIMA is 0.11476377, while MLP and ANN have MAE values of 0.1172989 and 0.1200977, respectively. SVR with linear, sigmoid, radial, and polynomial kernels has space and time complexities of order $O(n^2)$, with MAE values of 0.1208303, 0.1212155, 0.1358454, and 0.1227533, respectively. ELM, RANN (recurrent artificial neural network)-Elman, RANN-Jordan, and LR have constant space and time complexities, with MAE values of 0.1193865, 0.1133194, 0.1239574, and 0.1190222, respectively. Similarly, KNN has space and time complexities of order $O(n^2)$ and an MAE of 0.1978246. The proposed neuroevolutionary model has constant space and time complexities and an MAE of 0.046356. The proposed model has the least MAE, which makes it the best predictor of all the models under study.
The results show that all trained models except SVR and KNN have constant space and time complexities; we use the notation $O(1)$ for these (there are slight differences among the models’ space and time complexities). The results also show that our neuroevolutionary model has lower space and time complexities than the KNN and SVR models, and better prediction accuracy than other neural networks such as feed-forward ANNs and recurrent ANNs (RNNs) [41,42]. Overall, the proposed model has the best prediction accuracy (MAE) of all models from the literature. To present the proposed model’s difference in performance more clearly, we plot the MAEs of all models in Figure 7, which shows that the proposed model has the least MAE, i.e., the best prediction performance of all models under study.
Figure 7. Cloud server CPU usage prediction performance comparison.

7. Conclusions and Future Directions

With the increasing demand for cloud services, the load on cloud data centers increases in an irregular fashion. For executing the irregular workload patterns, cloud servers use predictive scaling mechanisms to conserve energy in idle/low load times [7]. The energy conservation capability of predictive scaling mechanisms can be enhanced by accurately predicting CPU demand before the scaling of resources.
We introduced a CGP-based parallel neuroevolutionary model and evaluated its accuracy for future CPU usage prediction using real CPU usage traces of Bitbrains data center. We also evaluated the model for space and computation time complexities with six different instances of prediction, five different seeds, and three different initial network/chromosome sizes. Our model has achieved the best prediction accuracy of all models (i.e., AR, ARIMA, KNN, ELM, SVR, and ANNs). Experimental results showed that our model achieved 97% prediction accuracy, which will lead to correct scaling decisions in predictive scaling mechanisms of cloud servers.
In the future, we plan to integrate our CGP-based parallel neuroevolutionary model with a predictive scaling mechanism on a multicore cloud server. In addition, our model can be used with hot plugging mechanisms in hand-held devices to conserve battery charge.

Author Contributions

Conceptualization, Q.Z.U. and G.M.K.; formal analysis, S.H. and F.U.; funding acquisition, K.S.K.; investigation, Q.Z.U.; project administration, K.S.K.; software, A.I.; supervision, G.M.K.; validation, Q.Z.U.; writing—original draft, Q.Z.U.; writing—review and editing, F.U. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by a National Research Foundation of Korea Grant funded by the Korean Government (Ministry of Science and ICT)-NRF-2020R1A2B5B02002478.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Brown, R.; Masanet, E.R.; Nordman, B.; Tschudi, W.F.; Shehabi, A.; Stanley, J.; Koomey, J.G.; Sartor, D.A.; Chan, P.T. Report to Congress on Server and Data Center Energy Efficiency: Public Law 109–431; University of California: Berkeley, CA, USA, 2008.
  2. Uddin, M.; Rahman, A.A. Server consolidation: An approach to make data centers energy efficient and green. Int. J. Sci. Eng. Res. 2010, 1, 1.
  3. Shang, L.; Peh, L.S.; Jha, N.K. Dynamic Voltage Scaling with Links for Power Optimization of Interconnection Networks. In Proceedings of the Ninth International Symposium on High-Performance Computer Architecture (HPCA’03), Anaheim, CA, USA, 8–12 February 2003; Volume 3, pp. 91–102.
  4. Dougherty, B.; White, J.; Schmidt, D.C. Model-driven auto-scaling of green cloud computing infrastructure. Future Gener. Comput. Syst. 2012, 28, 371–378.
  5. Meisner, D.; Gold, B.T.; Wenisch, T.F. Powernap: Eliminating server idle power. ACM Sigplan Not. 2009, 44, 205–216.
  6. Mwaikambo, Z.; Raj, A.; Russell, R.; Schopp, J.; Vaddagiri, S. Linux kernel hotplug cpu support. In Proceedings of the Linux Symposium, Ottawa, ON, Canada, 21–24 July 2004; Volume 2.
  7. Ullah, Q.Z.; Khan, G.M.; Hassan, S. Cloud Infrastructure Estimation and Auto-Scaling Using Recurrent Cartesian Genetic Programming-Based ANN. IEEE Access 2020, 8, 17965–17985.
  8. Gandhi, A.; Chen, Y.; Gmach, D.; Arlitt, M.; Marwah, M. Minimizing data center SLA violations and power consumption via hybrid resource provisioning. In Proceedings of the 2011 International Green Computing Conference and Workshops, Orlando, FL, USA, 25–28 July 2011; pp. 1–8.
  9. Dabbagh, M.; Hamdaoui, B.; Guizani, M.; Rayes, A. Toward energy-efficient cloud computing: Prediction, consolidation, and overcommitment. IEEE Netw. 2015, 29, 56–61.
  10. Moriarty, D.E.; Mikkulainen, R. Efficient reinforcement learning through symbiotic evolution. Mach. Learn. 1996, 22, 11–32.
  11. Calheiros, R.N.; Masoumi, E.; Ranjan, R.; Buyya, R. Workload prediction using the Arima model and its impact on cloud application’s QoS. IEEE Trans. Cloud Comput. 2015, 3, 449–458.
  12. Islam, S.; Keung, J.; Lee, K.; Liu, A. Empirical prediction models for adaptive resource provisioning in the cloud. Future Gener. Comput. Syst. 2012, 28, 155–162.
  13. Zeileis, A. Dynlm: Dynamic Linear Regression; Cran: Innsbruck, Austria, 2019; Available online: https://cran.r-project.org/web/packages/dynlm/dynlm.pdf (accessed on 17 November 2020).
  14. Venables, W.N.; Ripley, B.D. Modern Applied Statistics with S-PLUS. In Statistics and Computing, 3rd ed.; Springer: New York, NY, USA, 2001.
  15. R Development Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2010.
  16. Prevost, J.J.; Nagothu, K.; Kelley, B.; Jamshidi, M. Prediction of cloud data center networks loads using stochastic and neural models. In Proceedings of the 6th International Conference on System of Systems Engineering SoSE 2011, Albuquerque, NM, USA, 27–30 June 2011; pp. 276–281.
  17. Ismaeel, S.; Miri, A. Multivariate time series elm for cloud data center workload prediction. In Proceedings of the 18th International Conference on Human-Computer Interaction, Toronto, ON, Canada, 17–22 July 2016; pp. 565–576.
  18. Farahnakian, F.; Pahikkala, T.; Liljeberg, P.; Plosila, J. Energy-aware consolidation algorithm based on k-nearest neighbor regression for cloud data centers. In Proceedings of the 2013 IEEE/ACM 6th International Conference on Utility and Cloud Computing (UCC), Dresden, Germany, 9–12 December 2013; pp. 256–259.
  19. Nikravesh, A.; Yadavar, A.; Samuel, A.; Lung, C.-H. Towards an autonomic auto-scaling prediction system for cloud resource provisioning. In Proceedings of the 10th International Symposium on Software Engineering for Adaptive and Self-Managing Systems, Florence, Italy, 16–24 May 2015; pp. 35–45.
  20. Gong, Z.; Gu, X.; Wilkes, J. Press: Predictive elastic resource scaling for cloud systems. In Proceedings of the 6th International Conference on Network and Service Management (CNSM), Niagara Falls, ON, Canada, 25–29 October 2010; pp. 9–16.
  21. Sudevalayam, S.; Kulkarni, P. Affinity-aware modeling of CPU usage for provisioning virtualized applications. In Proceedings of the 2011 IEEE 4th International Conference on Cloud Computing, Washington, DC, USA, 4–9 July 2011; pp. 139–146.
  22. Ullah, Z.; Hassan, Q.S.; Khan, G.M. Adaptive Resource Utilization Prediction System for Infrastructure as a Service Cloud. Comput. Intell. Neurosci. 2017, 2017.
  23. Duggan, M.; Mason, K.; Duggan, J.; Howley, E.; Barrett, E. Predicting host CPU utilization in cloud computing using recurrent neural networks. In Proceedings of the 2017 12th International Conference for Internet Technology and Secured Transactions (ICITST), Cambridge, UK, 11–14 December 2017.
  24. Rizvandi, N.B.; Taheri, J.; Moraveji, R.; Zomaya, A.Y. On modeling and prediction of total CPU usage for applications in MapReduce environments. In Proceedings of the International Conference on Algorithms and Architectures for Parallel Processing, Fukuoka, Japan, 4–7 September; pp. 414–427.
  25. Dinda, P. Design, implementation, and performance of an extensible toolkit for resource prediction in distributed systems. IEEE Trans. Parallel Distrib. Syst. 2006, 17, 160–173.
  26. Yang, L.; Foster, I.; Schopf, M.J. Homeostatic and tendency-based CPU load predictions. In Proceedings of the IPDPS ’03 17th International Symposium on Parallel and Distributed Processing, Nice, France, 22–26 April 2003; p. 42.
  27. Liang, J.; Nahrstedt, K.; Zhou, Y. Adaptive multi-resource prediction in a distributed resource sharing environment. In Proceedings of the 2004 IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2004), Chicago, IL, USA, 19–22 April 2004; pp. 293–300.
  28. Wu, Y.; Hwang, K.; Yuan, Y.; Zheng, W. Adaptive workload prediction of grid performance in confidence windows. IEEE Trans. Parallel Distrib. Syst. 2010, 21, 925–938.
  29. Yuan, Y.; Wu, Y.; Yang, G.; Zheng, W. Adaptive hybrid model for long term load prediction in a computational grid. In Proceedings of the 8th IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2008), Lyon, France, 19–22 May 2008; pp. 340–347.
  30. Gooijer, J.G.; Hyndman, R.J. 25 years of time series forecasting. Int. J. Forecast. 2006, 22, 443–473.
  31. Khashei, M.; Bijari, M. A new class of hybrid models for time series forecasting. Expert Syst. Appl. 2012, 39, 4344–4357.
  32. Valenzuela, O.; Rojas, I.; Rojas, F.; Pomares, H.; Herrera, L.J.; Guillen, A.; Marquez, L.; Pasadas, M. Hybridization of intelligent techniques and ARIMA models for time series prediction. Fuzzy Sets Syst. 2008, 159, 821–845.
  33. Cao, J.; Fu, J.; Li, M.; Chen, J. CPU load prediction for cloud environment based on a dynamic ensemble model. Softw. Pract. Exp. 2014, 44, 793–804.
  34. Verma, M.; Gangadharan, G.R.; Narendra, N.C.; Vadlamani, R.; Inamdar, V.; Ramachandran, L.; Calheiros, R.N.; Buyya, R. Dynamic resource demand prediction and allocation in multi-tenant service clouds. Concurr. Comput. Pract. Exp. 2016, 28, 4429–4442.
  35. Sood, S.K. Function points-based resource prediction in cloud computing. Concurr. Comput. Pract. Exp. 2016, 28, 2781–2794.
  36. Chen, J.; Li, K.; Rong, H.; Bilal, K.; Li, K.; Philip, S.Y. A periodicity-based parallel time series prediction algorithm in cloud computing environments. Inf. Sci. 2019, 496, 506–537.
  37. Available online: http://www.csd.uwo.ca/courses/CS9840a/Lecture2_knn.pdf (accessed on 17 November 2020).
  38. Bottou, L.; Lin, C.-J. Support vector machine solvers. In Large Scale Kernel Machines; MIT Press: Cambridge, MA, USA, 2007; Volume 3, pp. 301–320.
  39. Caron, E.; Desprez, F.; Muresan, A. Forecasting for grid and cloud computing on-demand resources based on pattern matching. In Proceedings of the 2010 IEEE Second International Conference on Cloud Computing Technology and Science (CloudCom), Indianapolis, IN, USA, 30 November–3 December 2010; pp. 456–463.
  40. Prodan, R.; Nae, V. Prediction-based real-time resource provisioning for massively multiplayer online games. Future Gener. Comput. Syst. 2009, 25, 785–793.
  41. Jordan, M.I. Serial Order: A Parallel Distributed Processing Approach; Technical Report; June 1985–March 1986; University of California: San Diego, CA, USA, 1997.
  42. Elman, J.L. Finding structure in time. Cognit. Sci. 1990, 14, 179–211.
  43. Mason, K.; Duggan, M.; Barrett, E.; Duggan, J.; Howley, E. Predicting host CPU utilization in the cloud using evolutionary neural networks. Future Gener. Comput. Syst. 2018, 86, 162–173.
  44. Grigorievskiy, A.; Miche, Y.; Ventelä, A.-M.; Séverin, E.; Lendasse, A. Long-term time series prediction using op-elm. Neural Netw. 2014, 51, 50–56.
  45. Imandoust, S.B.; Bolandraftar, M. Application of k-nearest neighbor (knn) approach for predicting economic events: Theoretical background. Int. J. Eng. Res. 2013, 3, 605–610.
  46. Xu, D.; Yang, S.; Luo, H. A Fusion Model for CPU Load Prediction in Cloud Computing. JNW 2013, 8, 2506–2511.
  47. Hu, R.; Jiang, J.; Liu, G.; Wang, L. Efficient resources provisioning based on load forecasting in the cloud. Sci. World J. 2014, 2014.
  48. Keerthi, S.S.; Chapelle, O.; DeCoste, D. Building support vector machines with reduced classifier complexity. J. Mach. Learn. Res. 2006, 7, 1493–1515.
  49. Chen, L.; Lai, X. Comparison between Arima and ann models used in short-term wind speed forecasting. In Proceedings of the 2011 Asia-Pacific Power and Energy Engineering Conference, Wuhan, China, 25–28 March 2011; pp. 1–4.
  50. Lu, H.-J.; An, C.-L.; Zheng, E.-H.; Lu, Y. Dissimilarity based ensemble of extreme learning machine for gene expression data classification. Neurocomputing 2014, 128, 22–30.
  51. Nikravesh, A.Y.; Ajila, S.A.; Lung, C.-H. An autonomic prediction suite for cloud resource provisioning. J. Cloud Comput. 2017, 6, 3.
  52. Premalatha, K.; Natarajan, A.M. Hybrid PSO and ga for global maximization. Int. J. Open Probl. Comput. Math. 2009, 2, 597–608.
  53. Beyer, H.-G.; Sendhoff, B. Covariance matrix adaptation revisited—The CMSA evolution strategy. In Proceedings of the International Conference on Parallel Problem Solving from Nature, Dortmund, Germany, 13–17 September 2008; pp. 123–132.
  54. Shaw, R.; Howley, E.; Barrett, E. An energy-efficient anti-correlated virtual machine placement algorithm using resource usage predictions. Simul. Model. Pract. Theory 2019, 93, 322–342.
  55. Miller, J.F.; Thomson, P. Cartesian genetic programming. In Proceedings of the European Conference on Genetic Programming, Edinburgh, UK, 15–16 April 2000.