Deep Learning Cluster Structures for Management Decisions: The Digital CEO †

This paper presents a Deep Learning (DL) Cluster Structure for Management Decisions that emulates the way the brain learns and makes choices by combining different learning algorithms. The proposed model is based on Random Neural Network (RNN) Reinforcement Learning for fast local decisions and Deep Learning for long-term memory. The Deep Learning Cluster Structure has been applied in the Cognitive Packet Network (CPN) for routing decisions based on Quality of Service (QoS) metrics (Delay, Loss and Bandwidth) and Cyber Security keys (User, Packet and Node). It includes a layer of DL management clusters (QoS, Cyber and CEO) that take the final routing decision based on the inputs from the DL QoS clusters and the RNN Reinforcement Learning algorithm. The model has been validated under different network sizes and scenarios. The simulation results are promising: the presented DL Cluster management structure, as a mechanism to transmit, learn and make packet routing decisions, is a step closer to emulating the way the brain transmits information, learns the environment and takes decisions.


Introduction
Our brain takes decisions in a structured way while performing several functions at the same time. Our brain learns about the environment from our five senses; it stores memories to preserve our identity; it makes judgements on different situations; it protects itself against external threats or attacks. Our brain is formed by clusters of neurons [1] specialized in learning from different senses where information is transmitted as positive and negative spikes or impulses. It functions with two types of memories [2]; short-term memory is used for quick decisions and task-related actions whereas long-term memory preserves our identity and security. Another brain duality consists of its two operation modes [3]; consciousness under normal activities and unconsciousness under emergency situations such as being under external attack or routine operations like storing information while sleeping.
This paper presents the association of the most complex biological system, our brain, with the most complex artificial system, represented in large data networks: the Internet, the information infrastructure of Big Data and the Web. The link between both of them is the Random Neural Network (RNN). Data networks collect information from users and transmit it to different locations; to perform this activity, they are required to make routing decisions based on different Quality of Service metrics while storing routing tables in memory under the threat of Cyber-attacks. This paper proposes a Deep Learning (DL) Cluster Structure for Management Decisions that emulates the way the brain learns and makes choices and combines different learning algorithms. The proposed model combines Random Neural Network Reinforcement Learning for fast local decisions and DL for long-term memory to remember the network identity: QoS metrics (Delay, Loss and Bandwidth) and Cyber keys (User, Packet and Node). In addition, this paper includes a layer of DL management clusters (QoS, Cyber and CEO) that take the final routing decision based on the inputs from the DL QoS clusters and the RNN Reinforcement Learning algorithm.
The Deep Learning Cluster Structure has been applied in the Cognitive Packet Network (CPN) for Quality of Service metrics and Cyber Security keys in Management Decisions based on packet routing and flow control. The RNN Reinforcement Learning algorithm is chosen under normal or conscious operations due to its fast and adaptable routing learning as short-term memory, whereas the DL clusters are selected under external cyber-attacks. Deep Learning clusters take routing decisions based on the long-term memory in unconscious operation, which provides safe and resilient, although inefficient and inflexible, routing.
A review of Cybersecurity, Deep Learning and Deep Reinforcement Learning concepts with their associated literature is given in Section 2. The mathematical model of the Deep Learning Cluster Structures for management decisions is defined in Section 3. The implementation of the QoS, Cyber and Management Clusters is presented in Section 4. The validation of the proposed model under different QoS and Cyber scenarios in small (nine nodes, one decision layer), medium (16 nodes, two decision layers) and large (25 nodes, three decision layers) networks is described in Section 5. The final discussion and bibliography are given in Section 6 and References respectively.

Cybersecurity
The expansion of the connectivity provided by the Ethernet and Internet protocols has enabled new industrial, technological and social applications and services; however, users are increasingly under new cybersecurity threats and risks. Ericsson [4] introduced cybersecurity issues and threats within Power Communications Systems in a smart grid infrastructure where network vulnerabilities and information security domains are analyzed. Ten [5] presented a survey on cybersecurity of critical infrastructure; in addition, they proposed a Supervisory Control And Data Acquisition (SCADA) framework based on four procedures: real-time monitoring, anomaly detection, impact analysis and mitigation strategy. They model an attack tree analysis with an algorithm for cybersecurity evaluation that incorporates password policies and port auditing. Cruz et al. [6] presented a distributed intrusion detection system for SCADA systems that includes different types of security agents tuned for each specific domain: the development of network, device and process level capabilities, the integration of signature and anomaly-based techniques against threats and, finally, the adoption of a distributed multi-layered design with message queues to transmit predefined events between elements. Wang et al. [7] proposed a framework to facilitate the development of adversary resistant Deep Neural Networks (DNN) by inserting a data transformation module between the sample and the DNN that avoids threat samples with a minimum impact on the classification accuracy. Tuor et al. [8] presented an unsupervised Deep Learning approach to detect anomalous network activity from system logs in real-time, where events are extracted as features and the DNN learns users' normal behavior, or anomalies as potential malicious behavior. Wu et al. [9] presented a classification of cyber-physical attacks and risks in cyber manufacturing systems with possible mitigation measures such as supervised machine learning for classification and unsupervised machine learning for anomaly detection on physical data. Kim et al. [10] proposed a new cyber defensive computer control system architecture based on the diversification of hardware systems and unidirectional communications, assuming that the detection and prevention of cyber-attacks will never be complete.

Deep Learning
Deep Learning is characterized by using a cascade of l layers of non-linear processing units for feature extraction and transformation; each successive layer uses the output from the previous layer as input. Deep Learning learns multiple layers of representations that correspond to different levels of abstraction; those levels form a hierarchy of concepts where the higher the level, the more abstract the concepts that are learned. Schmidhuber et al. [11] examined DL in neural networks; the work includes deep supervised learning, unsupervised learning, reinforcement learning and evolutionary computation. It also includes an indirect search for short programs encoding deep and large networks. The success of machine learning algorithms generally depends on data representation. In order to obtain the appropriate objectives for learning good representations, computing representations and the geometrical connections between representation learning, density estimation and manifold learning, Bengio et al. [12] reviewed recent work in the area of unsupervised feature learning and DL, which includes advances in probabilistic models. They proposed a new probabilistic framework to include likelihood based probabilistic models, reconstruction based models such as autoencoder variants and geometrically based manifold learning approaches. Jie et al. [13] proposed a progressive framework to optimize deep neural networks. They combine the stability of linear methods with the ability of DL methods to learn complex and abstract internal representations. They introduce a linear loss layer between the input layer and the first hidden non-linear layer of a traditional deep learning model, where the loss objective for optimization is a weighted sum of the linear loss of the added new layer and the non-linear loss of the last output layer.
Sensors 2018, 18, 3327
The predominant algorithms to train DL models are stochastic gradient descent methods; although they are easy to implement, they are difficult to tune and parallelize. In order to overcome this issue, Le et al. [14] studied the advantages and disadvantages of off-the-shelf optimization algorithms in the context of simplifying and speeding up the pre-training of unsupervised feature learning. Deep networks have been successfully applied to unsupervised feature learning for single modalities such as text, images or audio. Ngiam et al. [15] proposed an application of deep networks to learn features over multiple modalities to demonstrate that cross-modality feature learning performs better than single modality learning. The deep network is trained with audio only data but tested with video only data and vice versa. Deep Neural Networks (DNNs) provide good results when large labeled training sets are available; however, they perform worse when mapping sequences to sequences. In order to address this issue, Sutskever et al. [16] presented an approach to sequence learning that makes minimal assumptions on the sequence structure. They use a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector. Bekker et al. [17] proposed an intracluster training strategy for DL with applications to language identification where the language clusters are used to define a cost function to train a neural network. Their method trains a classifier and analyzes the obtained confusion matrix, where languages are simultaneously clustered in the columns and the rows of the confusion matrix. The language clusters are then used to define a modified cost function for training a neural network that learns to distinguish between the true language and languages within the same cluster.

Deep Reinforcement Learning
Deep Learning enables Reinforcement Learning to scale decision-making solutions that were previously unmanageable. A new algorithm called Double Deep Q Network (Double DQN), which generalizes to arbitrary function approximation, was proposed by Hasselt et al. [18]. The algorithm includes DNN and reduces overestimations by decomposing the max operation in the target into action selection and action evaluation. Although DQN solves problems with high dimensional observation spaces, it can only manage discrete and low-dimensional action spaces. As presented by Lillicrap et al. [19], DQN depends on finding the action that maximizes the action-value function, which in the continuous-valued case requires an iterative optimization process at each step. In order to overcome this issue, they propose an algorithm based on the deterministic policy gradient that can operate over continuous spaces. A framework for Deep Reinforcement Learning (DRL) that asynchronously executes multiple agents in parallel on multiple instances of the environment is proposed by Mnih et al. [20]. This parallelism decorrelates the agent's data into a more stationary process using gradient descent for optimization of deep neural network controllers. A neural network architecture for model-free reinforcement learning where a dual network represents two separate estimators, one for the state value function and the other for the state-dependent action advantage function, is presented by Wang et al. [21]. The two streams are combined via a special aggregating layer to produce an estimate of the state action-value function. A benchmark for continuous simple actions, high state and action dimensionality control, tasks with partial observations and tasks with a hierarchical structure is presented by Duan et al. [22]. They divide 31 tasks into basic control, locomotion and partially observable tasks in order to achieve higher hierarchical structure tasks where higher level decisions can reuse lower level skills. Challenges posed by reproducibility, experimental techniques and reporting procedures of DRL methods are investigated by Henderson et al. [23]. They present the variability in reported metrics and results when comparing against common baselines and suggest guidelines to make future results in DRL more reproducible. DRL is applied to resource management problems in systems and networking by Mao et al. [24]; these are decision-making tasks in which appropriate solutions depend on understanding the workload and the environment.

The Random Neural Network-Reinforcement Learning
The Random Neural Network (RNN) [25][26][27] represents more closely how signals are transmitted in many biological neural networks where they travel as spikes or impulses, rather than as analogue signal levels ( Figure 1). The RNN is a spiking recurrent stochastic model for neural networks. Its main analytical properties are the "product form" and the existence of the unique network steady-state solution. It has been applied in different applications including search for exit routes for evacuees in emergency situations [28,29], pattern-based search for specific objects [30], video compression [31], and image texture learning and generation [32].
The RNN is composed of M neurons, each of which receives excitatory (positive) and inhibitory (negative) spike signals from external sources, which may be sensory sources or other neurons (Figure 1). These spike signals arrive following independent Poisson processes of rates λ+(m) for the excitatory spike signal and λ−(m) for the inhibitory spike signal respectively, to cell m ∈ {1, ..., M}.
The RL algorithm is based on an RNN generated with at least as many neurons as the number of decisions to be taken, where neurons are numbered 1, ..., j, ..., n; therefore, for any decision i, there is some neuron i. Decisions in this RL algorithm with the RNN are taken by selecting the decision j for which the corresponding neuron is the most excited, that is, the one that has the largest value of q_j. The state q_j is the probability that neuron j is excited; these quantities satisfy a system of non-linear equations.
Given some Goal G that the agent has to achieve as a function to be optimized, and a reward R as a consequence of the interaction with the environment, successive measured values of R are denoted by R_l, l = 1, 2, ...; these are used to compute a decision threshold:

$$T_l = \alpha T_{l-1} + (1 - \alpha) R_l$$

where α is some constant 0 < α < 1. The agent takes the l-th decision, which corresponds to neuron j, and then the l-th reward R_l is measured and its associated T_{l−1} is calculated.

The Cognitive Packet Network
The CPN was introduced by Gelenbe et al. [33][34][35][36][37]; it has been tested in large-scale networks of up to 100 nodes with worst and best case performance scenarios. The CPN assigns routing and flow control capabilities to the packets rather than the nodes (Figure 2). QoS goals are assigned to Cognitive Packets (CP) within the CPN, which they follow when making routing decisions themselves with minimum dependence on the nodes. Cognitive Packets learn from the experience of other CPs, with whom they interchange network information using n Mailboxes (MB), and from their own inspection of the network, storing network information in their Cognitive Map (CM).
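As a rough illustration of the RNN-RL rule that a Cognitive Packet could apply when choosing a gate, the following Python sketch selects the most excited neuron and updates a smoothed decision threshold. The steady-state equations used in `rnn_excitation` are the form commonly given in the RNN literature and are an assumption here, since they are not reproduced in this text; the threshold is assumed to be the exponential smoothing T_l = αT_{l−1} + (1 − α)R_l. The function names, the fixed-point iteration, the example weights and the value of α are hypothetical and chosen only for illustration.

```python
def rnn_excitation(w_plus, w_minus, ext_exc, ext_inh, iterations=200):
    """Approximate the excitation probabilities q_j by fixed-point iteration.

    Assumed steady-state form (standard in the RNN literature, not taken
    from this text):
        q_j         = lam_plus_j / (r_j + lam_minus_j)
        lam_plus_j  = ext_exc[j] + sum_i q_i * w_plus[i][j]
        lam_minus_j = ext_inh[j] + sum_i q_i * w_minus[i][j]
        r_j         = sum_k (w_plus[j][k] + w_minus[j][k])
    All rates are assumed positive so that the denominators are non-zero.
    """
    n = len(ext_exc)
    # Firing rate of neuron j: total rate of the spikes it emits.
    rates = [sum(w_plus[j]) + sum(w_minus[j]) for j in range(n)]
    q = [0.5] * n
    for _ in range(iterations):
        updated = []
        for j in range(n):
            lam_plus = ext_exc[j] + sum(q[i] * w_plus[i][j] for i in range(n))
            lam_minus = ext_inh[j] + sum(q[i] * w_minus[i][j] for i in range(n))
            # q_j is a probability, so it is capped at 1.
            updated.append(min(1.0, lam_plus / (rates[j] + lam_minus)))
        q = updated
    return q


def choose_decision(q):
    """Return the index of the most excited neuron (largest q_j)."""
    return max(range(len(q)), key=lambda j: q[j])


def update_threshold(previous_threshold, reward, alpha=0.8):
    """Smoothed decision threshold: T_l = alpha * T_{l-1} + (1 - alpha) * R_l."""
    return alpha * previous_threshold + (1.0 - alpha) * reward


# Hypothetical three-gate example; the numbers carry no meaning from the paper.
W_PLUS = [[0.0, 0.2, 0.1], [0.1, 0.0, 0.3], [0.2, 0.1, 0.0]]
W_MINUS = [[0.0, 0.1, 0.2], [0.2, 0.0, 0.1], [0.1, 0.2, 0.0]]
EXT_EXC = [0.3, 0.6, 0.4]
EXT_INH = [0.1, 0.1, 0.1]

q = rnn_excitation(W_PLUS, W_MINUS, EXT_EXC, EXT_INH)
decision = choose_decision(q)
threshold = update_threshold(previous_threshold=0.5, reward=1.0, alpha=0.5)
```

In a full RL loop the measured reward would be compared with the previous threshold to decide whether the chosen neuron is reinforced or penalized; that weight update is not described in this text and is therefore left out of the sketch.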

Deep Learning Clusters
Deep Learning Clusters with the RNN are described by Gelenbe and Yin [38,39]. This model is based on the generalized queuing networks with triggered customer movement (G-networks), where customers are either "positive" or "negative" and customers can be moved from queues or leave the network (Figure 3). G-networks were introduced by Gelenbe et al. [40,41]; an extension to this model was developed by Gelenbe et al. [42] where synchronized interactions of two queues could add a customer in a third queue.
The model considers a special network M(n) that contains n identically connected neurons, each of which has a firing rate r and external inhibitory and excitatory signals λ− and λ+ respectively. The state of each cell is denoted by q, and it receives an inhibitory input from the state of some cell u which does not belong to M(n); therefore, for any cell i ∈ M(n) there is an inhibitory weight w−(u) ≡ w−(u,i) > 0 from u to i.
The DL architecture is composed of C clusters, each of which is an M(n) cluster with n hidden neurons (Figure 4). For the c-th such cluster, c = 1, ..., C, the state of each of its identical cells is denoted by q_c. In addition, there are U input cells which do not belong to these C clusters, and the state of the u-th cell, u = 1, ..., U, is denoted by q̄_u. The cluster network therefore has U input cells and C clusters. The Deep Learning clusters model defines:
• I, a U-dimensional vector I ∈ [0,1]^U that represents the input state q̄_u for the cell u;
• w−(u,c), the U × C matrix of weights from the U input cells to the cells in each of the C clusters;
• Y = (y_dl1, y_dl2, ..., y_dlc), a C-dimensional vector Y ∈ [0,1]^C that represents the cell state q_c for the cluster c.
The network learns the U × C weight matrix w−(u,c) by calculating new values of the network parameters for the input I and output Y using a Gradient Descent learning algorithm, which optimizes the network weight parameters w−(u,c) from a set of input-output pairs (i_u, y_c).
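The gradient descent step on the U × C weight matrix can be sketched as follows. This is only a minimal illustration under a strong assumption: the cluster activation of [38,39] is not given in this text, so the sketch swaps in a simplified stand-in where each cluster output is q_c = λ+ / (r + λ− + Σ_u w−(u,c) i_u), that is, a single cell that is only inhibited by the input cells. The function names, parameter values and training data are hypothetical.

```python
def cluster_outputs(weights, inputs, lam_plus=1.0, lam_minus=0.5, rate=0.5):
    """Stand-in activation (assumption): q_c = lam_plus / (rate + lam_minus + inhibition_c)."""
    outputs = []
    for c in range(len(weights[0])):
        inhibition = sum(weights[u][c] * inputs[u] for u in range(len(inputs)))
        outputs.append(lam_plus / (rate + lam_minus + inhibition))
    return outputs


def total_error(weights, samples):
    """Sum of half squared errors over all (input, target) pairs."""
    error = 0.0
    for inputs, targets in samples:
        outputs = cluster_outputs(weights, inputs)
        error += 0.5 * sum((q - y) ** 2 for q, y in zip(outputs, targets))
    return error


def train(weights, samples, learning_rate=0.5, epochs=500, lam_plus=1.0):
    """Per-sample gradient descent on the inhibitory weights w-(u,c)."""
    weights = [row[:] for row in weights]  # work on a copy
    for _ in range(epochs):
        for inputs, targets in samples:
            outputs = cluster_outputs(weights, inputs, lam_plus=lam_plus)
            for c, (q, y) in enumerate(zip(outputs, targets)):
                # For q = a / (b + s): dq/ds = -q^2 / a, and ds/dw[u][c] = i_u.
                dq_ds = -(q * q) / lam_plus
                for u, i_u in enumerate(inputs):
                    gradient = (q - y) * dq_ds * i_u
                    # Inhibitory weights are rates, so they are kept non-negative.
                    weights[u][c] = max(0.0, weights[u][c] - learning_rate * gradient)
    return weights


# Hypothetical data: targets are generated from a known weight matrix (U = 2, C = 2).
TRUE_W = [[0.5, 2.0], [1.5, 0.2]]
INPUTS = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, 1.0]]
SAMPLES = [(x, cluster_outputs(TRUE_W, x)) for x in INPUTS]

INITIAL_W = [[1.0, 1.0], [1.0, 1.0]]
learned_w = train(INITIAL_W, SAMPLES)
error_before = total_error(INITIAL_W, SAMPLES)
error_after = total_error(learned_w, SAMPLES)
```

The same loop structure would apply with the actual cluster activation; only `cluster_outputs` and the derivative `dq_ds` would need to change.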

Deep Learning Management Clusters
The Deep Learning management cluster was proposed by Serrano et al. [43]. It takes management decisions based on the inputs from different Deep Learning clusters. The Deep Learning Management Cluster defines:
• I_mc = (i_mc1, i_mc2, ..., i_mcc), a C-dimensional vector I_mc ∈ [0,1]^C that represents the input state q̄_c for the cluster c;
• w−(c), the C-dimensional vector of weights from the C input clusters to the cells in the Management Cluster mc;
• Y_mc, a scalar Y_mc ∈ [0,1], the cell state q_mc for the Management Cluster mc.
The RNN RL algorithm is chosen by the CEO DL Management Cluster under normal or conscious operations due to its fast and adaptable routing learning as short-term memory, whereas the DL clusters are selected under external cyber-attacks based on the long-term memory in unconscious operation, as a safe and resilient, although inefficient and inflexible, routing.
The RNN RL algorithm instantaneously updates its network weights based on direct observations from the network; this enables its routing algorithm to take quick decisions that adapt to changes. The Deep Learning algorithm adapts slowly to network changes; the proposed model therefore applies it as a reliable and safe routing when the CPN is compromised by a Cyber-attack. It emulates the brain in a subconscious mode with long-term memory, where it takes minimum decisions for defense or survival.
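The choice between the two routing sources can be sketched as a simple selector. This is a deliberately reduced illustration: in the model the CEO is itself a DL management cluster with a scalar output, whereas the sketch below replaces it with a plain conditional. The `RoutingInputs` type, its field names and the `under_attack` flag are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RoutingInputs:
    rnn_rl_gate: int    # gate proposed by the RNN-RL algorithm (short-term memory)
    dl_gate: int        # gate remembered by the DL clusters (long-term memory)
    under_attack: bool  # hypothetical flag raised when a cyber-attack is detected


def ceo_decision(inputs):
    """Return (mode, gate): DL routing under attack, RNN-RL routing otherwise."""
    if inputs.under_attack:
        # Unconscious operation: safe and resilient, but inflexible.
        return "safe", inputs.dl_gate
    # Conscious operation: fast and adaptable.
    return "normal", inputs.rnn_rl_gate


normal_choice = ceo_decision(RoutingInputs(rnn_rl_gate=3, dl_gate=5, under_attack=False))
safe_choice = ceo_decision(RoutingInputs(rnn_rl_gate=3, dl_gate=5, under_attack=True))
```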

Deep Learning Clusters
DL clusters (Appendix A) learn the network identity that consists of QoS network metrics, including best routes for each QoS metric, and Cyber keys. A DL cluster is assigned to each QoS metric: Delay, Packet Loss and Bandwidth. Each QoS DL cluster learns the best-associated QoS metric with its best-associated node gates. When a node observes a better QoS route with a lower QoS metric, it learns its value and includes the gate in the first position of the QoS DL routing table.
In addition, a DL cluster is assigned per Cyber key: User, Packet and Node. The user cyber network weights authenticate the application that has transmitted the packet. The packet cyber network weights validate that the packet transmitted is legitimate; this secures the network against Denial of Service attacks. The node cyber network weights authenticate the nodes within the CPN; this secures the CPN against impostor nodes. The Cyber network weights could have been assigned previously to the CPN nodes by the network administrator, or the CPN nodes could have learnt them in an initialization mode. When a CPN node receives a CP, each Cyber DL cluster extracts its relevant keys and uses them as input and output values. If the quadratic error between the Cyber DL cluster output vector and the input vector is over a threshold, then the CPN node considers the certificate as invalid or the CPN as under Cyber-attack.
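The certificate check described above can be illustrated with a short Python sketch. It only covers the comparison step: the Cyber DL cluster outputs are taken as given vectors, and the threshold value, the function names and the example vectors are hypothetical.

```python
def quadratic_error(output_vector, input_vector):
    """Sum of squared differences between cluster output and input key vector."""
    return sum((o - i) ** 2 for o, i in zip(output_vector, input_vector))


def key_is_valid(output_vector, input_vector, threshold=0.01):
    """A key is accepted when the quadratic error does not exceed the threshold."""
    return quadratic_error(output_vector, input_vector) <= threshold


def cyber_status(cluster_results, threshold=0.01):
    """cluster_results maps a key name to (output_vector, input_vector).

    Returns ("valid", []) when every key passes, otherwise
    ("invalid", [names of the failing keys]).
    """
    failed = [
        name
        for name, (output_vector, input_vector) in cluster_results.items()
        if not key_is_valid(output_vector, input_vector, threshold)
    ]
    return ("invalid", failed) if failed else ("valid", [])


# Hypothetical packet: the node key does not match what the cluster has learnt.
results = {
    "user": ([0.50, 0.21, 0.79], [0.5, 0.2, 0.8]),
    "packet": ([0.10, 0.90, 0.41], [0.1, 0.9, 0.4]),
    "node": ([0.90, 0.20, 0.80], [0.5, 0.2, 0.8]),
}
status = cyber_status(results)
```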
This model defines three QoS clusters: Delay, Packet Loss and Bandwidth.

Deep Learning Management Cluster
The DL management clusters take the overall routing management decision (Figure 7). The QoS and Cyber management clusters analyze the output from their associated QoS and Cyber DL clusters respectively. If the Cyber management cluster detects a failure in the cyber certificates; the CEO management cluster routes the network Cognitive Packets as safe mode using the QoS DL clusters, otherwise, if the Cyber certificates are valid the CEO management cluster chooses the route provided by the RNN-RL routing algorithm as normal mode.

Implementation
The Deep Learning Clusters Structure for Management Decisions is implemented in the CPN using the Network Simulator Omnet 5.0. The simulation covers several size nxn square CPNs where all the nodes in the same and adjacent layers are connected with each other. For simplicity, the simulation always considers the first node (Node 1) as the only transmitter and the last node (Node n) as the only receiver; the other nodes only participate in the routing of Cognitive Packets. An example of a 4 × 4 network is shown in Figure 8. Each node has normalized QoS Delay, Loss and Bandwidth metrics as relative to their number; in an nxn network node i will have Delay: 10i; Loss: 5(n − i) and Bandwidth: 5 + 10i respectively. The approach is represented in Table 1 for a 4 × 4 network. After two Cognitive Packets are sent with a defined QoS; the QoS metric swaps between each internal node the within the same column for a 4 × 4 CPN. This model proposes to set the RNN-RL network weights with initialization packets sent at random gates.  • Y CEOmc , a scalar Y CEOmc ∈ [0,1] that represents the final routing decision.

Cyber Deep Learning Cluster
The Cyber DL clusters have ten input cells (u = 10) and ten output clusters (c = 10). The key is a vector of 10 dimensions. Each key component $i_u^{Cyber\text{-}u}$, $i_u^{Cyber\text{-}p}$ and $i_u^{Cyber\text{-}n}$ takes a value between 0.1 and 0.9 in increments of 0.1∆. The Cyber DL cluster network weights are trained with the input value the same as the output value.

Deep Learning Management Cluster
The inputs of the Cyber management cluster are the errors provided by each Cyber DL cluster, and its network weights are all set to the same value (0.1); therefore, the different Cyber DL clusters have the same priority. The output $Y_{cmc}$ is the overall quantified Cyber error decision based on a threshold. The inputs of the QoS management cluster are the best QoS metrics from each QoS DL cluster, and its network weights correspond to the Goal = (αDelay, βLoss, γBandwidth). The output $Y_{qmc}$ is the quantified best QoS metric decision.
The input of the CEO management cluster is the value provided by the QoS management cluster and its network weight is the value provided by the Cyber management cluster. The output is the final routing decision between the different gates provided by the RNN-RL algorithm and the Delay, Loss and Bandwidth DL clusters.

Experimental Results
The DL Clusters Structure for Management Decisions has been simulated in three different n × n Cognitive Packet Network sizes (3 × 3, 4 × 4 and 5 × 5) with different Cyber keys, QoS metrics and Goal changes to assess the routing decision-making of the proposed DL structure. Please note that we are not evaluating the routing protocol but the routing decision.

Cyber Deep Learning cluster results
The different Cyber DL clusters are validated by modifying the security keys at node 1 and measuring the cyber validation error at the next node, node 4, once the CPs have a stable route. The keys are gradually changed from the correct key by applying 0.1∆ increments to the different key dimensions.
The Cyber DL cluster error increases substantially even with a single 0.1∆ increment (Table 2). The results are consistent between the different Cyber DL clusters. Cyber key increments produce a bigger error if they are applied in the same dimension rather than split across different dimensions.
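One way to build intuition for the last observation is a worked example under a strong simplifying assumption: that the validation error behaves like the squared distance between the modified key and the correct key. This is not the DL cluster computation itself, and the helper names and the example key are hypothetical; the point is only that a quadratic error grows with the square of a concentrated deviation but linearly with the number of separate small deviations.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared component differences (assumed error proxy)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def apply_increments(
    key: Sequence[float], counts: Sequence[int], step: float = 0.1
) -> List[float]:
    """Add counts[d] increments of size `step` to dimension d of the key."""
    return [k + step * c for k, c in zip(key, counts)]


if __name__ == "__main__":
    correct_key = [0.5] * 10  # hypothetical correct key

    # Three increments concentrated in one dimension.
    same_dim = apply_increments(correct_key, [3] + [0] * 9)
    # Three increments split across three dimensions.
    split_dims = apply_increments(correct_key, [1, 1, 1] + [0] * 7)

    print("same dimension :", squared_distance(same_dim, correct_key))
    print("split          :", squared_distance(split_dims, correct_key))
```

With k increments of size s, the concentrated case gives (ks)² while the split case gives ks², so the concentrated error is k times larger under this assumption.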

RNN-RL
The route provided by the QoS DL clusters remains unchanged due to their slow learning process until the new best route is found by the RNN-RL. The Reward and Threshold of the route decision taken by the CEO management cluster when the Cyber management cluster has authorized the different Cyber keys are shown in Figure 9 for the seven different Goals. When the new best route is discovered, the CPN Threshold adapts gradually to the original value.
Table 9. DL Management Cluster Validation (3 × 3 Nodes).

The 4 × 4 CPN is simulated with a continuous stream of 380 Cognitive Packets. The first 100 packets are used to initialize the CPN network. The Goal changes after 40 packets, whereas the QoS metrics change 2 packets after the new Goal is selected. The Threshold follows $T_l = 0.99\,T_{l-1} + 0.01\,R$, where $T_l$ is the Threshold at decision packet $l$ and $R$ is the Reward. The QoS DL clusters have been validated with seven different variable Goals for the same Cognitive Packet stream (Table 10). The average error and learning algorithm iteration values for the QoS and Cyber DL clusters are shown in Table 11. The number of updates in the network weights or routing table for the DL cluster and the RNN Reinforcement Learning is shown in Table 12. The number of iterations to update RNN-RL is only one whereas DL clusters require approximately 150 iterations, as shown in Table 11. The route decision taken by the CEO management cluster when the Cyber management cluster has authorized the different Cyber keys is shown in Table 13, for the first Goal only.
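The Threshold recursion for the 4 × 4 CPN is an exponentially weighted running average of the Reward. The sketch below shows the update and how slowly it moves; the function names, the starting Threshold and the constant Reward sequence are illustrative assumptions, and only the coefficients 0.99 and 0.01 come from the text.

```python
from typing import Iterable, List


def update_threshold(
    previous: float, reward: float, memory: float = 0.99, gain: float = 0.01
) -> float:
    """One step of T_l = memory * T_{l-1} + gain * R."""
    return memory * previous + gain * reward


def track_threshold(initial: float, rewards: Iterable[float]) -> List[float]:
    """Apply the update once per decision packet and record each Threshold."""
    threshold = initial
    history = []
    for reward in rewards:
        threshold = update_threshold(threshold, reward)
        history.append(threshold)
    return history


if __name__ == "__main__":
    # Hypothetical scenario: the Reward settles at 2.0 while the
    # Threshold starts at 1.0.
    history = track_threshold(1.0, [2.0] * 300)
    for packet in (1, 10, 100, 300):
        print(packet, round(history[packet - 1], 4))
```

Because the two coefficients sum to one, a constant Reward is a fixed point of the update, and each packet closes only 1% of the remaining gap, which matches the gradual adaptation of the Threshold reported in the results.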
The results provided by the 4 × 4 CPN are similar to those of the 3 × 3 CPN. The first two packets follow the best route, whereas the third packet acknowledges that the QoS metrics have changed. RNN-RL finds the optimum route after Cognitive Packets explore the network, and DL learns the route one Cognitive Packet later. The Reward and Threshold of the route decision taken by the CEO management cluster when the Cyber management cluster has authorized the different Cyber keys are shown in Figure 10 for the seven different Goals. When the new best route is discovered, the CPN Threshold adapts gradually to the original value.

Deep Learning Management Cluster Results (4 × 4 Nodes)
The results provided by the DL management cluster confirm the proposed model. The correct quantification of the DL management cluster cell states and the selection of accurate thresholds are fundamental to taking relevant optimum decisions. Three different strategic Cognitive Packets are chosen (CP 107, CP 228 and CP 341) for the 4 × 4 CPN validation, each with a different Goal. Results for the two Cyber Security scenarios, ∆ = 0 (normal operation) and ∆ = 0.1 (CPN under Cyber-attack), are shown in Table 14.
The 5 × 5 CPN is simulated with a continuous stream of 1550 Cognitive Packets. The first 1500 packets are used to initialize the CPN network, with a single 1.0 × Delay Goal after 50 packets, whereas the QoS metrics change 2 packets after the Goal is selected. The Threshold follows $T_l = 0.999\,T_{l-1} + 0.01\,R$. The QoS DL clusters have been validated with only one Goal for the same Cognitive Packet stream (Table 15). The average error and learning algorithm iteration values for the QoS and Cyber DL clusters are shown in Table 16. The number of updates in the network weights or routing table for the DL cluster and the RNN Reinforcement Learning is shown in Table 17. The network keeps sending Cognitive Packets until the value of 1/Reward is less than 1/Threshold. When the new best route is discovered, as shown in Figure 11, the CPN Threshold adapts gradually to the original value.

Deep Learning Management Cluster Results (5 × 5 Nodes)
Results for the two Cyber Security scenarios, ∆ = 0 (normal operation) and ∆ = 0.1 (CPN under Cyber-attack), are shown in Table 18. For the 5 × 5 CPN, the results of the DL management cluster are consistent with the previous results: the DL management cluster adapts to network changes and provides the optimum route based on the current network conditions.

Conclusions
This paper has presented a Deep Learning Cluster Structure for Management Decisions. The proposed hierarchical decision model has been validated in the Cognitive Packet Network with three configurations: small size 3 × 3, medium size 4 × 4 and large size 5 × 5, with one, two and three layers of decision respectively. The addition of Deep Learning clusters specialized in different functions (Cyber, QoS and Management) provides a flexible approach similar to how our brain performs; Deep Learning clusters are able to adapt and be assigned where more routing, computing and memory resources are required.
The RNN Reinforcement Learning algorithm adapts very quickly to variable QoS changes with fast decisions in short-term memory, whereas Deep Learning is slow to adapt to QoS changes as it learns from the RNN-RL algorithm and stores routing information in long-term memory. The CEO management cluster takes the right routing decisions based on the inputs from the QoS and Cyber management clusters. This allows the CPN to use a safe route in case of Cyber-attack, or a fast route under normal conditions. Future work will expand the validation gradually up to very large-scale networks (100 nodes, 8 decision layers).

Funding:
The author declares no external funding was provided.

Conflicts of Interest:
The author declares no conflicts of interest.