Design of Distributed Discrete-Event Simulation Systems Using Deep Belief Networks

Abstract: In this research study, we investigate the ability of deep learning neural networks to map the features of a parallel distributed discrete-event simulation (PDDES) system (software and hardware) to a time synchronization scheme in order to optimize speedup performance. We use deep belief networks (DBNs), which, due to their multiple layers with feature detectors at the lower layers and a supervised scheme at the higher layers, can provide nonlinear mappings. The mapping mechanism considers simulation constructs and hardware and software intricacies such as simulation objects, concurrency, iterations, routines, and messaging rates, each with a particular importance level based on a cognitive approach. The result of the mapping is a synchronization scheme such as breathing time buckets, breathing time warp, or time warp that optimizes speedup. The simulation-optimization technique outlined in this research study is unique. This new methodology could be realized within current parallel and distributed simulation modeling systems to enhance performance.

Simulations based on logical processes (LPs) have an architecture built on sets of LPs. A design in which LPs execute events synchronously or asynchronously in parallel requires a communication system not only to exchange data but also to synchronize activities. Each LP is assigned to a specific region of the model to be simulated. The simulation engines can operate in an event-driven fashion, executing local events over their respective subsets of state variables (and generating remote events, i.e., events in other LPs).
PDDES systems use synchronization techniques that fall "into two main categories: conservative approaches that avoid violating the constraint of local causality, and optimistic approaches that allow violations to occur but provide a mechanism" for recovery called rollback [14]. Rollback involves undoing incorrect modifications. The most effective implementation of the PDDES approach is the optimistic algorithm [15]. It is widely used for simulations in logistics, missile defense, and computational physics [16]. In this paper, we investigate the ability of deep learning neural networks to map the features of a PDDES (software and hardware) to an optimistic time synchronization scheme in order to optimize speedup performance. The different synchronization techniques implemented in this research are explained below.

Conservative and Optimistic Schemes
Simulation objects must interact in a particular fashion to accomplish efficient parallel and distributed execution while preserving integrity. Several innovative techniques have been developed to solve this challenging problem from conservative and optimistic viewpoints [2].

Conservative Viewpoint
The conservative viewpoint executes an event for a simulation object (SO) only once it can be assured that the SO will receive no other event with an earlier timestamp. Conservative approaches restrict how SOs may interact: SOs can only interact with other SOs as specified by connectivity rules established during the simulation's initialization.
The most general approach in the conservative domain is fixed time buckets [2]. Fixed time buckets (Figure 1) permit events to be scheduled and executed asynchronously by allowing an SO to schedule events in other simulation objects, but never closer in time than the global lookahead (L) of the simulation. For instance, if an SO is at time TA, then the earliest it can schedule an event for another SO is TA + LA (Figure 1), where LA is its respective lookahead.
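The lookahead constraint above can be summarized in a short sketch (the function names are our own illustration, not part of any simulation engine): an SO at local time t may only schedule remote events at or after t plus the lookahead.

```python
# Sketch of the conservative lookahead rule: an SO at local time t_a may only
# schedule events for other SOs at or after t_a + L, where L is the lookahead.

def earliest_schedulable_time(local_time: float, lookahead: float) -> float:
    """Earliest timestamp this SO may assign to an event in another SO."""
    return local_time + lookahead

def can_schedule(local_time: float, lookahead: float, event_time: float) -> bool:
    """Reject any remote event that would violate the lookahead constraint."""
    return event_time >= earliest_schedulable_time(local_time, lookahead)
```

For example, an SO at time 10.0 with a lookahead of 2.0 may schedule a remote event at time 12.0 but not at 11.0.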

Optimistic Viewpoint
The optimistic viewpoint attains parallelism by aggressively executing events, sometimes without regard for causal accuracy. Rollback is employed to invalidate events that should not have been executed when straggler event messages arrive from other elements of the simulation system. Events are thus executed optimistically, with rollback invoked only when needed. This viewpoint places no limitations on how SOs interact; the disadvantage, however, is that simulations must be built in a rollbackable style.

Time Warp (TW)
TW event management delivers a well-organized rollback procedure for each simulation object (SO). Each SO has a simulation clock that advances with the timestamp of its executed events. When an SO receives a straggler event, the SO is rolled back to its last correctly executed event before executing further events. Rolled-back events must be reprocessed for the simulation to continue.
When an SO receives a straggler message, TW rolls back only the affected events. The control structure must retract the events that were scheduled by rolled-back events; therefore, each event must maintain a record of the events it created until the event is committed. Antimessages are the messages used to withdraw wrongly scheduled event messages [2].
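The rollback-and-antimessage bookkeeping described above can be sketched as follows (a simplified illustration with hypothetical names, not WarpIV's implementation): events later than the straggler are rolled back, and every message they had scheduled becomes an antimessage.

```python
# Illustrative sketch of Time Warp rollback: when a straggler arrives, roll
# back every processed event with a later timestamp and emit antimessages to
# cancel the events those rolled-back events had scheduled remotely.

def rollback(processed, straggler_time):
    """Split processed events at the straggler time.

    Returns (kept, rolled_back, antimessages): 'antimessages' cancels every
    message a rolled-back event had sent to schedule a remote event.
    """
    kept = [e for e in processed if e["time"] <= straggler_time]
    rolled_back = [e for e in processed if e["time"] > straggler_time]
    antimessages = [msg for e in rolled_back for msg in e.get("scheduled", [])]
    return kept, rolled_back, antimessages
```

Note how an antimessage can itself trigger rollbacks at the receiving SO, which is the cascading behavior the text describes.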
A fundamental concept in optimistic time management is the global virtual time (GVT). GVT determines when an event can be committed: events with timestamps less than GVT are considered correctly processed and will not be rolled back. The objective is to update GVT across the simulation as frequently as possible without harming the efficiency of the simulation through excessive synchronization. The best performance is obtained on true parallel machines with shared memory and high-speed connections (Figure 2). When an SO receives a straggler message, rollbacks are triggered: TW rolls back each affected event and then processes the straggler event. When an event is rolled back, antimessages may be generated for other events, which can lead to further rollbacks and antimessages.
Breathing Time Buckets (BTB)
TW and fixed time buckets both contribute to BTB [2]. Messages created while executing events are not sent until it is confirmed that the event creating them will not be rolled back. BTB is a mix, as explained below:
- BTB is TW without the scheme of using antimessages.
- BTB deals with events in the same style as fixed time buckets, except that the size of the cycles is not predetermined.
The concept of the event horizon is essential in BTB [17][18][19][20]. The event horizon is the earliest timestamp among the new events created during the current cycle. At the event horizon, all new events created during the last bucket are sorted and merged into the event queue (Figure 3). Exploiting this process is fundamental. The calculation of the global event horizon is essential to avoid problems with other SOs. The nodes are ready to synchronize when they have executed events up to their local event horizon. GVT can then be calculated as the minimum local event horizon over all the nodes, and events with timestamps less than or equal to GVT can be committed. A possible difficulty is that some nodes may have executed events beyond GVT. Rollback, in this case, consists of discarding the messages that were produced but not sent by the specific event and then returning the SO to the state it had before the event modified it.
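As a minimal sketch of the synchronization step just described (illustrative names, not WarpIV's API): GVT is the minimum of the nodes' local event horizons, and only events at or before GVT are committed.

```python
# Sketch of the BTB synchronization step: each node runs to its local event
# horizon; GVT is the minimum horizon across nodes; events with timestamps
# <= GVT are committed, the rest remain subject to rollback.

def compute_gvt(local_horizons):
    """GVT is the minimum local event horizon over all nodes."""
    return min(local_horizons)

def commit_events(executed_times, gvt):
    """Commit executed events at or before GVT; later events stay rollbackable."""
    committed = [t for t in executed_times if t <= gvt]
    uncommitted = [t for t in executed_times if t > gvt]
    return committed, uncommitted
```

A node that executed an event at time 10.0 when GVT is 9.5 holds that event uncommitted, which matches the "executed events beyond GVT" difficulty noted above.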

Breathing Time Warp (BTW)
Breathing time warp is another optimistic hybrid scheme [17][18][19][20]. BTW attempts to fix the drawbacks of BTB and TW: TW has the potential problem of antimessage explosions and the corresponding increase in rollbacks, while BTB has the possibility of more frequent synchronizations and reduced parallelism.
When events are executed close to the current GVT, a cascading antimessage explosion can occur. The cause is that events executed far ahead of the rest in simulation time will probably be rolled back. A potential solution is for those runaway events not to send their messages right away. Furthermore, using TW as the first step and then BTB later reduces the frequency of synchronizations and widens the bucket. The cycle, illustrated with five nodes (Figure 4), is described below:
1. TW phase: The cycle starts with TW. There is a crucial flow parameter to fine-tune called Nrisk. "Nrisk is the number of events processed beyond GVT by each node" (locally) "that are allowed to send their messages with risk" [21].
2. BTB phase: At the end of the TW phase, messages are held back, and the BTB phase starts execution.
3. Computing GVT: At the end of the BTB phase, GVT is computed. There are two other crucial flow parameters to fine-tune, Ngvt and Nopt. "Ngvt is the number of messages received by each node before requesting a GVT update" [21], while "Nopt is the number of events allowed to be processed on each node beyond GVT" [21]. Therefore, Ngvt and Nopt control when GVT is calculated.
4. Committing events: The events executed before GVT are committed.
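The flow-control roles of Nrisk, Ngvt, and Nopt can be sketched as follows (an illustrative reading of the cycle above, not WarpIV's implementation): a node runs in the TW phase while within the risk window, switches to BTB afterwards, and requests a GVT update once either limit is reached.

```python
# Sketch of the BTW cycle's flow control. Parameter names follow the text;
# the logic is illustrative: the first Nrisk events beyond GVT send messages
# with risk (TW phase), later events hold messages back (BTB phase), and a
# GVT update is requested when Ngvt messages arrive or Nopt events have been
# processed beyond GVT.

def btw_phase(events_beyond_gvt: int, n_risk: int) -> str:
    """TW phase while within the risk window, BTB phase afterwards."""
    return "TW" if events_beyond_gvt < n_risk else "BTB"

def request_gvt_update(msgs_received: int, events_beyond_gvt: int,
                       n_gvt: int, n_opt: int) -> bool:
    """Trigger a GVT computation when either flow-control limit is reached."""
    return msgs_received >= n_gvt or events_beyond_gvt >= n_opt
```

Tuning these three parameters trades rollback risk (TW behavior) against synchronization frequency (BTB behavior), which is exactly the balance BTW aims for.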

Problem Statement
Discrete-event simulation on parallel and distributed processors is very different from the single-processor scheme realized in traditional and commercial programs. As explained above, techniques such as BTB, BTW, and TW have been developed to implement optimistic time synchronization schemes, each with its respective strengths and weaknesses in PDDES [2]. However, there is no mechanism or set of efficient rules to decide a priori the best approach for a given simulation problem with its respective hardware, software, and network infrastructure in order to optimize a desired performance measure. Therefore, we introduce deep belief networks (DBNs) as a mechanism to decide a priori the best approach in Section 2. Section 3 describes the validation and variations of the DBN implementation built for this research. Section 4 illustrates the selection of the PDDES environment (WarpIV). Section 5 introduces programming in WarpIV using a case study and speedup (with its relationship to wall-clock time for these preliminary studies; other performance measures are possible, but this study places emphasis on wall-clock time). Section 6 introduces the measure of complexity utilized to characterize a simulation computer program. The results of using the DBN to map a PDDES environment to a synchronization scheme are explained in Section 7. Finally, we provide conclusions and further research directions in Section 8.

Deep Belief Networks
Hinton and Salakhutdinov [22] initiated the deep learning movement in 2006 and contributed to a renewed interest in neural networks. Deep learning constructs a model with several layers and trains it with data; the multiple layers, with feature detectors, can improve classification accuracy, and the multiple levels of representation can provide complex mappings [23,24]. This paper studies the capability of deep belief networks (DBNs) to map the characteristics of a PDDES to an optimistic synchronization scheme.
A deep belief network (DBN) is a deep machine learning architecture arranged as a stack of restricted Boltzmann machines (RBMs) [25,26]. The visible layer of the DBN is the visible layer of the first RBM, while all other layers are hidden DBN layers. The hidden neurons within a layer are not connected to each other; therefore, they are conditionally independent. A DBN is trained one RBM at a time: the "input layer is used to train the connection weights between the two layers", while the output layer is used to build the input of the next RBM [24]. The hidden layers of a DBN are trained without supervision and act as feature detectors. These unsupervised layers can be useful for detecting features in the PDDES software and hardware; the supervised layer then creates the relationships between features and the synchronization schemes. DBNs have successfully created mappings in challenging problems such as traffic flow prediction, electroencephalography, and natural language understanding [26][27][28][29][30]. The work presented in this paper is the first attempt to use DBNs to help design PDDES.
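The greedy layer-wise training just described can be sketched as follows (a simplified Python illustration, not the MATLAB implementation used in this research; `train_rbm` is a stand-in for contrastive-divergence training): each RBM is trained on the hidden activations of the layer below, and those activations become the "visible" data for the next RBM.

```python
import numpy as np

# Sketch of greedy layer-wise DBN pre-training. train_rbm is a placeholder:
# a real implementation would run contrastive divergence on the data.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, rng):
    """Placeholder for CD training; returns initialized weights and biases."""
    n_visible = data.shape[1]
    return {"W": 0.01 * rng.standard_normal((n_visible, n_hidden)),
            "b": np.zeros(n_hidden)}

def pretrain_dbn(data, layer_sizes, seed=0):
    """Stack RBMs: each layer's output probabilities feed the next RBM."""
    rng = np.random.default_rng(seed)
    layers, x = [], data
    for n_hidden in layer_sizes:
        rbm = train_rbm(x, n_hidden, rng)
        layers.append(rbm)
        x = sigmoid(x @ rbm["W"] + rbm["b"])  # features for the next RBM
    return layers, x
```

With 21 inputs and three hidden layers of 50 neurons (the architecture reported later in this paper), `pretrain_dbn(data, [50, 50, 50])` would produce the unsupervised feature-detecting stack; a supervised output layer is then trained on top.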
The learning mechanism in DBNs starts with the RBMs and their respective energy function. An energy function based on the connection weights and individual unit biases defines the probability distribution over the joint states of the neurons. For binary RBMs, the energy of a joint configuration of visible and hidden neurons is given by:

E(v, h; \theta) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i w_{ij} h_j

where \theta = (w, a, b), v = (v_i) are the visible neurons, and h = (h_j) are the hidden neurons. Variables a_i and b_j are the bias terms, while w_{ij} is the weight between neurons i and j [31,32]. The following equation calculates the probability assigned to every possible visible vector v:

p(v; \theta) = \frac{1}{Z(\theta)} \sum_h e^{-E(v, h; \theta)}, \qquad Z(\theta) = \sum_{v,h} e^{-E(v, h; \theta)}

The partial derivative of the log-likelihood of a training vector with respect to a weight is:

\frac{\partial \log p(v)}{\partial w_{ij}} = \langle v_i h_j \rangle_{\mathrm{data}} - \langle v_i h_j \rangle_{\mathrm{model}}

Therefore, the learning rule (i.e., the updating of the weights) for stochastic steepest ascent in the log probability of the training dataset is given by:

\Delta w_{ij} = \varepsilon \left( \langle v_i h_j \rangle_{\mathrm{data}} - \langle v_i h_j \rangle_{\mathrm{model}} \right)

where \varepsilon is the learning rate. The individual activation probabilities of the visible neurons are defined by:

p(v_i = 1 \mid h) = \sigma\!\left( a_i + \sum_j h_j w_{ij} \right)

where \sigma(\lambda) = 1/(1 + e^{-\lambda}) is a sigmoid function [22,24]. Correspondingly, for a randomly selected training input v, the binary state h_j of each hidden neuron j is set to 1 with a probability provided by:

p(h_j = 1 \mid v) = \sigma\!\left( b_j + \sum_i v_i w_{ij} \right)

Real-valued data are more naturally modeled by using a Gaussian-Bernoulli RBM (GRBM) with a unit-variance energy function of the form:

E(v, h; \theta) = \sum_i \frac{(v_i - a_i)^2}{2} - \sum_j b_j h_j - \sum_{i,j} v_i w_{ij} h_j

After being trained, RBMs represent probability distributions: through the energy function, they assign a probability to every possible input-data vector.

Real-valued GRBMs have a conditional probability for h_j = 1 (a hidden variable turned on), given the evidence vector v, of the form:

p(h_j = 1 \mid v) = \sigma\!\left( b_j + \sum_i v_i w_{ij} \right)

The GRBM conditional probability for v_i, given the evidence vector h, is Gaussian:

p(v_i \mid h) = N(\mu_i, 1), \qquad \mu_i = a_i + \sum_j w_{ij} h_j

where N(\mu_i, 1) = \frac{1}{\sqrt{2\pi}} e^{-(v_i - \mu_i)^2/2} is a Gaussian distribution with mean \mu_i and unit variance [22,24].
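In practice, the learning rule above is approximated with contrastive divergence (CD-1), replacing the model expectation with a one-step reconstruction. The following is a minimal sketch for a binary RBM (an illustration, not the MATLAB implementation used in this research):

```python
import numpy as np

# One contrastive-divergence (CD-1) update for a binary RBM, approximating
# dW = eps * (<v h>_data - <v h>_model) with a one-step reconstruction.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, a, b, eps, rng):
    """Return updated (W, a, b) after one CD-1 step on a batch v0."""
    n = v0.shape[0]
    ph0 = sigmoid(v0 @ W + b)                 # p(h=1 | v0), "data" statistics
    h0 = (rng.random(ph0.shape) < ph0) * 1.0  # sample binary hidden states
    pv1 = sigmoid(h0 @ W.T + a)               # reconstruction p(v=1 | h0)
    ph1 = sigmoid(pv1 @ W + b)                # p(h=1 | reconstruction)
    W = W + eps * (v0.T @ ph0 - pv1.T @ ph1) / n
    a = a + eps * np.mean(v0 - pv1, axis=0)
    b = b + eps * np.mean(ph0 - ph1, axis=0)
    return W, a, b
```

For the Gaussian-Bernoulli case, the visible reconstruction would instead be drawn from the Gaussian conditional with mean \mu_i given above.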

Validation and Variations of the Implementation of Deep Belief Networks (DBNs)
The DBN software was built using MATLAB and validated using standard benchmark pattern classification data from the MNIST handwritten digits database [33,34]. The MNIST database has a training set of 60,000 examples and a test set of 10,000 examples; the digits are size-normalized and centered in 28 × 28-pixel images (Figure 5). The MNIST database has become a good benchmark for researchers who want to test software that implements artificial intelligence algorithms on real-world data while spending minimal effort on pre-processing and formatting. Several architectures were developed using different numbers of layers and neurons. The accuracy achieved on the testing set was greater than 98%, which corresponds with the values reported by other researchers [35]. The software was also validated by sharing the source code and the MNIST results with the research group of the creator of DBNs (Geoffrey Hinton) at the University of Toronto. Additionally, the DBN software was adapted to perform signal processing using NASA Space Shuttle data, a very well-known anomaly detection problem [36]. This implementation uses stochastic Bernoulli restricted Boltzmann machines and employs analysis of variance (ANOVA) as a "brute force" optimization to help identify the DBN parameters and factors that influence the cross-entropy (CE) or root mean square (RMS) minimum errors during stochastic DBN training.
The method devised for anomaly detection examines the difference between two DBN output probabilities: the output probability of a DBN in response to the nominal telemetry signal set it was trained on is compared with the output probability of the same DBN in response to that data set with a small change. This method allows a DBN to detect slight changes in telemetry signals. Figure 6 shows the detection process. For this case, a deep belief network was trained with six nominal temperature instrumentation signals, two transducers per main engine. All data were normalized, so the data set had a magnitude range of [0, 1]. Six nominal instrumentation temperature signals collected from five independent space shuttle missions (i.e., flights) were used for training. Table 1 summarizes the telemetry variables, missions, and timeframes used; among the signals were Engine 2: E41T2153A1, E41T2154A1 (β(3), β(4)) and Engine 1: E41T1153A1, E41T1154A1 (β(5), β(6)).
The structure of the DBN for this test case used one input layer, three hidden layers, and one output layer. During deep learning, the cross-entropy error was examined at every epoch in each hidden layer to ensure a decreasing trend. Many deep learning iterations were executed by systematically changing the combinations of hidden neurons at each layer and the number of epochs, employing ANOVA to help identify whether a particular factor affected cross-entropy. This iterative process of changing the number of hidden neurons in the three hidden layers and the number of RBM epochs produced an acceptable nominal DBN model. The final DBN parameters are listed in Table 2. The neuron activation probability at the output of the DBN was tested with data from a different shuttle flight (STS-135). When the DBN was presented with the STS-135 data at its visible neurons, neuron activation propagated through the network to the output. The prediction was performed with 100% accuracy. This case study illustrated the feasibility of using the output of a deep belief network as a possible detector of off-nominal patterns under the specific restrictions and conditions described in this section, and it confirmed the soundness of our implementation.

Selection of a Parallel and Distributed Discrete-Event Simulation (PDDES) Platform
We selected a PDDES platform to implement the different distributed simulation designs and obtain the corresponding results for the time and synchronization management schemes. Several parallel and distributed discrete-event simulation (PDDES) engines were reviewed during our efforts.
The reviewed parallel processing computing engines can implement high-performance parallel simulation executives for discrete-event simulation applications with their own recommended compilers. Due to its superior features, we decided to use WarpIV in this research.

WarpIV Engine
The WarpIV kernel can perform discrete-event simulations in parallel and distributed settings [16,39,40]. The engine supports heterogeneous network applications using a high-speed arrangement that mixes shared memory with standard protocols. This integration also offers high bandwidth.
The modeling constructs and time management schemes provided with the WarpIV kernel offer optimistic time mechanisms (e.g., TW, BTB, BTW). It also facilitates the component-based and interoperability modeling paradigm for simulation model reusability, and it uses memory management caching techniques.
The simulation modeler can use several scheduling methods. In addition, the simulation kernel allows arbitrary arguments to be specified through the event interface construct. It uses the C and C++ languages.
The WarpIV engine has unique features. One of the main differences is the division between simulation objects and logical processes: simulation objects inherit from the logical process class. Logical processes (LPs) are automatically distributed during startup to different nodes in several styles (e.g., block, scatter, user-defined).

Advantages of WarpIV
The WarpIV engine provides the resources for scheduling event processing in sequential, parallel, and distributed settings. These resources have the following advantages:
- It features state-of-the-art conservative, optimistic, and sequential time management modes.
- It distributes models and simulation objects automatically across multiple processors (even over the Internet) while handling event processing in logical time.
- It offers an excellent interface.
- It is updated for the latest operating systems, network connections, and extensions built to enhance functionality.
- It supports interoperability and reusability.
- Training courses and support are available.

Programming in WarpIV
We provide an example of the case studies used to develop the training database for this research, as depicted in Figure 7. This range detection model implements a parallel distributed discrete-event simulation of the interactions of several aircraft and radars. The general features of its implementation are:
1. It is a discrete-event simulation program (with capabilities to be executed in parallel/distributed computing environments). WarpIV provides a rollbackable version of the standard template library (STL) to accommodate mainstream C++ programmers [16,41]. Therefore, the program is written in C/C++.
2. The simulation clock measures time in seconds.
3. There are two (2) types of simulation objects (SOs): aircraft and radars. The event TestUpdateAttribute updates the trajectory of an aircraft at specific times. The event for the radars is Scan; it is scheduled at the initial simulation time and recurs at regular intervals depending on the technical specifications of the radars. The WarpIV engine has the logical process (LP) class (Figure 8). A simulation object (SO) is a regular LP class and inherits from the LP class. The logical process manager (LPM) can have several simulation objects, while a simulation object can belong to only one LPM. An SO manager class (inheriting from the LPM class) for each user-defined simulation object type is automatically generated by a macro (Aircraft and Radar for this case study). With regard to events: an event always has one input message and zero or more outgoing messages that are generated and sent to create new events. Events inherit from the event class (Scan and TestUpdateAttribute for this case study).
4. The theater of operations is read from a file with the corresponding longitude and latitude. The speed (maximum and minimum) of the aircraft is read from a file (m/s). The range (scanning) of the radar can be read from a file or hardcoded in the program. See Figure 9 for an example of a theater of operations.
5. After the initialization routines, the simulation senses an aircraft's proximity to a radar using the predefined technical specifications.
6. The TestUpdateAttribute event points to the method TestUpdateAttribute(). The event framework's scheduler starts this method at simulation time zero. At each simulation time, each parallel instance (one per aircraft) of the TestUpdateAttribute() method in C_RandomMotion.C (Figure 9) computes the path position of its aircraft.
7. The Scan event points to the method Scan() for each radar. The event framework's scheduler starts this method at simulation time zero. At each discrete simulation time, each parallel instance of the Scan() method computes the proximity of an aircraft to each ground radar. Proximity (range) is calculated in parallel from the radar and moving-entity position vectors via R = \sqrt{\Delta x^2 + \Delta y^2 + \Delta z^2}, where the \Delta terms are the differences between the radar and aircraft positions (latitude, longitude, and altitude converted to earth-centered rotational (ECR) coordinates).
8. The aircraft do not know of the existence of the radars, but the radars can know the aircraft's positions. The aircraft detection simulation code implements each aircraft instance as a federation object and initializes its subscription. Federation objects (FOs) facilitate the grouping of entities and entity components with related attributes; the grouped attributes can then be distributed and published to other entities and entity components that are subscribers. During simulation execution, object attributes such as dynamic position (latitude, longitude, and altitude) and aircraft identification are published. Figure 10 shows the different methods, in the C programming language, that implement the simulation model of Figures 7-9.
We can now execute this discrete-event simulation model on several nodes (local and global; when there is more than one local and global node, we have a parallel distributed discrete-event simulation system). Local nodes share memory, while global nodes are distributed over the Internet or a private network/cluster (Figure 11). A global node is a cluster, and a local node is a computational resource within a specific cluster. For instance, four global nodes and one local node involves four clusters, each with a single computer (four nodes in total). In addition, we can execute this model with the desired time synchronization and management scheme. The following definitions are required to understand the experiments designed with the discrete-event simulation model and WarpIV:
- T (wall-clock time) is a measure of the actual time from start to finish, including time due to scheduled interruptions or waiting for computational assets.
- Relative speedup is the wall-clock time for a single node (sequential execution) divided by T, considering all of the nodes used for that synchronization scheme (the wall-clock time of the node with the maximum value).
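The relative-speedup definition above amounts to the following small computation (an illustrative helper, not part of WarpIV): the sequential wall-clock time is divided by the wall-clock time of the slowest node in the parallel run.

```python
# Relative speedup as defined in the text: sequential wall-clock time divided
# by the parallel wall-clock time, which is set by the slowest (maximum) node.

def relative_speedup(sequential_time: float, node_times: list) -> float:
    """node_times holds the wall-clock time of each node in the parallel run."""
    return sequential_time / max(node_times)
```

For example, a sequential run of 87 s against per-node times of 30, 29, and 28.5 s gives a relative speedup of 2.9, since the 30 s node bounds the parallel run.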
The relative speedup for the different time and synchronization management schemes in these initial experiments is displayed in Table 3 and Figure 12. The best result of 2.9 was achieved by TW (the theoretical speedup for this problem is 3.0). BTW and TW are very comparable, while BTB performs better with multicore arrangements for this simulation case study. It is essential to observe the differences in performance due to the configuration and the time and synchronization scheme for the case study; this graph will differ for other performance measures.

Measuring the Complexity of a Parallel Distributed Discrete-Event Simulation Implementation
Measuring simulation algorithm/software complexity is challenging. Shao and Wang [42] and Misra [43] investigated software complexity from the viewpoint of software as a creative activity. We use the cognitive weights of basic control structures to measure simulation software complexity. Software constructs, such as loops and conditional statements, are assigned weight values as follows: a sequence is weighted with a factor of one; if-then-else with two; a case statement with three; a for-loop with three; repeat-until with three; a function call with two; parallel structures with four; and interrupts for synchronization with four. Figure 12 shows an example of the weight calculations for a program in WarpIV.
The total cognitive weight of a computer program is calculated by applying the following equation, where q is the total number of "main" constructs and m and n index the nested constructs with their specific cognitive weights W_c:

W_c = \sum_{j=1}^{q} \left[ \prod_{k=1}^{m} \sum_{i=1}^{n} W_c(j, k, i) \right]

Cognitive weights are one of the inputs to the DBN. Table 4 provides the cognitive weights for the implementation of the model represented in Figure 7. Additionally, we captured other parameters that define the settings of the parallel distributed DES problem, such as hardware, messaging, network, simulation objects, and classes (21 inputs in total).

Table 4. Calculation of cognitive weights for the case study.
File    Routine    Weight
Sim.c   main       17
...
Total Program Weights    2919
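The cognitive-weight calculation above can be sketched as follows (our own illustrative encoding of the rule, not the tool used to produce Table 4): the weights of sequential constructs add, while nesting multiplies the weight of the outer construct by the total weight of everything nested inside it.

```python
# Sketch of total cognitive weight: basic control structures carry the weights
# listed in the text; nested structures multiply, sequential ones add.

BCS_WEIGHTS = {"sequence": 1, "if": 2, "case": 3, "for": 3, "repeat": 3,
               "call": 2, "parallel": 4, "interrupt": 4}

def cognitive_weight(construct):
    """construct = (kind, [nested constructs])."""
    kind, nested = construct
    w = BCS_WEIGHTS[kind]
    if nested:  # an enclosing construct multiplies the weight of its body
        w *= sum(cognitive_weight(c) for c in nested)
    return w

def total_weight(program):
    """Total weight is the sum over the program's top-level constructs."""
    return sum(cognitive_weight(c) for c in program)
```

For instance, a for-loop (weight 3) containing an if-then-else (2) and a sequence (1) contributes 3 × (2 + 1) = 9; adding a top-level function call (2) gives a program total of 11.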
The output vector has three components that correspond to the best scheme for the specific input vector; in our case, the one with the best speedup performance (as confirmed by executing the simulation model in WarpIV (Figure 13) and obtaining the best wall-clock time). For example, if the best performance is achieved by TW, then the output vector is 1 for TW, 0 for BTW, and 0 for BTB.

Results
The performance criterion is the minimum wall-clock time, which indicates the synchronization scheme with the best speedup achieved. The wall-clock time tells us how long the computer system takes to finish the simulation; it is the time to solution: the number of seconds of wall-clock time needed to satisfy the termination criterion, detecting the aircraft in this case. Many research initiatives have used speedup and its relationship with wall-clock time as a performance measure [44][45][46][47]. Table 5 shows the training vector for the case study of Figure 7, with four global nodes and one local node using block as the distribution policy, with TW providing the best performance (best wall-clock time). It is essential to note that if this case study is implemented using three global nodes and three local nodes with block as the distribution policy, then BTB is the synchronization scheme with the best performance (minimum wall-clock time).
Table 5. Example of a vector that defines the parallel distributed discrete-event simulation (PDDES) implementation for the aircraft detection model of Figure 7 with 4 global nodes and 1 local node using block as the distribution policy, with TW as the best performance (best wall-clock time).
Another point is that a great deal of data is needed, so numerous problems were selected to generate case studies with variations of hardware/simulation objects/nodes to train the DBN. Two hundred and forty case studies and their variations were selected for training, sixty for validation (used to select the number of neurons and hidden layers), and one hundred for testing. The variations were produced by changing the number of global and local nodes. The training session for the DBN was then accomplished; Figure 14 shows the details of the training of the DBN.
The best architecture had 21 inputs, three hidden layers with 50 neurons each, and one output layer with three neurons (one for each time and synchronization management scheme). The testing performance of this DBN is shown in Figure 15. Preliminary datasets were also tried with a multi-layer perceptron (backpropagation) [48]; however, the performance obtained was lower than 60%. The performance of the DBN can be increased with more case studies. The study demonstrated the feasibility of the new technique, which can be used to design parallel distributed discrete-event simulation configurations. This first effort places emphasis on speedup.

Conclusions and Further Research
The research work presented here implemented a decision-making scheme that, based on the simulation environment (software, hardware, and simulation logic), can identify the best synchronization and time management scheme for a specific parallel and distributed DES. This approach is original and pioneering in its use of deep learning. It has the potential to save experimentation time and provide better designs. The prototype developed in this research can be improved and would give better performance with more case studies or with more powerful, recently developed deep learning algorithms. The method presented in this paper is straightforward and automatically selects the correct scheme (TW, BTW, BTB). Of course, it can be extended to more schemes, and it can continue learning with new case studies and parallel distributed simulation repositories.
This study contributes a new approach to an existing, very complex problem. We recognize that PDDES is critical to current trends in simulation and hardware/software development. There were limitations to this research. This study was a preliminary effort; therefore, more case studies can be added to improve performance. In addition, we focused on DBNs, and there is potential to use other types of deep learning, such as modified convolutional neural networks (CNNs) [49] and adversarial networks [50], which have recently been gaining attention as deep neural networks with the best performance. Another limitation was the use of only the most popular optimistic synchronization schemes; there is potential to use newer optimistic synchronization schemes and to study load balancing among nodes [46,47]. Finally, the speedup based on the best wall-clock time was the only performance measure studied; it is essential to study more performance measures.
There are several issues that we will start exploring and to which this approach may contribute: for example, cloud computing [4,[6][7][8][9]11], the World Wide Web of simulation [3,5,7,10], and autonomic computing (AC) [6]. We can use our approach to design simulators/configurations for these platforms. Nevertheless, the input will have to be modified to characterize cloud computing and web-based elements and policies (e.g., an on-demand service model of simulation resources, high-level architecture (HLA) support). This area of research is needed because cloud computing is recognized as the new dominant environment for enterprise IT; across industries, it continues to be one of the fastest-growing areas.