A Novel Computational Instrument Based on a Universal Mixture Density Network with a Gaussian Mixture Model as a Backbone for Predicting COVID-19 Variants’ Distributions

Various published COVID-19 models have been used in epidemiological studies and healthcare planning to model and predict the spread of the disease and to realign health measures and priorities appropriately, given the resource limitations in healthcare. However, a significant issue arises because these models cannot identify the distribution of the constituent variants of COVID-19 infections. Given limited healthcare resources, this gap means that health planning would be ineffective and cost lives. This work presents a universal neural network (NN) computational instrument for predicting the mainstream symptomatic infection rate of COVID-19 and modelling the distribution of its associated variants. The NN is based on a mixture density network (MDN) with a Gaussian mixture model (GMM) object as a backbone. Twelve use cases were used to demonstrate the validity and reliability of the proposed MDN. The use cases included COVID-19 data for Canada and Saudi Arabia, two date ranges (300 and 500 days), two input data modes, and three activation functions, each with different batch sizes and epoch values. This array of scenarios provided an opportunity to investigate the impacts of epistemic uncertainty (EU) and aleatoric uncertainty (AU) on the prediction model's fitting. Model accuracy was in the high nineties for a tolerance margin of 0.0125. The primary outcome of this work indicates that this easy-to-use universal MDN provides reliable predictions of COVID-19 variant distributions and the corresponding synthesized profile of the mainstream infection rate.


Introduction
Research teams from many countries have prioritized the study of the spread of the COVID-19 virus to combat its threat to health. As a result, numerous mathematical models have been developed to study the virus's transmission. From a statistical perspective, these models can be classified into deterministic staging or compartmental models (CMs) [1][2][3][4][5][6][7][8][9][10][11] and stochastic models (SMs) [12,13]. Machine learning (ML) [15][16][17][18][19] and neural networks (NNs) [20][21][22][23][24][25][26][27][28] have also been used to create COVID-19 models, and in some cases ML technology has been used for COVID-19 diagnosis [28,29] and prognosis [30]. However, these studies focused on models that only targeted the growth of the mainstream symptomatic infection rate of COVID-19 and have yet to address its symptomatic variants. Therefore, the gap in these mathematical modelling techniques can … specialized dense sublayers, and one concatenation layer. The output represents a set of predicted rate profiles for each COVID-19 symptomatic infection variant and the associated variant's contribution. The prediction of the overall symptomatic infection rate of COVID-19 is then synthesized from these outputs. This study also discusses the impact of noise (randomness) in the input data on model fitting, as well as how model accuracy was determined and the resulting accuracy values.
Implementing the proposed MDN design posed many challenges. One of the most challenging aspects of this work was diagnosing failures in model fitting and changing the configuration of the network implementation accordingly. Such changes could involve one or more NN configuration elements (batch size, epochs, activation function, etc.). In addition, the NN architecture should be identified at an early stage so that it offers the flexibility to close computing gaps and eventually delivers the results needed to furnish a solution that answers the critical questions behind the right health plans for combatting the spread of COVID-19 infection.
The sections of this article address these challenges and lead to an MDN supported by logical and valid results. Section 2 introduces the concept of the two input modes and describes the steps for deriving the PCom-SEIR solution for the mainstream COVID-19 infection rate; it also explains the reason for using two input runs. Section 3 discusses the evolution of the design of an MDN for modelling COVID-19 variants. Because the audience of this study could range from NN experts to health professionals working for health authorities, we divided this section into three subsections: the first presents an introductory overview of NN technology, the second introduces the theory behind the MDN, and the third presents notes on the architecture, design, and implementation of the MDN. Section 4 presents results covering two sets of twelve use cases, one for Canada and the other for Saudi Arabia. This section includes a table that maps the twenty-four sub-use cases to the corresponding diagrams to ease navigation through the reported results. These sub-use cases cover different data-range scenarios, two input data modes (to explore the influence of epistemic and aleatoric uncertainty on model fitting and accuracy), and the implementations of three activation functions. Section 5 presents the conclusions, the most important of which is that the proposed MDN is a valid and reliable tool for predicting the distribution of COVID-19 variants.

COVID-19 Input Data Modes
To deploy the MDN [36], we devised two input modes for the implementation, as shown in the topology diagram in Figure 1. At an abstract level, the difference between the two modes was the degree of noise (randomness) embedded in the data. The first mode used raw COVID-19 data from the WHO [37], which we expected to have a high noise level due to natural data uncertainty and possible irregularities in the collection process. These two sources of noise reduce the continuity of the data profile (i.e., its first and/or second derivatives may not exist). The second mode used the optimal solution of a modified PCom-SEIR system of ordinary differential equations (SODE) from [1]. The data forming this optimal solution were expected to have less uncertainty because the solution is inherently smooth, which yields a continuous profile. Below, we show the steps for modifying the derivation to obtain only the mainstream infection rate.
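To make the contrast between the two input modes concrete, the sketch below quantifies "noise" as the mean absolute second difference of a series, a discrete proxy for how rough its second derivative is. It is a minimal illustration under stated assumptions: a synthetic bell-shaped curve with added Gaussian noise stands in for the WHO data, and a centred 7-day moving average stands in for the smooth PCom-SEIR solution. Neither substitution, nor the function names, comes from this study.

```python
import numpy as np

def roughness(series):
    """Mean absolute second difference: a discrete proxy for |y''|."""
    series = np.asarray(series, dtype=float)
    return float(np.mean(np.abs(np.diff(series, n=2))))

def moving_average(series, window=7):
    """Centred moving average; 'valid' mode drops the edges."""
    kernel = np.ones(window) / window
    return np.convolve(series, kernel, mode="valid")

# Synthetic stand-in for a daily infection-rate profile (hypothetical data).
rng = np.random.default_rng(seed=0)
t = np.arange(300)
smooth_curve = 1000 * np.exp(-((t - 150) / 40.0) ** 2)
raw = smooth_curve + rng.normal(scale=50.0, size=t.size)  # noisy, like mode 1
smoothed = moving_average(raw, window=7)                   # smoother, like mode 2

print(f"raw roughness:      {roughness(raw):.1f}")
print(f"smoothed roughness: {roughness(smoothed):.1f}")
```

The smoothed series has a much lower roughness score, which mirrors the expectation that the second input mode carries less data noise than the first.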
The modification of the PCom-SEIR system eliminated the differential equations that represent COVID-19 variants, which are shown in Equation (5). This modification resulted in the following SODE:
The relationships in Equations (1)-(9) form the structural foundations that steer the transition dynamics within the life cycle of COVID-19, as shown in Figure 2. To ease the mapping between Equations (1)-(9) and the PCom-SEIR framework shown in Figure 2, we created two tables:
1. One in which the time-dependent variable of an infection compartment's population is Z(t), where Z(t) ∈ {S(t), P(t), E(t), I(t), M(t), H(t), Q(t), D(t), R(t)}. Table 1 lists the definitions of these time-dependent variables. For the rest of this communication, the explicit time argument (t) of the compartment variables is dropped when convenient, to facilitate reading.
2.
Table 1. Description of the time-dependent variables in the equations.
S(t): The population of the susceptible compartment.
P(t): The population of the protected compartment.
E(t): The population of the exposed compartment.
I(t): The infection population of the symptomatic infection compartment.
M(t): The population with asymptomatic infection.
Q(t): The population of the quarantined compartment.
H(t): The population of the hospitalised compartment.
D(t): The population of the dead compartment.
Mathematics 2024, 12, x FOR PEER REVIEW 7 of 25
We employed a derivative-free optimization strategy, treating parameter estimation as an intelligent search problem [35,38]. In this type of optimization, one or more computational agents search a real-valued space, seeded with the embedded set of initial conditions, for the optima [39]. Particle swarm optimization (PSO) [40] and differential evolution (DE) [35] are two of the most prominent such procedures. DE relies on a differential operator that is easy to invoke and apply by adjusting the model parameters [39] defined in Equations (1)-(9). The optimal solutions of differential Equations (1)-(9) are shown in Figures 3 and 4 for Canada and Saudi Arabia, respectively. The use case for Canada reflected the combined influence of federal and various provincial health controls and district statistical measures. In contrast, the use case for Saudi Arabia reflected a single central health system with one set of statistical measures [2].
… [37] and the output of the PCom-SEIR model.
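The sketch below illustrates the derivative-free strategy described above: SciPy's `differential_evolution` searches a bounded, real-valued parameter space for the values that make a compartmental model reproduce an observed infection curve. Because the PCom-SEIR equations are not reproduced in this section, it uses a generic four-compartment SEIR model as a stand-in, fits only a transmission rate `beta` and a recovery rate `gamma` with a fixed, assumed incubation rate, and uses synthetic noise-free data. All of these choices are illustrative assumptions, not the configuration used in this study.

```python
import numpy as np
from scipy.integrate import odeint
from scipy.optimize import differential_evolution

SIGMA = 1 / 5.2  # assumed fixed incubation rate (1/days); not from this study

def seir_rhs(y, t, beta, sigma, gamma):
    """Generic SEIR right-hand side on population fractions (not PCom-SEIR)."""
    s, e, i, r = y
    new_infections = beta * s * i
    return [-new_infections,
            new_infections - sigma * e,
            sigma * e - gamma * i,
            gamma * i]

def simulate_infected(params, t, y0):
    """Return the infected fraction I(t) for params = (beta, gamma)."""
    beta, gamma = params
    solution = odeint(seir_rhs, y0, t, args=(beta, SIGMA, gamma))
    return solution[:, 2]

def fit_seir(observed, t, y0, bounds=((0.05, 1.5), (0.02, 0.5)), seed=0):
    """Fit (beta, gamma) with DE by minimising a normalised squared error."""
    scale = float(np.mean(observed ** 2))  # keeps the loss on an O(1) scale

    def loss(params):
        predicted = simulate_infected(params, t, y0)
        return float(np.mean((predicted - observed) ** 2) / scale)

    result = differential_evolution(loss, bounds, seed=seed, popsize=10,
                                    maxiter=100, tol=1e-7, polish=True)
    return result.x, result.fun

# Usage: synthetic data generated from known parameters, then recovered.
t = np.arange(0, 150, dtype=float)
y0 = [0.999, 0.0, 0.001, 0.0]
true_params = (0.6, 0.2)
observed = simulate_infected(true_params, t, y0)
fitted, final_loss = fit_seir(observed, t, y0)
print("fitted beta, gamma:", fitted)
```

Only the loss function depends on the model, so a different compartmental system or a population-based optimizer such as PSO could be substituted without changing the overall structure; real, noisy data would make the search considerably harder than this noise-free example.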

Mixture Density Network (MDN) with Multiple Outputs
This section explains the evolution of the concept of the MDN, starting from a simple neural network (NN). The review includes four subsections, so readers can skip those that seem elementary or familiar. The first subsection starts with a simple NN with one average target data output and a probability density (pd) based on a parametric set of two elements. The second provides the theory and the logical progression from the NN to a full-fledged MDN. The third covers the architecture, design, and implementation required to run the MDN code. The fourth describes the input data modes, to help explain the impacts of epistemic uncertainty (EU) and aleatoric uncertainty (AU) [41] on the stability of the MDN's outputs.

Neural Networks (NNs): An Introductory Overview
An NN with one output typically consists of an input layer, one or more hidden layers, and an output layer, as depicted in Figure 5. Each layer contains neurons, nodes, or units (in this study, we use the word "neurons"), which are interconnected through weighted connections. The input value $x_j$ of a processing unit is multiplied by the connection weight $w_{kj}$; learning in an artificial neural network (ANN) is simulated by adjusting the strength, or weight, of these connections.
The relationship between the input (x) and output (y) is then formulated as
This relationship reflects the model's hypothesis, which is embedded in the hidden layer. The function f, whose input is weighted and can be shifted by a bias when needed, allows the algorithm to learn the governing relationship.
A simple feedforward NN [41] consists of (1) one input layer that receives a given set of input variables $x \equiv \{x_1, x_2, \ldots, x_J\}$, where J is the number of input features, (2) one hidden layer with K neurons, and (3) one output layer with one neuron that produces an associated mapping y.
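As an illustration of this three-layer structure, the sketch below computes the output of a one-hidden-layer network, assuming the common form $y = \sum_{k=1}^{K} u_k \, f\left(\sum_{j=1}^{J} w_{kj} x_j + b_k\right) + c$. The output weights $u_k$ and the biases $b_k$ and $c$ are our notation rather than the article's, the weights are random placeholders, and `tanh` is used as an example of f.

```python
import numpy as np

def feedforward(x, W, b, u, c, f=np.tanh):
    """One-hidden-layer network with a single output neuron.

    x: (J,) input features; W: (K, J) input-to-hidden weights w_kj;
    b: (K,) hidden biases; u: (K,) hidden-to-output weights; c: output bias.
    """
    hidden = f(W @ x + b)          # K hidden activations
    return float(u @ hidden + c)   # single output y

rng = np.random.default_rng(seed=1)
J, K = 3, 5
W = rng.normal(size=(K, J))
b = np.zeros(K)
u = rng.normal(size=K)
y = feedforward(np.array([0.2, -0.1, 0.5]), W, b, u, c=0.0)
print("y =", y)
```

Training would adjust `W`, `b`, `u`, and `c`; this sketch shows only the forward pass that maps x to a single value y.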
An NN model deploys an associated mapping to learn a transformation from a given set of input variables x to a set of output variables $y \equiv \{y_1, y_2, \ldots, y_n\}$. In practice, such a network is trained on a finite set of samples, denoted by $[\{x\}_q, \{y\}_q]$, where $q = 1, 2, \ldots, Q$ indexes the training samples. In other words, the principal aim of network training is to model the underlying source of the data, so that the best possible predictions for the y vector can be made when the trained network is presented with a new value of x. Statistically, the data source can be expressed in terms of the probability density function (PDF) p(x, y) in the joint input-target space.
A joint PDF with time as an additional independent variable is required if the data source evolves over time. In time-series analysis, the time itself (i.e., the timestamps) can be a feature used directly in the input layer. For example, in a dataset of daily infected populations, the date or time of day can be an explicit input feature. Alternatively, instead of using raw date/time values, NNs can learn temporal relationships implicitly: in sequential data such as published COVID-19 data, the position of a data point within the sequence (its time step) can be encoded implicitly as a feature. Recurrent neural networks (RNNs), long short-term memory (LSTM) networks, gated recurrent units (GRUs), and other sequential models are particularly suitable for time-related data, often combined with time windows (TWs) or temporal embeddings (TEs), because they maintain memory across time steps.
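To show what a time window (TW) means in practice, here is a minimal sketch that turns a daily series into supervised samples, each pairing the previous `window` values with the next value. The function name `make_windows` and the one-step-ahead target are illustrative choices, not part of the MDN described in this study.

```python
import numpy as np

def make_windows(series, window):
    """Split a 1-D series into (X, y) with X[i] = series[i:i+window], y[i] = series[i+window]."""
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - window
    if n_samples <= 0:
        raise ValueError("series must be longer than the window")
    X = np.stack([series[i:i + window] for i in range(n_samples)])
    y = series[window:]
    return X, y

X, y = make_windows([10, 12, 15, 19, 24, 30], window=3)
# X -> [[10, 12, 15], [12, 15, 19], [15, 19, 24]];  y -> [19, 24, 30]
```

Any behaviour that spans more than `window` steps is invisible to a single sample, which is one concrete way a TW can suppress information present in x.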
In summary, time can be treated as an input feature in NNs, especially for sequential or time-dependent data such as collected COVID-19 data [37], and it can be represented in various ways depending on the problem domain and data characteristics. While RNNs, LSTM networks, and GRUs are valid approaches, they have drawbacks: TWs and TEs might suppress important behaviors present in x, leading to inaccurate predictions of y. Nevertheless, these techniques are useful for comparing and assessing the validity of a proposed NN design. For associated-mapping scenarios, such as the COVID-19 infection rate considered in this study, it is convenient to decompose the joint probability density p(x, y) into the product of the conditional density of the target data, p(y|x), and the density of the input data, p(x) [41]:
$$p(x, y) = p(y \mid x)\, p(x), \qquad p(x) = \int p(x, y)\, dy,$$
where the density p(x) plays a crucial role in confirming the predictions of the trained network. However, to predict the value of y corresponding to an input set $\{x_j\}$, we need to model the conditional density p(y|x) rather than only the average target value; this is the fundamental point on which the rest of this section builds.
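The decomposition p(x, y) = p(y|x) p(x) can be checked numerically on a small discrete example, where the integral over y becomes a sum. The joint probability table below is invented purely for illustration.

```python
import numpy as np

# Hypothetical joint probability table p(x, y): rows index x, columns index y.
joint = np.array([[0.10, 0.20, 0.10],
                  [0.05, 0.05, 0.50]])

p_x = joint.sum(axis=1)                 # marginal p(x) = sum over y of p(x, y)
p_y_given_x = joint / p_x[:, None]      # conditional p(y|x) = p(x, y) / p(x)
reconstructed = p_y_given_x * p_x[:, None]

print("p(x) =", p_x)
print("rows of p(y|x) sum to", p_y_given_x.sum(axis=1))
print("p(y|x) p(x) equals p(x, y):", np.allclose(reconstructed, joint))
```

Each row of `p_y_given_x` is a full distribution over y for one input value, which is the object an MDN aims to predict.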

Mixture Density Network (MDN): The Theory
Typically, data scientists combine GMMs with regression and classification techniques to build the targeted MDN, and this architecture provides the necessary solution to the underlying statistical problem. GMMs are also known to perform clustering better than well-known techniques such as K-means in many settings [42]. Within an MDN, a GMM acts as a computational entity that produces a set of COVID-19 variants (subsamples) $\{I_n(t)\}$, where $n \in \{1, 2, \ldots, N\}$ and N is the maximum number of variants in a COVID-19 infection wave. The GMM is computationally equipped to determine the set $\{I_n(t)\}$ from the corresponding infected population I(t), which is a principal input into the input layer (Figure 1). I(t) can reasonably be assumed to be normally distributed even if it does not strictly follow a normal distribution, because a random variable can be transformed from its actual distribution into a normal distribution [43]. In practice, however, one might be able to find two or more variants (subsamples) of I(t) that can indeed be described by normal distributions. Thus, to describe the entire population of I(t), we do not assume a single Gaussian distribution across the whole population; rather, we synthesize the actual underlying distribution as a mixture of Gaussians with different contributions (α), variances (σ), and means (µ).
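The mixture description above corresponds to the density $p(y) = \sum_{n=1}^{N} \alpha_n \, \mathcal{N}(y \mid \mu_n, \sigma_n^2)$ with $\alpha_n \ge 0$ and $\sum_n \alpha_n = 1$. The sketch below evaluates such a density for three hypothetical components and checks numerically that it integrates to one. Here $\sigma_n$ is used as the standard deviation of component n, and the component values are invented for illustration.

```python
import numpy as np

def gaussian_pdf(y, mu, sigma):
    """Normal density N(y | mu, sigma^2), with sigma as the standard deviation."""
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def gmm_pdf(y, alphas, mus, sigmas):
    """Mixture density: sum over n of alpha_n * N(y | mu_n, sigma_n^2)."""
    y = np.asarray(y, dtype=float)[..., None]   # add a component axis
    alphas, mus, sigmas = (np.asarray(a, dtype=float) for a in (alphas, mus, sigmas))
    return np.sum(alphas * gaussian_pdf(y, mus, sigmas), axis=-1)

# Three hypothetical variant components: contributions, means, spreads.
alphas = [0.5, 0.3, 0.2]
mus = [60.0, 140.0, 220.0]
sigmas = [15.0, 20.0, 25.0]

grid = np.linspace(-100.0, 400.0, 5001)
density = gmm_pdf(grid, alphas, mus, sigmas)
area = density.sum() * (grid[1] - grid[0])      # Riemann-sum approximation
print(f"integral of the mixture density ~ {area:.4f}")
```

Because the weights sum to one and each component is a normalised Gaussian, the mixture is itself a valid density, yet it can have several peaks, which a single Gaussian cannot represent.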
A data scientist working for a health authority might be able to forecast I(t). However, suppose that a specific COVID-19 variant dominates one or more other variants in a region. In that case, it is logical to assume that there will be a definite number N of clusters (groups) of COVID-19 variants $\{I_n(t)\}$. With different health controls across regions and the differing natures of the variants, variant type j may dominate the other types k, where $j, k \in \{1, 2, 3, \ldots, N\}$ and $k \neq j$. Suppose that we construct an ML regression model (e.g., an NN) to estimate the population of an individual variant $I_n(t)$. In that case, we face two potential deficiencies:
(1) We can only predict the conditional mean µ of the distribution, so we will not have a definitive and complete view of the extent of $I_n(t)$ on which to base the right health control actions.
(2) The health authority will not be able to properly discriminate between the components of a multimodal distribution if the entire population of I(t) is characterized by a single distribution assumed to be Gaussian.
To overcome these two weaknesses, a simple feedforward NN for regression would look like the one in Figure 6, which has two output sets, the mean (µ) and the variance (σ), representing the two basic elements of the targeted distributions.
The standard NN architecture, shown in Figure 5, involves injecting a collection of input variables $\{x_q\}$ into the input layer of the network, determining weights in a subsequent layer, and eventually modelling an estimate of the output y. During backpropagation, the weights in the NN are adjusted as necessary, and the best estimate of y is eventually returned as a prediction consisting of a single value. This single value is inadequate for effective health planning because different types of variants require different diagnoses, treatments, and levels of health control measures.
To overcome this challenge, the network must output probability distributions over a range of values, each described by a mean (µ) and a variance (σ), thus defining a conditional probability distribution (CPD); hence, the network must produce two outputs, not one, as depicted in Figure 6. In principle, the computational learning process in Figure 5 stays the same as that in Figure 6, with each forward and backward pass slightly modifying the weights ($w_{jk}$, $u_{jk}$) until the error is small enough to meet the target requirements of the output. In summary, the distinguishing difference is that the NN in Figure 6 produces two values, one for µ and one for σ, which are exactly the two parameters needed to determine the PDF of a Gaussian distribution. Consequently, to predict µ and the associated σ of the expected Gaussian distribution, the last layer needs two neurons (Figure 6) instead of one (Figure 5).
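A network with the two outputs µ and σ is commonly trained by minimising the Gaussian negative log-likelihood of the targets rather than the squared error. The article does not spell out its loss at this point, so the sketch below is an assumption: the second output neuron is taken to produce log σ, and exponentiating it keeps σ positive.

```python
import numpy as np

def gaussian_nll(y, mu, log_sigma):
    """Mean negative log-likelihood of targets y under N(mu, sigma^2).

    The network's second output is interpreted as log(sigma), so that
    sigma = exp(log_sigma) is always positive.
    """
    y, mu, log_sigma = (np.asarray(a, dtype=float) for a in (y, mu, log_sigma))
    sigma = np.exp(log_sigma)
    nll = 0.5 * np.log(2 * np.pi) + log_sigma + 0.5 * ((y - mu) / sigma) ** 2
    return float(np.mean(nll))

# A perfect mean with sigma = 1 leaves only the normalising constant.
print(gaussian_nll([1.0, 2.0], mu=[1.0, 2.0], log_sigma=[0.0, 0.0]))  # ~0.9189
```

Unlike the squared error, this loss also rewards the network for reporting an appropriate spread: a σ that is too small is penalised heavily when the mean is wrong.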
It is worth mentioning that a conventional neural network (CNN) is an implementation of an NN in which minimizing the sum-of-squares error yields network functions that approximate the conditional mean (average) of the target output. For classification problems, in which the target variables take one of N discrete classes, these conditional averages represent the posterior probabilities of class membership and can therefore be regarded as providing an optimal output. However, for problems involving the prediction of continuous variables, the conditional average gives only a very limited description of the statistical profile of the target data and is insufficient for many scenarios. The MDN is expected to overcome these limitations and provide a completely general framework for modelling conditional PDFs. The fundamental MDN form combines a CNN with a mixture density model.
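The limitation of the conditional average is easy to see numerically. In the sketch below, the targets observed for a single input value come from two equally weighted, well-separated clusters, a hypothetical stand-in for two co-circulating variants. Scanning constant predictions shows that the sum-of-squares minimiser is the sample mean, which falls between the clusters where almost no data lie.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
# Hypothetical bimodal targets for one fixed input x: clusters at -2 and +2.
targets = np.concatenate([rng.normal(-2.0, 0.3, 500), rng.normal(2.0, 0.3, 500)])

candidates = np.linspace(-3, 3, 601)                     # constant predictions c
sse = ((targets[None, :] - candidates[:, None]) ** 2).sum(axis=1)
best_c = candidates[np.argmin(sse)]

near_best = np.mean(np.abs(targets - best_c) < 0.25)    # share of data near c
near_mode = np.mean(np.abs(targets - 2.0) < 0.25)       # share near a cluster centre
print(f"SSE minimiser {best_c:.2f}, sample mean {targets.mean():.2f}")
print(f"data within 0.25 of minimiser: {near_best:.1%}; of a mode: {near_mode:.1%}")
```

A mixture model for p(y|x) would instead place its probability mass on both clusters, which is the behaviour the MDN is designed to capture.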
To improve on this approach, we should design and build an NN that can determine more than a single distribution. To make this possible, we add N sets $\{\mu_n, \sigma_n\}$, each corresponding to one COVID-19 variant $I_n(t)$. Together, these distributions should be nearly sufficient to describe the known information and published data I(t) on COVID-19. COVID-19 variants are known to differ in their levels of dominance within a population; in other words, the forecasted variant distributions will contribute to the synthesized I(t) with different weights $\alpha_n$. To map this requirement onto the new NN design, the NN output should include N sets $\{\mu_n, \sigma_n, \alpha_n\}$, each corresponding to one predicted COVID-19 variant. The new NN design is expected to have the layout shown in Figure 7. This MDN with a GMM as a backbone can be used to predict the individual components $I_n(t)$ of the COVID-19 infection compartments. In conclusion, MDNs are built from two components, an NN and a mixture model [40], the latter implemented as a GMM object, as illustrated in Figure 8b. Hence, the output layer of an MDN should be equipped with a GMM that synthesizes the hidden layer's outputs into the relevant distribution components, as shown in Figure 7.
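The output layer described above can be sketched as follows. For each input, the last hidden layer's activations are mapped to N sets $\{\alpha_n, \mu_n, \sigma_n\}$: a softmax makes the contributions $\alpha_n$ positive and sum to one, an exponential keeps each $\sigma_n$ positive, and training minimises the negative log-likelihood of the observed target under the resulting Gaussian mixture. This NumPy version, including the names `mdn_head` and `mixture_nll` and the random weights, is a hypothetical illustration of the standard MDN construction, not the TensorFlow implementation of Figure 8b.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)    # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mdn_head(hidden, W_alpha, W_mu, W_log_sigma):
    """Map hidden activations (B, H) to mixture parameters, each of shape (B, N)."""
    alpha = softmax(hidden @ W_alpha)         # contributions: positive, sum to 1
    mu = hidden @ W_mu                        # component means
    sigma = np.exp(hidden @ W_log_sigma)      # component spreads: always > 0
    return alpha, mu, sigma

def mixture_nll(y, alpha, mu, sigma):
    """Mean negative log-likelihood of targets y (B,) under the Gaussian mixture."""
    y = np.asarray(y, dtype=float)[:, None]
    log_comp = (np.log(alpha) - np.log(sigma) - 0.5 * np.log(2 * np.pi)
                - 0.5 * ((y - mu) / sigma) ** 2)
    m = log_comp.max(axis=1, keepdims=True)   # log-sum-exp over components
    log_mix = m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))
    return float(-np.mean(log_mix))

rng = np.random.default_rng(seed=0)
B, H, N = 4, 8, 3                             # batch size, hidden units, components
hidden = np.tanh(rng.normal(size=(B, H)))
W_alpha, W_mu, W_log_sigma = (rng.normal(scale=0.5, size=(H, N)) for _ in range(3))
alpha, mu, sigma = mdn_head(hidden, W_alpha, W_mu, W_log_sigma)
loss = mixture_nll(rng.normal(size=B), alpha, mu, sigma)
mixture_mean = (alpha * mu).sum(axis=1)       # one synthesized point value per input
```

The per-component parameters correspond to individual variant profiles, while a summary such as `mixture_mean` combines them into a single synthesized value, analogous in spirit to the synthesized I(t).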

Architecture, Design, and Implementation Notes
The realization of Figure 7 was achieved by adopting the layered system architecture shown in Figure 8a and implementing the class diagram shown in Figure 8b.
As Figure 8b shows, the implementation depended on the version of TensorFlow. We started with TensorFlow version 1.8.0 and then used version 2.13.0 on different machine images. This allowed us to detect possible version-specific bugs and to compare the results obtained.

Results and Discussion
One of the main objectives of this study was to demonstrate the ability of the MDN to provide the distribution profiles of COVID-19 variants corresponding to a set of continuous prediction models.
A comparison with the results of similar studies is normally an essential part of discussing new results. However, this was not possible because no directly comparable work exists. Nevertheless, the variant distribution profiles obtained in this study were compared with the distributions of infection variants reported in [1] with PCom-SEIR, which were based on the optimal solutions of a SODE.

Use Case Implementation Dictionary
To ease navigation through the use cases, we created Tables 3 and 4, which map these use cases to the corresponding results. The results are organized into two main groups, one for the COVID-19 data from each country: Canada (Figures 9 and 11) and Saudi Arabia (Figures 10 and 12). Each country group includes two input-mode sub-groups, each input-mode sub-group covers two data ranges, and each data-range sub-group includes three activation functions. Each activation function implementation used two epoch values, {10, 20}. Consequently, there were eight scenarios for each COVID-19 variant distribution $I_n(t)$ per country. This array of outputs should be sufficient to establish the reliability and validity of the proposed MDN computational instrument. The results are divided into two categories: (1) each set of three $I_n(t)$ values with the corresponding synthesized and predicted COVID-19 infection rate I(t), and (2) the MDN loss performance curves.
Table 3A
Table 3B
Table 4 shows the following execution parameters:
(1) Input modes: The diagram in Figure 1 shows the deployment of the two implementations, which depend on the type of COVID-19 input data for Canada and Saudi Arabia. The first input feed was raw COVID-19 data from the WHO [37]; hence, we named this source "WHO COVID-19 data". The second input feed was the optimal solution of the SODE in Equations (1)-(9), which was derived in Section 2; the results are depicted in Figure 3 (for Canada) and Figure 4 (for Saudi Arabia). In this study, we named this input feed "PCom-SEIR" because it relied on a modification of the PCom-SEIR model.
(2) Data ranges: The two input feeds covered two spans, 300 and 500 days.
(3) MDN implementation runs: There were three implementation runs, each using one activation function in the hidden layers, drawn from the set {relu, tanh, sigmoid}.
Table 4 presents the following results: (1) Diagrams of the COVID-19 variants' distributions {I n (t)} and their predicted I(t) values as a synthesized Gaussian distribution; (2) Diagrams of the MDN's loss performance, which includes (a) loss vs. epochs; (b) val_loss vs. epochs.
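Assuming the factors listed above combine as a full factorial grid (the text does not say so explicitly), the run matrix can be enumerated as in the following minimal Python sketch. The constant and function names are hypothetical; only the factor values are taken from the text.

```python
from collections import Counter
from itertools import product

# Factor values as listed in the use case description (names are hypothetical).
COUNTRIES = ("Canada", "Saudi Arabia")
INPUT_MODES = ("WHO COVID-19 data", "PCom-SEIR")
DATA_RANGES_DAYS = (300, 500)
ACTIVATIONS = ("relu", "tanh", "sigmoid")
EPOCHS = (10, 20)


def build_use_cases():
    """Return one dict per (country, input mode, range, activation, epochs) combination."""
    return [
        {"country": c, "input_mode": m, "days": d, "activation": a, "epochs": e}
        for c, m, d, a, e in product(COUNTRIES, INPUT_MODES, DATA_RANGES_DAYS, ACTIVATIONS, EPOCHS)
    ]


use_cases = build_use_cases()
# Count runs per (country, activation function) group.
runs_per_group = Counter((u["country"], u["activation"]) for u in use_cases)

print(len(use_cases))                      # 48
print(runs_per_group[("Canada", "relu")])  # 8
```

Grouping the runs by country and activation function gives 2 × 2 × 2 = 8 runs per group (two input modes, two data ranges, two epoch values), which is one way to cross-check scenario counts against Tables 3 and 4.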

Implementation Configurations and Environments
Furthermore, to show the validity and credibility of the new MDN-based computational instrument, we examined several use cases covering different input modes for COVID-19 data, date ranges, and network implementations.
Changing the implementation functionality meant that we examined the results by modifying part of the implementation of the MDN design objects shown in Figure 8b, not the MDN architecture displayed in Figure 8a. While the layer sequence was kept the same as in Figure 7, the implementation changes were made by changing the activation function in the hidden layers, which produce the average predicted distribution profiles.
We also varied the number of hidden layers, the number of neurons, the batch size, and the number of epochs. These settings are tools for preventing overfitting or underfitting and for achieving higher modelling accuracy. Table 3 shows the main attributes of the MDN configuration and of the practical, universal configuration used across all use cases.

The MDN's Predictions of COVID-19 Variants
As stated above, the input data covered two modes: (1) raw COVID-19 data, which we refer to as WHO data [37], and (2) the PCom-SEIR data, which represent the overall infection rate prediction computed from the optimal solution of Equations (1)-(9). The two input datasets from Canada and Saudi Arabia are illustrated in Figures 3 and 4, respectively. These two input feeds were part of the deployment of the implementation modes depicted in Figure 1.
The proposed MDN had a layered structure, as shown in Figure 7. The network's implementation used objects derived from the class diagram in Figure 8b. By partitioning the uploaded COVID-19 data, we obtained a training set (2/3) and a test set (1/3), as shown in Table 3. A sequential model object digested the training set, while the MDN's hidden layers processed the output from the input layer (architecture in Figure 7 and design in Figure 8b). The hidden layers consisted of three dense sub-layers, each handling one element of the COVID-19 variant distribution parameter set {µ_n, σ_n, α_n}. The outputs from these dense sub-layers were passed to a dense concatenation layer, which coordinated with the GMM object (Figures 7 and 8b).
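The data flow described above (three dense sub-layers for {µ_n, σ_n, α_n}, a concatenation layer, and a GMM that combines the components) can be sketched in plain Python as follows. The softplus transform that keeps σ_n positive and the softmax that makes the α_n sum to one are common MDN conventions assumed here, not details stated in the text; all function names and weights are hypothetical.

```python
import math


def linear(h, W, b):
    """Dense layer without activation: W @ h + b."""
    return [sum(w * x for w, x in zip(row, h)) + bi for row, bi in zip(W, b)]


def softplus(z):
    """log(1 + e^z), rearranged so large |z| does not overflow."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def softmax(z):
    """Normalized exponentials; subtracting the max keeps exp() in range."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def mdn_head(h, params):
    """Three parallel dense sub-layers for mu, sigma and alpha, then concatenation."""
    mu = linear(h, *params["mu"])
    sigma = [softplus(v) + 1e-6 for v in linear(h, *params["sigma"])]  # keep sigma > 0
    alpha = softmax(linear(h, *params["alpha"]))                        # weights sum to 1
    return mu + sigma + alpha  # list concatenation stands in for the concatenation layer


def gaussian_pdf(t, mu, sigma):
    return math.exp(-0.5 * ((t - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def synthesize(t, concat, n):
    """Split the concatenated vector; return weighted variant curves I_n(t) and their sum I(t)."""
    mu, sigma, alpha = concat[:n], concat[n:2 * n], concat[2 * n:3 * n]
    components = [a * gaussian_pdf(t, m, s) for m, s, a in zip(mu, sigma, alpha)]
    return components, sum(components)


# Hypothetical (W, b) pairs for a one-feature hidden state and three variants.
params = {
    "mu": ([[50.0], [150.0], [250.0]], [0.0, 0.0, 0.0]),
    "sigma": ([[0.0], [0.0], [0.0]], [20.0, 30.0, 40.0]),
    "alpha": ([[0.0], [0.0], [0.0]], [0.0, 0.0, 0.0]),
}
concat = mdn_head([1.0], params)
components, total = synthesize(150.0, concat, n=3)
```

Evaluating `synthesize` over a range of days t yields the weighted variant curves and the synthesized rate I(t) = Σ α_n N(t; µ_n, σ_n). In this toy setting, the variant centred at day 150 dominates at t = 150.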

The component distribution curves, shown in Figures 9A, 10A, 11A, and 12A, indicate that the MDN produced the COVID-19 variant profiles I_n(t) as Gaussian distributions, each with its own set of distribution parameters {µ, σ, α}. As these figures illustrate, different values of µ and σ imply that each variant in the set {I_n(t)} has a different duration (in days) and different start and end days, and that the variants' peaks mostly occur on different days (the unit of time of observation). Such information is vital to health authorities when planning health control measures and preparing for the impact of the spread of severe infections. The variant profiles predicted by the MDN differ from those reported in Figures 2 and 3 of [1]. In that study, the COVID-19 variant profiles are not fully Gaussian in shape; all variants practically share the same start and end dates but have different peaks.

The Impacts of Epistemic Uncertainty and Aleatoric Uncertainty on Component Predictions
The following discussion explains the motives behind using two different input modes, i.e., the two types of input data shown in Figures 3 and 4. From the uncertainty angle (randomness of input data), a typical supervised machine learning problem can be formulated as follows [44]: Y = L(I; θ), where L is the learned function that maps the input I to the output Y using the parameter θ. Here, the epistemic uncertainty (EU), which describes what the model does not know, is derived from θ, whereas the inherent aleatoric uncertainty (AU) is part of the data-generating process of I. A high EU was found in parts of the input feature space of the COVID-19 data published in [37] that were only sparsely populated with data samples. In such an m-dimensional space, many parameter settings might explain the given data points, which gives rise to uncertainty. Based on this line of thinking, we deployed two executions (see Figure 1) corresponding to the two data mode use cases. The first run took the output of the PCom-SEIR engine as an input (Figure 9 for Canada and Figure 10 for Saudi Arabia).
In contrast, the second run took the raw COVID-19 data from the WHO [37] (Figure 11 for Canada and Figure 12 for Saudi Arabia). The PCom-SEIR run was expected to have a lower level of EU. Regarding the AU, we expected a higher value in a run with a larger data scope (500+ days). In summary, we used the training loss (loss) and validation loss (val_loss) performance as footprints of the severity of the EU and AU, respectively. Note that the loss and val_loss curves reflect both the continuity of the data and the way the network's constituent layers are deployed, which includes the type of layer, the type of activation function, the batch size, and the number of epochs.
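The text uses loss and val_loss as footprints of EU and AU but does not state the loss function. MDNs are commonly trained by minimizing the negative log-likelihood (NLL) of the targets under the predicted Gaussian mixture; the sketch below assumes that loss. In such a loss, the predicted σ_n can absorb irreducible noise in the targets, which is one reason the loss value reflects aleatoric effects. The function names are hypothetical.

```python
import math


def log_gaussian(y, mu, sigma):
    """Log of the normal density N(y; mu, sigma)."""
    return -0.5 * ((y - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mdn_nll(ys, mus, sigmas, alphas):
    """Mean NLL of targets ys; each sample has its own mixture (mu_n, sigma_n, alpha_n > 0)."""
    total = 0.0
    for y, mu, sg, al in zip(ys, mus, sigmas, alphas):
        total -= logsumexp([math.log(a) + log_gaussian(y, m, s) for m, s, a in zip(mu, sg, al)])
    return total / len(ys)


ys = [0.9, 1.1, 1.0, 0.8, 1.2]  # noisy targets scattered around 1.0
mus, alphas = [[1.0]] * len(ys), [[1.0]] * len(ys)
loss_narrow = mdn_nll(ys, mus, [[0.01]] * len(ys), alphas)
loss_wide = mdn_nll(ys, mus, [[0.15]] * len(ys), alphas)
```

With noisy targets, an overly narrow σ is penalized heavily, so `loss_narrow` exceeds `loss_wide`; a smoother input feed would let the mixture use smaller σ values without that penalty.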

To examine the impacts of EU and AU on model fitting, we produced loss performance curves for the set of activation functions {relu, tanh, sigmoid}. These curves are shown in Figures 9B and 11B for Canada and in Figures 10B and 12B for Saudi Arabia. In most deep learning projects, the loss and val_loss curves are read together as a validation check because they span both the training and the test data domains. The most useful overall yardstick for describing the loss performance is the model fitness set {underfitting, overfitting, optimal fit} [45]. Suppose that the network's input data feed has a high noise level (random fluctuations are a source of uncertainty), as in the WHO data. In that case, we would expect underfitting and overfitting to occur most often, and such a scenario requires a suitable choice of activation function, batch size, and number of epochs. Conversely, if the network's data feed has less noise (smoother, more polarized behavior), as in the PCom-SEIR data feed, we should expect optimal fitting more often than underfitting or overfitting. The mapping in Table 5 summarizes the MDN's loss performance and indicates the fitting levels for each data feed and activation function as follows:
(1) The fitting level of the PCom-SEIR data feed for both data ranges and both countries was 100% optimal, regardless of the activation function.
(2) The fitting level of the WHO data for both data ranges and both countries was between 33% and 66% optimal. The 33% optimal fitting occurred when we used the sigmoid activation function.
(3) The performance in Figures 9B(c) and 11B(c) for Canada and in Figures 10B(c) and 12B(c) for Saudi Arabia shows that the sigmoid activation function produced 100% optimal fitting for both input data feeds, whereas relu and tanh had a 100% optimal fit for the PCom-SEIR input data feed and nearly zero for the WHO input data feed. This means that the algorithm, together with the statistical profile of the input data, played a part in governing the model fitting for the same set of batch sizes and epochs.
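Table 5 labels each run as underfitting, overfitting, or optimal fit from its loss and val_loss curves, but the text does not give a numeric rule. The following heuristic is purely an illustrative assumption: the function `classify_fit` and the thresholds `gap_tol` and `plateau_tol` are hypothetical, and it presumes positive loss values (an NLL loss, which can be negative, would need a different normalization).

```python
def classify_fit(loss, val_loss, gap_tol=0.1, plateau_tol=0.05):
    """Label a run from its per-epoch training and validation loss curves.

    Assumed heuristic (not from the source):
      - overfitting: final val_loss rises more than gap_tol (relative) above its
        minimum while the training loss still ends lower than it started;
      - underfitting: training loss improved by less than plateau_tol (relative);
      - otherwise: optimal fit.
    """
    if len(loss) < 2 or len(loss) != len(val_loss):
        raise ValueError("need two equal-length curves with at least two epochs")
    best_val = min(val_loss)
    rel_val_rise = (val_loss[-1] - best_val) / best_val
    rel_train_drop = (loss[0] - loss[-1]) / loss[0]
    if rel_val_rise > gap_tol and loss[-1] < loss[0]:
        return "overfitting"
    if rel_train_drop < plateau_tol:
        return "underfitting"
    return "optimal fit"


print(classify_fit([1.0, 0.5, 0.3, 0.2], [1.1, 0.6, 0.35, 0.25]))   # optimal fit
print(classify_fit([1.0, 0.5, 0.3, 0.2], [1.0, 0.6, 0.7, 0.9]))     # overfitting
print(classify_fit([1.0, 0.99, 0.98, 0.98], [1.0, 1.0, 0.99, 0.99]))  # underfitting
```

In practice, the thresholds would need to be calibrated against curves that a human reviewer has already labeled, such as those summarized in Table 5.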

The MDN Model's Accuracy
Statistical models with high complexity are referred to as flexible models. By contrast, simpler models are, to a certain extent, less complex and inflexible [45]; they are less accurate but more interpretable. Interpretability refers to the degree to which a model allows for human understanding of natural phenomena [45]. Model flexibility, in turn, indicates a model's capacity to adapt, evolve, and learn from input data. Consequently, flexible models should be used when the research aims to predict average values. However, when the objective of an investigation is inference, inflexible models are more relevant because they make it easier to interpret the relationship between the response variables and the predictor variables in the average profile [45]. The model in this work is a type of inflexible model.
The architecture of the proposed MDN-based computational instrument is depicted in Figure 7, and Section 2 describes the two input data (y_true) modes. The design shown in Figure 8b indicates that there were two reference outputs. The first was a transitional output (y_tran) from the hidden layers, which fed the GMM object through the output layers. The second was the prediction of the synthesized COVID-19 infection rate (y_pred) at the concatenation layer, based on the output injected from the GMM object. Since we were not performing classification but profile prediction with an inflexible model, we needed to define an approach for computing the model's accuracy, based on a formula that equates y_pred to y_true at any observation point.
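The abstract reports accuracy against a tolerance margin of 0.0125. Assuming this means that an observation point counts as correct when |y_pred − y_true| does not exceed the margin, and that both series are on the same normalized scale, the accuracy is the fraction of such points. The function name `tolerance_accuracy` is hypothetical, and the exact formula used in this work may differ.

```python
def tolerance_accuracy(y_true, y_pred, tol=0.0125):
    """Fraction of observation points where |y_pred - y_true| <= tol (assumed criterion)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(p - t) <= tol)
    return hits / len(y_true)


# Hypothetical normalized rates at four observation points.
y_true = [0.10, 0.20, 0.30, 0.40]
y_pred = [0.105, 0.19, 0.33, 0.40]
print(tolerance_accuracy(y_true, y_pred))  # 0.75
```

Three of the four points fall within the margin (the third is off by 0.03), so the sketch reports 0.75; reporting accuracy "in the high nineties" would correspond to nearly every point of the synthesized profile lying within the margin.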

The MDN Model's Accuracy
Statistical models with complexity are referred to as flexible models.On the other hand, there are simple models that are, to a certain extent, less complex and inflexible [45] and less accurate but more interpretable.Interpretability refers to the degree to which a model allows for human understanding of natural phenomena [45].Additionally, model flexibility indicates a model's capacity to adapt, evolve, and learn from input data.Consequently, flexible models should be used when research aims to predict average values.However, when the objective of an investigation is inference, inflexible models are more relevant because they more easily interpret the relationship between the response variables and the predictor variables in the average profile [45].The model in this work is a type of inflexible model.
The architecture of the proposed MDN-based computational instrument is depicted in Figure 7, and Section 2 describes the two input data (y_true) modes.The design of the architecture shown in Figure 8b suggests that there were two reference outputs.The first was a transitional output (y_tran) from the hidden layers, which fed the GMM object through the output layers.The second output was the prediction of the synthesized COVID-19 infection rate (y_pred) at the concatenation layer based on the injected output from the GMM object.Since we did not work with classification but with profile prediction using an inflexible model, we needed to define an approach for computing the model's accuracy.A formula that equates y_pred to y_true at any observation point is defined as follows: Similarly, if the network data feed has less noise (behavior of polarization), as in the PCom-SEIR data feed, we should expect optimal fitting more than underfitting and overfitting.The resulting mapping in Table 5 shows a summary of the MDN's loss performance, which indicates the fitting levels for the data feed and various activation functions as follows: (1) The fitting level of the PCom-SEIR data feed for both data ranges and both countries was 100% optimal, regardless of the activation function.(2) The fitting level of the WHO data for both data ranges and both countries was between 33% and 66%.The 33% optimal fitting occurred when we used the sigmoid activation function.
(3) By observing the performance in Figures 9B(c) and 11B(c) for Canada and in Figures 10B(c) and 12B(c) for Saudi Arabia, one can notice that the sigmoid activation function produced 100% optimal fitting for both input data feeds.Relu and Tanh had a 100% optimal fit for the PCom-SEIR input data feed and nearly zero for the WHO input data feed.This meant that the algorithm, alongside the statistical profile of the input data, played a part in governing the model fitting for the same set of batch sizes and epochs.

The MDN Model's Accuracy
Statistical models with complexity are referred to as flexible models.On the other hand, there are simple models that are, to a certain extent, less complex and inflexible [45] and less accurate but more interpretable.Interpretability refers to the degree to which a model allows for human understanding of natural phenomena [45].Additionally, model flexibility indicates a model's capacity to adapt, evolve, and learn from input data.Consequently, flexible models should be used when research aims to predict average values.However, when the objective of an investigation is inference, inflexible models are more relevant because they more easily interpret the relationship between the response variables and the predictor variables in the average profile [45].The model in this work is a type of inflexible model.
The architecture of the proposed MDN-based computational instrument is depicted in Figure 7, and Section 2 describes the two input data (y_true) modes.The design of the architecture shown in Figure 8b suggests that there were two reference outputs.The first was a transitional output (y_tran) from the hidden layers, which fed the GMM object through the output layers.The second output was the prediction of the synthesized COVID-19 infection rate (y_pred) at the concatenation layer based on the injected output from the GMM object.Since we did not work with classification but with profile prediction using an inflexible model, we needed to define an approach for computing the model's accuracy.A formula that equates y_pred to y_true at any observation point is defined as follows: Similarly, if the network data feed has less noise (behavior of polarization), as in the PCom-SEIR data feed, we should expect optimal fitting more than underfitting and overfitting.The resulting mapping in Table 5 shows a summary of the MDN's loss performance, which indicates the fitting levels for the data feed and various activation functions as follows: (1) The fitting level of the PCom-SEIR data feed for both data ranges and both countries was 100% optimal, regardless of the activation function.
(2) The fitting level of the WHO data for both data ranges and both countries was between 33% and 66%.The 33% optimal fitting occurred when we used the sigmoid activation function.
(3) By observing the performance in Figures 9B(c) and 11B(c) for Canada and in Figures 10B(c) and 12B(c) for Saudi Arabia, one can notice that the sigmoid activation function produced 100% optimal fitting for both input data feeds.Relu and Tanh had a 100% optimal fit for the PCom-SEIR input data feed and nearly zero for the WHO input data feed.This meant that the algorithm, alongside the statistical profile of the input data, played a part in governing the model fitting for the same set of batch sizes and epochs.

The MDN Model's Accuracy
Statistical models with complexity are referred to as flexible models.On the other hand, there are simple models that are, to a certain extent, less complex and inflexible [45] and less accurate but more interpretable.Interpretability refers to the degree to which a model allows for human understanding of natural phenomena [45].Additionally, model flexibility indicates a model's capacity to adapt, evolve, and learn from input data.Consequently, flexible models should be used when research aims to predict average values.However, when the objective of an investigation is inference, inflexible models are more relevant because they more easily interpret the relationship between the response variables and the predictor variables in the average profile [45].The model in this work is a type of inflexible model.
The architecture of the proposed MDN-based computational instrument is depicted in Figure 7, and Section 2 describes the two input data (y_true) modes.The design of the architecture shown in Figure 8b suggests that there were two reference outputs.The first was a transitional output (y_tran) from the hidden layers, which fed the GMM object through the output layers.The second output was the prediction of the synthesized COVID-19 infection rate (y_pred) at the concatenation layer based on the injected output from the GMM object.Since we did not work with classification but with profile prediction using an inflexible model, we needed to define an approach for computing the model's accuracy.A formula that equates y_pred to y_true at any observation point is defined as follows: Mathematics 2024, 12, x FOR PEER REVIEW Similarly, if the network data feed has less noise (behavior of polarizatio PCom-SEIR data feed, we should expect optimal fitting more than underfitt fitting.The resulting mapping in Table 5 shows a summary of the MDN' mance, which indicates the fitting levels for the data feed and various activat as follows: (1) The fitting level of the PCom-SEIR data feed for both data ranges and b was 100% optimal, regardless of the activation function.
(2) The fitting level of the WHO data for both data ranges and both coun tween 33% and 66%.The 33% optimal fitting occurred when we used activation function.
(3) By observing the performance in Figures 9B(c) and 11B(c) for Canada a 10B(c) and 12B(c) for Saudi Arabia, one can notice that the sigmoid ac tion produced 100% optimal fitting for both input data feeds.Relu an 100% optimal fit for the PCom-SEIR input data feed and nearly zero input data feed.This meant that the algorithm, alongside the statistical input data, played a part in governing the model fitting for the same set and epochs.

The MDN Model's Accuracy
Statistical models with complexity are referred to as flexible models.hand, there are simple models that are, to a certain extent, less complex and and less accurate but more interpretable.Interpretability refers to the degr model allows for human understanding of natural phenomena [45].Additio flexibility indicates a model's capacity to adapt, evolve, and learn from inp sequently, flexible models should be used when research aims to predict av However, when the objective of an investigation is inference, inflexible mod relevant because they more easily interpret the relationship between the re bles and the predictor variables in the average profile [45].The model in t type of inflexible model.
The architecture of the proposed MDN-based computational instrume in Figure 7, and Section 2 describes the two input data (y_true) modes.The architecture shown in Figure 8b suggests that there were two reference outp was a transitional output (y_tran) from the hidden layers, which fed the through the output layers.The second output was the prediction of the COVID-19 infection rate (y_pred) at the concatenation layer based on the in from the GMM object.Since we did not work with classification but with p tion using an inflexible model, we needed to define an approach for co model's accuracy.A formula that equates y_pred to y_true at any observ defined as follows: Mathematics 2024, 12, x FOR PEER REVIEW Similarly, if the network data feed has less noise (behavior of p PCom-SEIR data feed, we should expect optimal fitting more than fitting.The resulting mapping in Table 5 shows a summary of th mance, which indicates the fitting levels for the data feed and vario as follows: (1) The fitting level of the PCom-SEIR data feed for both data rang was 100% optimal, regardless of the activation function.
(2) The fitting level of the WHO data for both data ranges and b tween 33% and 66%.The 33% optimal fitting occurred when activation function.
(3) By observing the performance in Figures 9B(c) and 11B(c) for C 10B(c) and 12B(c) for Saudi Arabia, one can notice that the sig tion produced 100% optimal fitting for both input data feeds 100% optimal fit for the PCom-SEIR input data feed and nea input data feed.This meant that the algorithm, alongside the s input data, played a part in governing the model fitting for the and epochs.

The MDN Model's Accuracy
Statistical models with complexity are referred to as flexible hand, there are simple models that are, to a certain extent, less comp and less accurate but more interpretable.Interpretability refers to model allows for human understanding of natural phenomena [45 flexibility indicates a model's capacity to adapt, evolve, and learn sequently, flexible models should be used when research aims to p However, when the objective of an investigation is inference, infle relevant because they more easily interpret the relationship betwe bles and the predictor variables in the average profile [45].The m type of inflexible model.
The architecture of the proposed MDN-based computational in Figure 7, and Section 2 describes the two input data (y_true) mo architecture shown in Figure 8b suggests that there were two refer was a transitional output (y_tran) from the hidden layers, which through the output layers.The second output was the predictio COVID-19 infection rate (y_pred) at the concatenation layer based from the GMM object.Since we did not work with classification b tion using an inflexible model, we needed to define an approa model's accuracy.A formula that equates y_pred to y_true at an defined as follows: Similarly, if the network data feed has less noise (behavior of polarization), as in the PCom-SEIR data feed, we should expect optimal fitting more than underfitting and overfitting.The resulting mapping in Table 5 shows a summary of the MDN's loss performance, which indicates the fitting levels for the data feed and various activation functions as follows: (1) The fitting level of the PCom-SEIR data feed for both data ranges and both countries was 100% optimal, regardless of the activation function.
(2) The fitting level of the WHO data for both data ranges and both countries was between 33% and 66%.The 33% optimal fitting occurred when we used the sigmoid activation function.
(3) By observing the performance in Figures 9B(c) and 11B(c) for Canada and in Figures 10B(c) and 12B(c) for Saudi Arabia, one can notice that the sigmoid activation function produced 100% optimal fitting for both input data feeds.Relu and Tanh had a 100% optimal fit for the PCom-SEIR input data feed and nearly zero for the WHO input data feed.This meant that the algorithm, alongside the statistical profile of the input data, played a part in governing the model fitting for the same set of batch sizes and epochs.

The MDN Model's Accuracy
Statistical models with complexity are referred to as flexible models.On the other hand, there are simple models that are, to a certain extent, less complex and inflexible [45] and less accurate but more interpretable.Interpretability refers to the degree to which a model allows for human understanding of natural phenomena [45].Additionally, model flexibility indicates a model's capacity to adapt, evolve, and learn from input data.Consequently, flexible models should be used when research aims to predict average values.However, when the objective of an investigation is inference, inflexible models are more relevant because they more easily interpret the relationship between the response variables and the predictor variables in the average profile [45].The model in this work is a type of inflexible model.
The architecture of the proposed MDN-based computational instrument is depicted in Figure 7, and Section 2 describes the two input data (y_true) modes.The design of the architecture shown in Figure 8b suggests that there were two reference outputs.The first was a transitional output (y_tran) from the hidden layers, which fed the GMM object through the output layers.The second output was the prediction of the synthesized COVID-19 infection rate (y_pred) at the concatenation layer based on the injected output from the GMM object.Since we did not work with classification but with profile prediction using an inflexible model, we needed to define an approach for computing the model's accuracy.A formula that equates y_pred to y_true at any observation point is defined as follows: Mathematics 2024, 12, x FOR PEER REVIEW Similarly, if the network data feed has less noi PCom-SEIR data feed, we should expect optimal fitting.The resulting mapping in Table 5 shows mance, which indicates the fitting levels for the d as follows: (1) The fitting level of the PCom-SEIR data feed was 100% optimal, regardless of the activatio (2) The fitting level of the WHO data for both tween 33% and 66%.The 33% optimal fittin activation function.
(3) By observing the performance in Figures 9B  10B(c) and 12B(c) for Saudi Arabia, one can tion produced 100% optimal fitting for both 100% optimal fit for the PCom-SEIR input d input data feed.This meant that the algorith input data, played a part in governing the mo and epochs.Optimal fit Underfitt

The MDN Model's Accuracy
Statistical models with complexity are refer hand, there are simple models that are, to a certai and less accurate but more interpretable.Interpr model allows for human understanding of natur flexibility indicates a model's capacity to adapt, sequently, flexible models should be used when However, when the objective of an investigation relevant because they more easily interpret the r bles and the predictor variables in the average p type of inflexible model.
The architecture of the proposed MDN-base in Figure 7, and Section 2 describes the two inpu architecture shown in Figure 8b suggests that the was a transitional output (y_tran) from the hid through the output layers.The second output COVID-19 infection rate (y_pred) at the concaten from the GMM object.Since we did not work wi tion using an inflexible model, we needed to model's accuracy.A formula that equates y_pre defined as follows: Similarly, if the network data feed has less noise (behavior of polarization), as in the PCom-SEIR data feed, we should expect optimal fitting more than underfitting and overfitting.The resulting mapping in Table 5 shows a summary of the MDN's loss performance, which indicates the fitting levels for the data feed and various activation functions as follows: (1) The fitting level of the PCom-SEIR data feed for both data ranges and both countries was 100% optimal, regardless of the activation function.
(2) The fitting level of the WHO data for both data ranges and both countries was between 33% and 66%.The 33% optimal fitting occurred when we used the sigmoid activation function.
(3) By observing the performance in Figures 9B(c) and 11B(c) for Canada and in Figures 10B(c) and 12B(c) for Saudi Arabia, one can notice that the sigmoid activation function produced 100% optimal fitting for both input data feeds.Relu and Tanh had a 100% optimal fit for the PCom-SEIR input data feed and nearly zero for the WHO input data feed.This meant that the algorithm, alongside the statistical profile of the input data, played a part in governing the model fitting for the same set of batch sizes and epochs.

The MDN Model's Accuracy
Statistical models with complexity are referred to as flexible models.On the other hand, there are simple models that are, to a certain extent, less complex and inflexible [45] and less accurate but more interpretable.Interpretability refers to the degree to which a model allows for human understanding of natural phenomena [45].Additionally, model flexibility indicates a model's capacity to adapt, evolve, and learn from input data.Consequently, flexible models should be used when research aims to predict average values.However, when the objective of an investigation is inference, inflexible models are more relevant because they more easily interpret the relationship between the response variables and the predictor variables in the average profile [45].The model in this work is a type of inflexible model.
The architecture of the proposed MDN-based computational instrument is depicted in Figure 7, and Section 2 describes the two input data (y_true) modes.The design of the architecture shown in Figure 8b suggests that there were two reference outputs.The first was a transitional output (y_tran) from the hidden layers, which fed the GMM object through the output layers.The second output was the prediction of the synthesized COVID-19 infection rate (y_pred) at the concatenation layer based on the injected output from the GMM object.Since we did not work with classification but with profile prediction using an inflexible model, we needed to define an approach for computing the model's accuracy.A formula that equates y_pred to y_true at any observation point is defined as follows: Similarly, if the network data feed has less noise (behavior of polarization), as in the PCom-SEIR data feed, we should expect optimal fitting more than underfitting and overfitting.The resulting mapping in Table 5 shows a summary of the MDN's loss performance, which indicates the fitting levels for the data feed and various activation functions as follows: (1) The fitting level of the PCom-SEIR data feed for both data ranges and both countries was 100% optimal, regardless of the activation function.
(2) The fitting level of the WHO data feed for both date ranges and both countries was between 33% and 66% optimal. The 33% optimal fitting occurred when the sigmoid activation function was used.
(3) The performance shown in Figures 9B(c) and 11B(c) for Canada and in Figures 10B(c) and 12B(c) for Saudi Arabia indicates that the sigmoid activation function produced 100% optimal fitting for both input data feeds. ReLU and tanh achieved 100% optimal fitting for the PCom-SEIR input data feed but nearly zero for the WHO input data feed. This suggests that the algorithm, together with the statistical profile of the input data, played a part in governing the model fitting for the same set of batch sizes and epochs.
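The architecture paragraph describes a GMM object whose output is injected into a concatenation layer to synthesize the overall infection rate (y_pred) from per-variant profiles. The sketch below is a minimal, hypothetical illustration of that synthesis step only. It assumes each variant's rate profile is a Gaussian curve over time (mean = peak day, standard deviation = spread) scaled by a mixture weight, and that the synthesized profile is the sum of the weighted components. In the actual MDN the weights, means, and standard deviations would be produced by the network's output layers; here they are supplied by hand, and the names `gaussian_pdf` and `synthesize_rate` are illustrative, not the authors' code.

```python
import math
from typing import List, Tuple


def gaussian_pdf(t: float, mu: float, sigma: float) -> float:
    """Density of the normal distribution N(mu, sigma^2) evaluated at t."""
    z = (t - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def synthesize_rate(
    days: List[float],
    weights: List[float],
    means: List[float],
    sigmas: List[float],
) -> Tuple[List[List[float]], List[float]]:
    """Return per-variant profiles and their sum at each day.

    Hypothetical illustration: variant k contributes
    pi_k * N(t; means[k], sigmas[k]), where pi_k are the weights
    normalized to sum to 1; the synthesized rate is the sum over k,
    i.e. a Gaussian mixture density evaluated on the day grid.
    """
    if not (len(weights) == len(means) == len(sigmas)):
        raise ValueError("weights, means, and sigmas must have equal length")
    total_w = sum(weights)
    # Normalize so the mixture weights sum to 1, as GMM weights do.
    pis = [w / total_w for w in weights]
    # One profile per variant, each scaled by its mixture weight.
    variants = [
        [pi * gaussian_pdf(t, mu, s) for t in days]
        for pi, mu, s in zip(pis, means, sigmas)
    ]
    # Synthesized profile: element-wise sum of the variant contributions.
    synthesized = [sum(col) for col in zip(*variants)]
    return variants, synthesized


if __name__ == "__main__":
    days = [float(d) for d in range(300)]
    variants, y_pred = synthesize_rate(days, [0.6, 0.4], [100.0, 220.0], [20.0, 30.0])
    peak_day = max(range(len(days)), key=lambda i: y_pred[i])
    print(f"synthesized peak at day {peak_day}")
```

Keeping the per-variant profiles alongside their sum mirrors the two kinds of output the text describes: each variant's contribution and the synthesized overall rate.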

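One way to make the point-wise comparison of y_pred and y_true concrete is a tolerance check. The abstract reports accuracy based on a tolerance margin of 0.0125, so the following minimal sketch assumes that y_pred "equates" y_true at an observation point when their absolute difference on normalized rates is within that margin, and that accuracy is the percentage of such points. Both the matching rule and the function name `tolerance_accuracy` are assumptions, not necessarily the authors' formula.

```python
from typing import Sequence


def tolerance_accuracy(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    tol: float = 0.0125,
) -> float:
    """Percentage of observation points where y_pred matches y_true within tol.

    Assumption: a point counts as a match when |y_pred[i] - y_true[i]| <= tol
    on normalized rates; the paper's exact formula may differ.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("need at least one observation point")
    # Count observation points whose absolute error is within the margin.
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(p - t) <= tol)
    return 100.0 * hits / len(y_true)


if __name__ == "__main__":
    y_true = [0.10, 0.20, 0.30, 0.40]
    y_pred = [0.105, 0.19, 0.33, 0.40]
    print(tolerance_accuracy(y_true, y_pred))  # 3 of 4 points within 0.0125
```

Because the check is absolute rather than relative, it is only meaningful when y_true and y_pred share a common scale, which is why normalized rates are assumed.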
Mathematics 2024, 12, x FOR PEER REVIEW
Loss legend (Table 5): optimal fit, underfitting, overfitting.
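Table 5 labels each run as an optimal fit, underfitting, or overfitting based on its loss performance. Assuming a standard heuristic rather than the authors' exact criterion, such labels can be derived from training and validation loss histories: a run whose training loss barely decreases is labeled underfitting, a run whose validation loss climbs back above its minimum is labeled overfitting, and anything else is labeled an optimal fit. The thresholds `min_drop` and `overfit_margin` and the function name `classify_fit` are hypothetical.

```python
from typing import Sequence


def classify_fit(
    train_loss: Sequence[float],
    val_loss: Sequence[float],
    min_drop: float = 0.1,
    overfit_margin: float = 0.05,
) -> str:
    """Label a training run as 'optimal fit', 'underfitting', or 'overfitting'.

    Hypothetical heuristic (thresholds are illustrative assumptions):
    - underfitting: training loss fell by less than min_drop overall;
    - overfitting: final validation loss is more than overfit_margin above
      its best (minimum) value, i.e. it turned back upward;
    - otherwise: optimal fit.
    Only loss differences are used, so negative losses (e.g. an MDN's
    negative log-likelihood) are handled the same way as positive ones.
    """
    if len(train_loss) < 2 or len(val_loss) < 2:
        raise ValueError("need at least two epochs of loss history")
    # Training loss never dropped meaningfully: the model did not learn enough.
    if train_loss[0] - train_loss[-1] < min_drop:
        return "underfitting"
    # Validation loss rose away from its minimum: the model began memorizing noise.
    if val_loss[-1] - min(val_loss) > overfit_margin:
        return "overfitting"
    return "optimal fit"


if __name__ == "__main__":
    print(classify_fit([1.0, 0.5, 0.2, 0.1], [1.0, 0.6, 0.5, 0.7]))
```

Applied to every run's loss histories, labels of this kind could be tallied per data feed and activation function to build a summary similar in form to Table 5.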
(3) By observing the performance in Figures 9B(c) and 11 10B(c) and 12B(c) for Saudi Arabia, one can notice th tion produced 100% optimal fitting for both input d 100% optimal fit for the PCom-SEIR input data feed input data feed.This meant that the algorithm, along input data, played a part in governing the model fittin and epochs.

Loss Legend
Optimal fit Underfitting

The MDN Model's Accuracy
Statistical models with complexity are referred to as hand, there are simple models that are, to a certain extent, and less accurate but more interpretable.Interpretability model allows for human understanding of natural pheno flexibility indicates a model's capacity to adapt, evolve, a sequently, flexible models should be used when research However, when the objective of an investigation is inferen relevant because they more easily interpret the relationsh bles and the predictor variables in the average profile [4 type of inflexible model.
The architecture of the proposed MDN-based compu in Figure 7, and Section 2 describes the two input data (y_ architecture shown in Figure 8b suggests that there were was a transitional output (y_tran) from the hidden laye through the output layers.The second output was the COVID-19 infection rate (y_pred) at the concatenation lay from the GMM object.Since we did not work with classifi tion using an inflexible model, we needed to define an model's accuracy.A formula that equates y_pred to y_tr Mathematics 2024, 12, x FOR PEER REVIEW Similarly, if the network data feed has less noi PCom-SEIR data feed, we should expect optimal fitting.The resulting mapping in Table 5 shows mance, which indicates the fitting levels for the d as follows: (1) The fitting level of the PCom-SEIR data feed was 100% optimal, regardless of the activatio (2) The fitting level of the WHO data for both tween 33% and 66%.The 33% optimal fittin activation function.
(3) By observing the performance in Figures 9B  10B(c) and 12B(c) for Saudi Arabia, one can tion produced 100% optimal fitting for both 100% optimal fit for the PCom-SEIR input d input data feed.This meant that the algorith input data, played a part in governing the mo and epochs.

Loss Legend
Optimal fit Underfitt

The MDN Model's Accuracy
Statistical models with complexity are refer hand, there are simple models that are, to a certai and less accurate but more interpretable.Interpr model allows for human understanding of natur flexibility indicates a model's capacity to adapt, sequently, flexible models should be used when However, when the objective of an investigation relevant because they more easily interpret the r bles and the predictor variables in the average p type of inflexible model.
The architecture of the proposed MDN-base in Figure 7, and Section 2 describes the two inpu architecture shown in Figure 8b suggests that the was a transitional output (y_tran) from the hid through the output layers.The second output COVID-19 infection rate (y_pred) at the concaten from the GMM object.Since we did not work wi tion using an inflexible model, we needed to model's accuracy.A formula that equates y_pre Similarly, if the network data feed has less noise (behavior of polarization), as in the PCom-SEIR data feed, we should expect optimal fitting more than underfitting and overfitting.The resulting mapping in Table 5 shows a summary of the MDN's loss performance, which indicates the fitting levels for the data feed and various activation functions as follows: (1) The fitting level of the PCom-SEIR data feed for both data ranges and both countries was 100% optimal, regardless of the activation function.
(2) The fitting level of the WHO data for both data ranges and both countries was between 33% and 66%.The 33% optimal fitting occurred when we used the sigmoid activation function.
(3) By observing the performance in Figures 9B(c) and 11B(c) for Canada and in Figures 10B(c) and 12B(c) for Saudi Arabia, one can notice that the sigmoid activation function produced 100% optimal fitting for both input data feeds. ReLU and tanh had a 100% optimal fit for the PCom-SEIR input data feed and a nearly zero optimal fit for the WHO input data feed. This indicates that the algorithm, together with the statistical profile of the input data, played a part in governing the model fitting for the same set of batch sizes and epochs.
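The fitting levels summarized above were judged from the loss and val_loss curves. The decision rule is not restated in this section, so the Python sketch below is only a hypothetical heuristic for labeling a run as an optimal fit, underfitting, or overfitting. The function names and all thresholds (min_drop, rise_tol, gap_tol, in loss units) are illustrative assumptions, not the authors' criteria.

```python
def classify_fit(loss, val_loss, min_drop=0.1, rise_tol=0.05, gap_tol=0.1):
    """Label one training run from its per-epoch loss curves (hypothetical heuristic).

    Differences rather than ratios are used, so the rule also works for
    negative log-likelihood losses, which can drop below zero.
    """
    if not loss or len(loss) != len(val_loss):
        raise ValueError("loss and val_loss must be non-empty and equally long")
    # Underfitting: the training loss barely improved over the run.
    if loss[0] - loss[-1] < min_drop:
        return "underfitting"
    # Overfitting: validation loss climbed back above its best value,
    # or ended far above the training loss.
    if val_loss[-1] - min(val_loss) > rise_tol or val_loss[-1] - loss[-1] > gap_tol:
        return "overfitting"
    return "optimal fit"


def optimal_share(runs):
    """Percentage of (loss, val_loss) runs labeled as an optimal fit."""
    labels = [classify_fit(loss, val_loss) for loss, val_loss in runs]
    return 100.0 * labels.count("optimal fit") / len(labels)


# Invented toy curves, one per behavior (not data from this study).
runs = [
    ([1.0, 0.5, 0.2, 0.1], [1.0, 0.55, 0.25, 0.12]),  # optimal fit
    ([1.0, 0.98, 0.97], [1.0, 0.99, 0.98]),           # underfitting
    ([1.0, 0.5, 0.2, 0.05], [1.0, 0.6, 0.4, 0.7]),    # overfitting
]
print([classify_fit(l, v) for l, v in runs])
print(optimal_share(runs))
```

Under such a rule, a percentage in a summary table is simply the share of configurations whose curves pass the test.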

Loss legend: optimal fit, underfitting, overfitting.

The MDN Model's Accuracy
Complex statistical models are referred to as flexible models. On the other hand, simpler models are, to a certain extent, less complex, inflexible [45], and less accurate, but more interpretable. Interpretability refers to the degree to which a model allows for human understanding of natural phenomena [45]. Model flexibility, in turn, indicates a model's capacity to adapt, evolve, and learn from input data. Consequently, flexible models should be used when research aims to predict average values. However, when the objective of an investigation is inference, inflexible models are more relevant because they make the relationship between the response variables and the predictor variables in the average profile easier to interpret [45]. The model in this work is a type of inflexible model.
The architecture of the proposed MDN-based computational instrument is depicted in Figure 7, and Section 2 describes the two input data (y_true) modes. The design of the architecture shown in Figure 8b implies two reference outputs. The first was a transitional output (y_tran) from the hidden layers, which fed the GMM object through the output layers. The second was the prediction of the synthesized COVID-19 infection rate (y_pred) at the concatenation layer, based on the output injected from the GMM object. Since we were not performing classification but profile prediction with an inflexible model, we needed to define an approach for computing the model's accuracy. A formula that equates y_pred to y_true at any observation point is defined as follows:

$$
y_{\mathrm{pred}} \pm m \, y_{\mathrm{true}} = y_{\mathrm{true}} \tag{12}
$$

where m is the tolerance margin; the − sign is used when y_pred > y_true, and the + sign is used when y_pred < y_true. In our accuracy computation, we used m = 0.0125. The results are shown in Table 5. Unexpectedly, the accuracy results did not show any activation function to be superior to the others.
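Eq. (12) can be read as a per-point tolerance test: the prediction counts as matching the observation when a correction of at most m·y_true closes the gap, i.e. |y_pred − y_true| ≤ m·|y_true|. The Python sketch below implements that reading and reports the share of matching points as a percentage. Both the absolute-value form and the percentage aggregation are our assumptions, and the function name is hypothetical.

```python
def tolerance_accuracy(y_true, y_pred, m=0.0125):
    """Percentage of observation points that satisfy Eq. (12) within margin m.

    Assumed reading: the - sign applies when y_pred > y_true and the + sign when
    y_pred < y_true, so a point matches when |y_pred - y_true| <= m * |y_true|.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    hits = sum(abs(p - t) <= m * abs(t) for t, p in zip(y_true, y_pred))
    return 100.0 * hits / len(y_true)


# Toy daily rates (invented): the second point misses the 1.25% margin.
print(tolerance_accuracy([100, 200, 400, 800], [101, 205, 400, 792]))  # 75.0
```

With m = 0.0125, a prediction must stay within 1.25% of the observed value to count as a match.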
One might ask why we did not use y_tran to compute the accuracy. The reason is that y_tran is a payload of programming data passed between the MDN model and the GMM; in reality, it is not a final prediction output. Nevertheless, we tried it: we created accuracy_object = tf.keras.metrics.Accuracy(), used accuracy_object to call update_state, and then obtained the accuracy reading from the numpy() value of its result. The accuracy reading was very low (between 5% and 55%), indicating that y_tran was not the final prediction output.
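For context, tf.keras.metrics.Accuracy counts how often predictions are exactly equal to the labels, which is a very strict criterion for continuous-valued outputs. The plain-Python sketch below does not call TensorFlow; it contrasts that exact-match rule with a relative-tolerance rule in the spirit of Eq. (12), assuming the |y_pred − y_true| ≤ m·|y_true| reading. The function names and toy values are hypothetical.

```python
def exact_match_accuracy(y_true, y_pred):
    """Fraction of points where prediction == label (the rule Keras Accuracy counts)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def within_tolerance_accuracy(y_true, y_pred, m=0.0125):
    """Fraction of points with |y_pred - y_true| <= m * |y_true| (assumed reading of Eq. (12))."""
    return sum(abs(p - t) <= m * abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented continuous values: all close, but only one point matches exactly.
y_true = [120.0, 250.0, 310.0, 480.0]
y_pred = [120.4, 251.1, 309.2, 480.0]
print(exact_match_accuracy(y_true, y_pred))       # 0.25
print(within_tolerance_accuracy(y_true, y_pred))  # 1.0
```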

General Remarks
The error performance was extracted from the model's history object (see Figure 8b), whose dictionary holds the loss and val_loss keys. From the loss performance diagrams in Figures 9B(a,b) and 11B(a,b) for Canada and in Figures 10B(a,b) and 12B(a,b) for Saudi Arabia, we conclude that the sigmoid function showed the best performance, followed by the ReLU activation function.
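In Keras, model.fit returns a History object whose history attribute is a dictionary mapping metric names such as loss and val_loss to per-epoch lists. The Python sketch below ranks runs stored in dictionaries of that shape by their lowest val_loss. Ranking by the minimum val_loss is our own simplification of reading the diagrams, the helper names are hypothetical, and the numbers are invented rather than results from this study.

```python
# Dictionaries shaped like Keras History.history; values are invented for illustration.
histories = {
    "run_a": {"loss": [0.90, 0.40, 0.20], "val_loss": [0.95, 0.45, 0.22]},
    "run_b": {"loss": [0.90, 0.60, 0.45], "val_loss": [0.98, 0.70, 0.60]},
    "run_c": {"loss": [0.90, 0.50, 0.30], "val_loss": [0.97, 0.55, 0.35]},
}


def rank_by_val_loss(histories):
    """Return run names ordered from lowest to highest best (minimum) val_loss."""
    return sorted(histories, key=lambda name: min(histories[name]["val_loss"]))


def final_gap(history):
    """Difference between the last val_loss and the last training loss."""
    return history["val_loss"][-1] - history["loss"][-1]


print(rank_by_val_loss(histories))  # ['run_a', 'run_c', 'run_b']
```

With a trained model, each entry would instead hold the history dictionary returned for one configuration (for example, one activation function).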
In Figure 7, we did not include a flattening layer for our sequential COVID-19 data scenarios. In principle, this layer was unnecessary because flattening the data could cause a loss of sequential information when computing sequential data. Nevertheless, we experimented with it and observed that the loss performance (loss and val_loss) worsened tenfold.
A question might arise regarding whether this new MDN can predict multiple growth peaks in the COVID-19 infection rate. To examine this, we injected a hypothetical stream of COVID-19 data, as shown in Figure 13A. Surprisingly, the MDN with the sigmoid activation function produced the multipeaked I(t) shown in Figure 13B by generating one of its components as a late-peaking variant component.
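The multipeak behavior can be illustrated by summing α-weighted Gaussian components. The Python sketch below assumes that I(t) is synthesized as the sum of α_n·N(t; µ_n, σ_n) over the variant components, consistent with the {µ, σ, α} parameterization of the components. The parameter values and function names are hypothetical and are not the fitted values behind Figure 13B. A single late-peaking component is enough to give the summed curve an additional peak.

```python
import math


def gaussian(t, mu, sigma):
    """Normal probability density evaluated at day t."""
    return math.exp(-0.5 * ((t - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def mixture_profile(days, components):
    """I(t) = sum_n alpha_n * N(t; mu_n, sigma_n) over (mu, sigma, alpha) triples."""
    return [sum(a * gaussian(t, mu, s) for mu, s, a in components) for t in days]


def count_peaks(values):
    """Number of strict interior local maxima in a sampled curve."""
    return sum(values[i - 1] < values[i] > values[i + 1] for i in range(1, len(values) - 1))


days = list(range(300))
# Hypothetical parameters: two overlapping early components and one late-peaking one.
components = [(80, 20, 0.4), (110, 25, 0.3), (230, 20, 0.3)]
profile = mixture_profile(days, components)
print(count_peaks(profile))
```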

Figure 1.Implementation modes of the computational instrument.

Figure 5. A typical feedforward NN with three feature input layers, one hidden layer, and a single output layer.

Figure 8. Basic computational topology of the layers and class diagram of the MDN.

… (Figure 9A) to compute the individual COVID-19 variant profiles. In this manner, the MDN's objective of predicting three Gaussian components, each corresponding to a COVID-19 variant, was fulfilled. The results are shown in Figures 9 and 11 for Canada and in Figures 10 and 12 for Saudi Arabia. The component distribution curves shown in Figures 9A, 10A, 11A and 12A indicate that the MDN produced the COVID-19 variant profiles I_n(t) as Gaussian distributions with different sets of distribution parameters {µ, σ, α}. As illustrated in these figures, different values of µ and σ imply that each variant candidate in the set {I_n(t)} has a different time extent (length in days) and different start and end days, with the variants' peaks mostly occurring on different days (the unit of time of observation).
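The per-variant quantities described above (peak day, start and end days, extent) can be summarized from each component's {µ, σ, α}. In the Python sketch below, the VariantComponent class is hypothetical, defining a variant's start and end days as µ ± 3σ is our assumption rather than a rule from the paper, and the parameters are invented.

```python
import math
from dataclasses import dataclass


@dataclass
class VariantComponent:
    """Hypothetical container for one Gaussian variant component {mu, sigma, alpha}.

    mu and sigma are in days; alpha is the component's mixing weight.
    """
    mu: float
    sigma: float
    alpha: float

    def peak_height(self) -> float:
        # Value of alpha * N(t; mu, sigma) at its peak, which lies at t = mu.
        return self.alpha / (self.sigma * math.sqrt(2 * math.pi))

    def active_window(self, k: float = 3.0) -> tuple:
        # Assumed convention: the variant starts and ends k standard deviations from its mean.
        return (self.mu - k * self.sigma, self.mu + k * self.sigma)


# Invented parameters for three variant components (weights sum to 1).
components = [
    VariantComponent(mu=90, sigma=20, alpha=0.5),
    VariantComponent(mu=160, sigma=30, alpha=0.3),
    VariantComponent(mu=240, sigma=15, alpha=0.2),
]
for n, c in enumerate(components, start=1):
    start, end = c.active_window()
    print(f"I_{n}(t): peak day {c.mu}, days {start:.0f}-{end:.0f}, length {end - start:.0f}")
```

Because a Gaussian peaks at its mean, different µ values place the variants' peaks on different days, while σ sets each variant's time extent.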

Table 4. The dictionary of use cases vs. implementation results.

Table 5. Summary of the MDN's loss performance.
