A Systematic Approach to Predict the Behavior of Cough Droplets Using Feedforward Neural Networks Method

Coronavirus disease 2019 (COVID-19) has been identified as being transmitted among humans through droplets from breathing, coughing, and sneezing. Understanding droplet behavior can provide critical information for avoiding disease transmission, especially when designing devices that interact with human respiration. Although various studies have provided numerous computational fluid dynamics simulations, most cases are too specific and are difficult to combine directly with other similar studies. Therefore, this paper proposes a systematic approach to predict droplet behavior in coughing cases using machine learning. The approach consists of three models, namely a droplet generator, a mask model, and a free droplet model, built using feedforward neural networks (FFNN). The evaluation shows that the accuracy of the three FFNN models is relatively high, with R-values above 0.990. The model successfully predicts the effect of evaporation on diameter reduction as well as the completely evaporated state, which can be considered an unlearned case for the machine learning models. The predicted horizontal distance pattern also agrees with data in the literature. In summary, the proposed approach demonstrates the capability to predict the diameter pattern according to experimental or previously published data for various face mask types.


Introduction
The COVID-19 pandemic has generated enormous interest in preventing virus transmission. A better understanding of the transport of fluids and particles produced by respiratory activity is vital to building countermeasures against the disease. This transport can be studied through the dynamics of respiratory droplets, investigated with either computational fluid dynamics or experimental methods. Droplet characteristics can be described in terms of diameter, temperature, diameter distribution, number of droplets, evaporation time, time to fall to the ground, density, contaminants, and others [1][2][3].
Various studies have investigated the transport mechanisms related to droplet dynamics, such as the factors governing droplet generation and the behavior of droplets after they leave the mouth. The human source is one of the main factors affecting the characteristics of the droplet group, including the size distribution, mass flow, and number of droplets [1]. Human activity, such as breathing [4][5][6][7], coughing [1][8][9][10][11], sneezing [12], and whether the mouth is open [1], clearly affects these characteristics. Mask efficiency, which depends on the material and filtration capability, can also affect droplet transport [2,3]. According to the experiment by Verma et al. [3], a mask can change the direction of the flow jet, the outflow velocity, and the released droplets. While these factors determine the initial characteristics of the droplet group, the environment affects droplet dynamics after the droplets leave the mouth, including the horizontal penetration distance, evaporation rate, and falling time [10,13]. The environmental variables usually considered in previous investigations are humidity, wind velocity, temperature, and ventilation [13][14][15][16].
Many simulation and experimental studies have tried to uncover the mechanisms behind droplet behavior under various conditions. The correlation between the affecting factors and the properties of the droplet group is usually discovered or calculated using numerical methods [17][18][19][20][21], and each study usually simulates a specific case. Numerical simulation is widely known as a powerful method for uncovering the mechanisms of complex systems, but its computational burden depends on the simulation complexity. A large amount of experimental data and simulation results, obtained through arduous and time-consuming work, is available in the literature. However, if new variables need to be considered, additional time-consuming simulations must be carried out. Empirical models offer a potential way to gain more benefit from this work by predicting droplet behavior directly from the data reported in previous research. In other words, an approach capable of accommodating all the available measured data and simulation results is needed, especially for practical applications or fast prediction of droplet behavior. For example, such a model could predict droplet behavior directly from the database in the literature without rerunning a complex numerical simulation.
However, efforts to develop empirical models that can flexibly accommodate various variables and correlations are rare. One study [22] carried out experiments on the flow rate, flow direction, and air velocity, determined from the flow rate and mouth opening area, and developed an empirical model. The model inputs are the height, weight, and gender of a person. The model focuses on the relation between human factors, including age, gender, and height, and the produced droplets. Another simplified model of respiratory droplets [23] uses the Maxey-Riley equation and has successfully analyzed droplet behavior, especially horizontal penetration. However, this model is limited to a specific case and therefore covers only a limited set of variables.
Machine learning is one possible solution that can accommodate various empirical data and numerical simulation results to produce more meaningful insight. Machine learning methods, including the extreme learning machine (ELM) and artificial neural networks (ANN), have gained enormous attention because of their capability to recognize patterns in complex systems and predict their behavior with high accuracy. A machine learning model consists of a training algorithm and a topology. The algorithm is the method used to train the topology, such as backpropagation (BP), ELM, and deep learning (DL), while the topology is the model structure trained by the selected algorithm. The feedforward neural network (FFNN) is one of the most popular topologies and can model almost any complex system [24,25]. Therefore, this paper proposes a novel framework to predict droplet behavior using a machine learning method based on empirical data and numerical results. The proposed model is an FFNN built with the ELM algorithm. The current work is limited to cough cases and zero-wind conditions, with the assumption of free-falling droplets. First, the methodology is presented, including the general concept of the proposed method, the model developed for each part, and the respective parameter setup. Then, the capability of the method to predict droplet behavior is evaluated and discussed.

Figure 1 shows the three main parts of the proposed approach: the droplet generator, the mask models, and droplet behavior in the environment. The droplet generator consists of algorithms that generate a set of data representing the droplets. The mask models predict mask efficiency and simulate the trapping of droplets. The environment part predicts the horizontal distance and evaporation rate of the droplets.
ELM is known for its fast training and for accuracy comparable to other popular methods, such as support vector regression (SVR) and backpropagation (BP) [26,27]. ELM builds a model without iteration: it assigns random values to the input weights and then calculates the output weights using the pseudoinverse [28]. The data for modeling are divided into only two groups, training and testing. The training data are used to build the model, while the testing data are a separate group used to verify that the model works well; the testing data must not be used during training. All data in this manuscript are split into 80% training and 20% testing data, and all data are normalized using linear min-max normalization.
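The ELM procedure above, together with the 80/20 split and min-max normalization, can be sketched as follows. This is a minimal NumPy illustration, not the authors' MATLAB code: the uniform [-1, 1] initialization of the input weights and hidden biases, the sigmoid activation, the fixed random seed, and the helper names (`ELM`, `minmax_fit`, `minmax_apply`, `split_80_20`) are assumptions made for the example.

```python
import numpy as np


def minmax_fit(x):
    """Per-column minimum and maximum of the training data."""
    return x.min(axis=0), x.max(axis=0)


def minmax_apply(x, lo, hi):
    # Linear min-max normalization onto [0, 1] using training-set bounds
    return (x - lo) / (hi - lo)


def split_80_20(x, y, train_frac=0.8, seed=0):
    """Random split; the testing rows are never seen during training."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    n_train = int(round(train_frac * len(x)))
    tr, te = idx[:n_train], idx[n_train:]
    return x[tr], y[tr], x[te], y[te]


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class ELM:
    """Single-hidden-layer FFNN trained as an extreme learning machine."""

    def __init__(self, n_hidden, activation=sigmoid, seed=0):
        self.n_hidden = n_hidden
        self.activation = activation
        self.rng = np.random.default_rng(seed)

    def _hidden(self, x):
        return self.activation(x @ self.w + self.b)

    def fit(self, x, y):
        # Step 1: random input weights and biases, never updated afterwards
        self.w = self.rng.uniform(-1.0, 1.0, size=(x.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, size=self.n_hidden)
        # Step 2: output weights in one shot via the Moore-Penrose pseudoinverse
        self.beta = np.linalg.pinv(self._hidden(x)) @ y
        return self

    def predict(self, x):
        return self._hidden(x) @ self.beta
```

Because no iteration is involved, training cost is dominated by a single pseudoinverse of the hidden-layer output matrix, which is consistent with the sub-second training times reported later.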

Droplet Generator
The mean cough flow rate for males aged 20-30 years is 0.48 L/s, with a standard deviation of 0.09 L/s. The current work focuses on a model formulated from droplets measured during coughing with an open mouth, using the data from [1] shown in Figure 2. Droplets are then generated within the designated size range according to these data. Further variations will be developed in the future to cover broader human factors and more activities. Data on human factors such as gender, height, weight [13], and age [11] can be found in the literature. The activity also affects the diameter distribution, for example, coughing [1,11], breathing [1,4], and sneezing [1,12]. The generated droplets can be represented by the M × 1 vector in Equation (1):

$\mathbf{D} = [d_1, d_2, \ldots, d_M]^{T}$ (1)

where M is the number of droplets and $d_1, \ldots, d_M$ are the diameters of the first to the M-th droplet. According to the literature [1], the number of droplets is M = 3000.
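Equation (1) and the sampling step can be sketched as below. This is a minimal illustration under stated assumptions: the bin edges and counts are hypothetical placeholders (not the measured distribution of [1] in Figure 2), drawing each diameter uniformly within its bin is one possible reading of generating droplets "within the designated range", and `generate_droplets` is a hypothetical name.

```python
import numpy as np


def generate_droplets(bin_edges_um, bin_counts, m=3000, seed=None):
    """Draw m droplet diameters (um) from a binned size distribution.

    Each droplet picks a bin with probability proportional to its count,
    then a diameter uniformly inside that bin. Returns an (m, 1) array,
    i.e. the column vector [d_1, ..., d_M]^T of Equation (1).
    """
    edges = np.asarray(bin_edges_um, dtype=float)
    counts = np.asarray(bin_counts, dtype=float)
    if len(edges) != len(counts) + 1:
        raise ValueError("need exactly one more bin edge than bin counts")
    rng = np.random.default_rng(seed)
    p = counts / counts.sum()
    bins = rng.choice(len(counts), size=m, p=p)          # bin index per droplet
    d = rng.uniform(edges[bins], edges[bins + 1])        # uniform within the bin
    return d.reshape(m, 1)


# Hypothetical placeholder histogram (NOT the measured data of [1])
EDGES_UM = [1, 10, 50, 100, 500, 2000]
COUNTS = [5, 30, 25, 25, 15]
D = generate_droplets(EDGES_UM, COUNTS, m=3000, seed=1)
```

Replacing `EDGES_UM` and `COUNTS` with digitized values from Figure 2 (or from distributions for other activities) is the only change needed to switch the source distribution.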

Mask Model
The mask model is built as an FFNN based on the data in [29] to predict mask efficiency. It has two parts: the efficiency model and the determination of trapped droplets. The efficiency model is an FFNN, a structure inspired by the biological nervous system that is known to replicate system behavior well, especially for highly nonlinear systems. A critical step in FFNN development is the choice of inputs and outputs. For the efficiency model, the inputs are the droplet diameter d and the mask type k, and the output is the efficiency η, as represented by the function in Equation (2):

$\eta = f(d, k)$ (2)

The function f is an FFNN with an input layer, one hidden layer, and an output layer; such a three-layer network is also called a multilayer perceptron. The model is trained using an extreme learning machine (ELM), which provides acceptable generalization [24,25]. Other methods, such as Levenberg-Marquardt [30], can also be used to train the FFNN.
The training data are a set of mask efficiencies as a function of diameter for various mask types from [29], as shown in Figure 3. The included mask types are cotton, gauze, N95, procedure, and surgical masks. The mask types are numbered from 0, representing the no-mask condition, to 5, representing the N95 mask. The data are divided into two groups: training data, consisting of 80% of the data, and testing data, consisting of the remaining 20%. The training data are used to develop the model, and the testing data are used to evaluate the model's performance on data outside the training process.
As described above, the model is trained with the basic ELM [31]. The activation function is the hard limit function, which has shown good agreement in various cases [32,33]. The number of hidden nodes is varied from 10 to 10,000, a range that has given acceptable computational time and accuracy in other cases [31,33].
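A minimal sketch of a hard-limit ELM and the hidden-node sweep follows. It assumes the two input columns are the min-max-normalized diameter d and mask code k, uniform [-1, 1] random weights and biases, and the MATLAB convention hardlim(z) = 1 for z ≥ 0; the function names are hypothetical and the code is an illustration, not the authors' implementation.

```python
import numpy as np


def hardlim(z):
    """Hard-limit activation (MATLAB convention): 1 where z >= 0, else 0."""
    return (z >= 0).astype(float)


def elm_fit(x, y, n_hidden, rng):
    """Basic ELM: random input layer, output weights by pseudoinverse."""
    w = rng.uniform(-1.0, 1.0, size=(x.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    beta = np.linalg.pinv(hardlim(x @ w + b)) @ y
    return w, b, beta


def elm_predict(params, x):
    w, b, beta = params
    return hardlim(x @ w + b) @ beta


def sweep_hidden_nodes(x_tr, y_tr, x_te, y_te, sizes, seed=0):
    """Testing-set RMSE for each hidden-layer size.

    x_* columns are assumed to be [normalized diameter d, normalized mask code k].
    """
    rng = np.random.default_rng(seed)
    results = {}
    for n in sizes:
        params = elm_fit(x_tr, y_tr, n, rng)
        err = elm_predict(params, x_te) - y_te
        results[n] = float(np.sqrt(np.mean(err ** 2)))
    return results
```

In practice one would pass sizes spanning the stated range (for example 10, 100, 1000, 4000, 10,000) and keep the smallest network whose testing RMSE stops improving. Because hard-limit units are step functions, the fitted efficiency surface is piecewise constant, which is one reason a large number of hidden nodes is needed for a smooth-looking curve.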
Whether the mask traps a specific droplet is determined by comparing its predicted efficiency with a random number drawn from a uniform distribution. If the random value is less than the predicted efficiency, the droplet is classified as trapped by the mask; otherwise, it passes to the next stage. For M droplets, an M × 1 array of uniformly distributed random numbers is generated with a built-in MATLAB function, and each value is compared with the predicted efficiency of the corresponding droplet. If the droplet diameter is larger than the maximum value in the training data, the droplet is considered trapped except in the no-mask condition. This model can be further developed in the future to cover the direction and magnitude of the droplet velocity and the droplet mass flow according to experimental data [3] or simulation data. The mask model is limited by the range of available mask efficiency data, which extends to about 12 µm. If the droplet diameter is more than 12 µm, the maximum efficiency η_high of the corresponding mask type (see Table 1) is applied. The pseudocode for determining the efficiency is shown below.
```matlab
% Efficiency for one droplet and one mask type
if diameter <= max_range          % inside the training range (about 12 um)
    efficiency = mask_model(diameter);
else                              % beyond the training range
    efficiency = eff_max;         % eta_high of the mask type (Table 1)
end
```

Because the droplet generator produces diameters ranging from 1 µm to 2 mm, whether a droplet larger than 12 µm is trapped is determined from the maximum efficiency η_high in Table 1.
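The trapping step described above can be sketched in Python for a whole droplet array at once. This is an illustrative translation, not the authors' MATLAB code: `model` stands for any callable returning the trained FFNN efficiency for one mask type, `D_MAX_UM = 12.0` follows the approximate 12 µm limit stated above, and `mask_efficiency` and `apply_mask` are hypothetical names.

```python
import numpy as np

D_MAX_UM = 12.0  # approximate upper end of the mask-efficiency training data


def mask_efficiency(d_um, model, eta_high):
    """Efficiency per droplet: FFNN inside the training range, eta_high above it."""
    d_um = np.asarray(d_um, dtype=float)
    eff = np.full(d_um.shape, float(eta_high))
    inside = d_um <= D_MAX_UM
    eff[inside] = model(d_um[inside])
    return eff


def apply_mask(d_um, model, eta_high, seed=None):
    """Monte Carlo trapping: a droplet is trapped when r < efficiency.

    Returns (diameters of escaped droplets, boolean trapped flags).
    """
    d = np.asarray(d_um, dtype=float)
    eff = mask_efficiency(d, model, eta_high)
    r = np.random.default_rng(seed).uniform(size=eff.shape)  # uniform in [0, 1)
    trapped = r < eff
    return d[~trapped], trapped
```

For the no-mask condition, a model that returns zeros together with `eta_high = 0.0` lets every droplet escape, since a uniform draw is never below zero.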

Environment Models
Droplets travel with the respiratory jet, eventually leave it, and then either fall to the ground or evaporate, as shown in Figure 4. The total evaporation time and the time for a droplet to fall 2 m to the ground are depicted in Figure 5. Smaller droplets evaporate, whereas larger droplets fall to the ground. The evaporation rate, evaporation time, and falling time are functions of various variables, such as humidity and wind velocity [29]. Another important quantity is the penetration, or horizontal distance, which indicates how far a droplet can travel and can be used to determine a safe distance for human-to-human interaction [22][34][35][36][37]. The model is formulated in two parts: one predicts the reduction of droplet diameter, and the other predicts the maximum horizontal distance.

The first model predicts the diameter as a function of time and the initial droplet size. For this purpose, data from the numerical simulation in [37] are employed as the training data, plotted in Figure 6. The diameter decreases gradually as evaporation turns the droplet into an airborne droplet nucleus. The model is an FFNN with a single hidden layer trained by ELM. Its mathematical representation is given in Equation (3):

$d_t = f(t, d_0)$ (3)

where t is the time after the droplet leaves the mouth, and $d_0$ and $d_t$ are the initial and later droplet diameters, respectively. The FFNN employs 500 hidden nodes and a sigmoid activation function.
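Equation (3) implies that each training sample pairs a time and an initial diameter with the diameter at that time. One way to assemble such samples from digitized diameter-time curves is sketched below; the dictionary layout and the name `build_training_rows` are assumptions for illustration, not a description of how the data of [37] were actually processed.

```python
import numpy as np


def build_training_rows(curves):
    """Flatten diameter-time curves into (t, d0) -> d_t samples for Eq. (3).

    curves: dict mapping an initial diameter d0 (um) to a pair of
    equal-length sequences (times_s, diameters_um) read off a reference plot.
    Returns X with columns [t, d0] and the target vector y = d_t.
    """
    xs, ys = [], []
    for d0, (t, d) in curves.items():
        t = np.asarray(t, dtype=float)
        d = np.asarray(d, dtype=float)
        # Repeat d0 alongside every time stamp of its own curve
        xs.append(np.column_stack([t, np.full_like(t, d0)]))
        ys.append(d)
    return np.vstack(xs), np.concatenate(ys)
```

The resulting `X` and `y` can then be normalized, split 80/20, and passed to an ELM with 500 sigmoid hidden nodes as described above.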
The second model predicts the penetration ability, or horizontal distance, as a function of droplet diameter. It is an FFNN with ten hidden nodes and a sigmoid activation function, trained using the ELM method on data taken from [37].

Mask Model Evaluation
The discussion covers the model accuracy and the physical meaning of the predicted droplets. First, various FFNN configurations are compared. A network with 4000 hidden nodes is selected because it is more accurate than networks with fewer hidden nodes and achieves almost the same root mean square error (RMSE) as networks with more hidden nodes. The computational time is also acceptable, with a training time of less than 1 s in MATLAB on a computer with an AMD Ryzen 3 processor and 8 GB of RAM. The results show good agreement, with RMSE values of 0.0022 for training and 0.0042 for testing. The predicted efficiency can be checked visually in Figure 7. The R² values of the mask model for the training and testing data are more than 0.997 and 0.989, respectively, which indicates high correlation, as shown in Table 2. Table 3 lists the number of droplets after passing through the mask, the percentage of the outlet volume relative to the initial volume, and the average diameter of the escaped droplets. The N95 mask performs best, trapping the largest number of droplets, whereas the gauze mask traps the fewest. These results depend strongly on the training data. If the droplet diameter is larger than the maximum value in the training data, the efficiency values in Table 1 are used depending on the mask type. The droplet numbers for the procedure, surgical, and N95 masks are almost the same, although in reality the N95 mask should be more efficient than the surgical mask. A possible cause is the reference cough droplet distribution [1], which contains relatively few droplets smaller than 10 µm.
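The accuracy metrics quoted here can be computed as in the sketch below. It assumes that the reported R² is the square of the Pearson correlation from a regression plot of predictions against targets. The coefficient of determination, 1 - SS_res/SS_tot, is shown alongside because the two differ in general, for example when predictions are perfectly anti-correlated with the targets. All function names are hypothetical.

```python
import numpy as np


def rmse(pred, target):
    pred, target = np.asarray(pred, float), np.asarray(target, float)
    return float(np.sqrt(np.mean((pred - target) ** 2)))


def regression_r(pred, target):
    """Pearson correlation between predictions and targets (regression-plot R)."""
    return float(np.corrcoef(pred, target)[0, 1])


def r_squared(pred, target):
    # Square of the regression-plot R (assumed meaning of R^2 in Table 2)
    return regression_r(pred, target) ** 2


def coefficient_of_determination(pred, target):
    """1 - SS_res / SS_tot; can be negative for poor predictions."""
    pred, target = np.asarray(pred, float), np.asarray(target, float)
    ss_res = np.sum((target - pred) ** 2)
    ss_tot = np.sum((target - target.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

Reporting both the correlation-based value and RMSE, as Table 2 and the text do, guards against a high R hiding a systematic offset or scaling error.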
To check the mask model further, an additional simulation is carried out in which up to 100 droplets in the range between 0 and 10 µm are added; the results of 50 simulation runs are compared as box plots in Figure 8. The difference between the N95 and surgical masks becomes more apparent. These results indicate that the proposed mask model can predict the escaped droplets. The comparison between the surgical and N95 masks can be refined in the future so that it agrees more closely with [3], especially in terms of the number of trapped droplets. The mask training data will also be updated with the latest and more accurate sources.

The Droplet Evaporation and Penetration
As discussed in the methodology, the environment model consists of two parts: the reduction of droplet diameter and the penetration capability. Both parts are modeled using FFNNs trained by the ELM algorithm. The predicted diameters are compared with the data from [37] shown in Figure 6. As shown in Figure 9, the predictions agree well with the training data. Although slight errors are found for small diameters, the regression analysis between the predicted diameters and the diameters in the training and testing data shows high correlation, with R² values of 0.999 and 0.994, respectively. The model shown in Figure 9 predicts the reduction of droplet diameter after the droplet escapes from the mouth or mask, and Figure 9b,c compares the predicted diameters at specific timestamps. The model range actually extends up to 1500 µm to accommodate the no-mask condition. Furthermore, droplets smaller than 10 µm evaporate very quickly, in less than about 1 s, and do not move far from their initial position unless there is wind in the external environment. Therefore, this error can be neglected when observing the horizontal distance. However, to reduce the error for small particles indicated in Figure 9c, the proposed method will be refined in the future with more comprehensive training data or another machine learning method.
Figure 9. The droplet diameter reduction prediction compared to the reference data (a) as a function of time series, (b) the correlation between the predicted and reference data for all diameter data, and (c) for data with initial diameter less than 200 µm.
The first and second panels of Figure 10 show the reduced diameter and the time, respectively, as functions of the initial diameter. The y-axis of the first panel is the diameter predicted at a specific timestamp by the FFNN model. As described in Section 2.3, the model inputs are the timestamp and the initial diameter, and the second panel provides the timestamps used to obtain the y-axis of the first panel. The input data from [36] represent the falling time and evaporation time for various diameters, so the plot is divided into two regions by the evaporation-time boundary. The first panel shows that on the right side of this boundary the predicted diameter is negative; that is, the droplet volume is zero, or the droplet has evaporated completely. Figure 10 thus confirms that the model can predict droplet behavior outside the training data range, that is, for unlearned data. The training data only contain the reduced diameter up to the evaporation time; what happens after that time is unknown from the model's point of view. Nevertheless, the model outputs negative values beyond this time, which can be interpreted as complete evaporation. In summary, the model's capability to predict the evaporation condition and the final state of the diameter after a specific timestamp has been demonstrated. Table 4 shows the evaporated droplets at two timestamps, 2 and 10 s. Apart from the N95 mask, through which no droplet escapes, a considerable number of droplets from the other mask types have evaporated after 2 s in the ambient air. After 10 s, most of the droplets have evaporated, while the diameters of the remaining droplets continue to decrease. For example, about 94 droplets remain for the cotton mask, with an average diameter of 197.56 µm, which is expected to be smaller than their initial diameters. The remaining droplets will eventually either evaporate completely or fall to the ground, whichever comes first.
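The interpretation of non-positive outputs as complete evaporation can be applied to a droplet array, as in the Table 4 counts, with the sketch below. `model` is assumed to be any callable implementing d_t = f(t, d_0) on rows [t, d_0] with any normalization handled internally. Treating an output of exactly zero as evaporated, and the names `diameter_at` and `evaporation_summary`, are assumptions for illustration.

```python
import numpy as np


def diameter_at(model, t_s, d0_um):
    """Query the Eq. (3) model d_t = f(t, d_0) for many droplets at one time."""
    d0 = np.asarray(d0_um, dtype=float).ravel()
    x = np.column_stack([np.full_like(d0, t_s), d0])
    return model(x)


def evaporation_summary(model, t_s, d0_um):
    """Return (number evaporated, number remaining, mean remaining diameter).

    A predicted diameter <= 0 is read as a fully evaporated droplet.
    """
    d_t = diameter_at(model, t_s, d0_um)
    evaporated = d_t <= 0.0
    remaining = d_t[~evaporated]
    mean_d = float(remaining.mean()) if remaining.size else float("nan")
    return int(evaporated.sum()), int(remaining.size), mean_d
```

For example, with a purely illustrative linear stand-in `model = lambda x: x[:, 1] - 20.0 * x[:, 0]`, calling `evaporation_summary(model, 2.0, [10.0, 30.0, 100.0])` gives `(2, 1, 60.0)`.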
The data obtained from this part of the model can be analyzed together with the data from the next model, which predicts horizontal distance coverage. The horizontal distance predictor (the second model) is compared with the training data from [37] in Figure 11, (a) as a function of diameter and (b) as a regression analysis. The model has relatively high correlation values for training and testing, 0.990 and 0.933, respectively, as shown in Table 2. The model also follows the pattern of the training data, showing that the farthest distance is covered by droplets with a diameter of about 30 µm. If the input droplet diameter is larger than the range covered by the training data, the largest diameter in the training data is used, as in the mask model. The horizontal distance coverage is affected by various variables, such as humidity, the initial droplet velocity, and wind velocity from the ambient air or ventilation, which can be considered in the future. Table 5 shows the predicted droplet numbers in various ranges of horizontal distance. The N95 and surgical masks show zero droplets because the escaped droplets are relatively small and evaporate within a relatively short distance. The highest droplet number is found for the no-mask condition, followed by the gauze, cotton, and procedure masks. Droplets at horizontal distances of less than 50 cm are not shown because they are few and evaporate completely owing to their small diameter. According to the table, the N95 and surgical masks provide the safest condition because they have the minimum number of droplets at distances of more than 50 cm.
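The clamping rule and the distance binning behind Table 5 can be sketched as follows. `model` stands for the trained distance FFNN; only the upper end of the diameter range is clamped, following the text. The bin edges in the usage note are hypothetical, not the exact ranges of Table 5, and the function names are illustrative.

```python
import numpy as np


def predict_distance(model, d_um, d_train_max):
    """Horizontal distance per droplet; diameters above the training range
    are replaced by the largest training diameter before prediction."""
    d = np.minimum(np.asarray(d_um, dtype=float), d_train_max)
    return model(d)


def count_by_distance(distances_cm, edges_cm):
    """Droplet counts per distance bin.

    Values below edges_cm[0] (e.g. < 50 cm) or above edges_cm[-1] are not counted;
    the last bin includes its right edge (numpy.histogram convention).
    """
    counts, _ = np.histogram(distances_cm, bins=edges_cm)
    return counts
```

For instance, `count_by_distance([30, 60, 120, 130, 199, 250], [50, 100, 150, 200])` returns counts `[1, 2, 1]`: the 30 cm and 250 cm values fall outside the bins and are dropped.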
From the perspective of the droplet generator, if more small droplets are added to the cough droplet distribution, the additional droplets escaping the mask will likely evaporate before reaching a distance of 50 cm because of their small size. For the procedure mask, 5 droplets are detected at distances of more than 100 cm, significantly fewer than for the cotton and gauze masks. In the no-mask condition, most droplets are found between 100 and 140 cm. Wearing a gauze mask reduces the droplet number to about one third of that in the no-mask condition, and a cotton mask reduces the droplet population further, by about half. In other words, although the gauze mask is much less effective than the surgical and N95 masks, it can still reduce the droplets to about one third of the no-mask condition. In summary, the proposed model has demonstrated the capability to predict droplet behavior for various mask types. The computation is also relatively fast: training takes less than 1 s, and prediction is faster still. Completing 50 simulation runs involving 3000 droplets under 6 mask conditions with all three models takes about 312 s. The main novelty of this work is a framework for predicting droplet behavior from the data available in various studies in the literature, so the method can be extended in several directions. First, because the quality of a machine learning model depends on the quality of its data, the model can be updated with the most recent training data; new findings can be accommodated by adding their data to the training set. Second, the model can be developed further by accommodating more independent variables.
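The repeated simulation campaign described here (generator, then mask model, then environment model, repeated for several mask conditions) can be organized as in the sketch below. The three stage functions are hypothetical stand-ins passed in as callables. Reusing the same generated droplets for every mask condition within a run is a design choice of this sketch, not necessarily the authors' procedure.

```python
import numpy as np


def run_campaign(generate, mask, environment, conditions, n_runs=50, seed=0):
    """Chain the three stages over repeated Monte Carlo runs.

    generate(rng) -> droplet diameters; mask(d, condition, rng) -> escaped
    diameters; environment(escaped) -> a per-run summary (e.g. a count).
    Returns {condition: array of summaries, one per run}, e.g. for box plots.
    """
    rng = np.random.default_rng(seed)
    results = {c: [] for c in conditions}
    for _ in range(n_runs):
        d = generate(rng)                            # stage 1: droplet generator
        for c in conditions:
            escaped = mask(d, c, rng)                # stage 2: mask model
            results[c].append(environment(escaped))  # stage 3: environment model
    return {c: np.asarray(v) for c, v in results.items()}
```

With 50 runs, 6 mask conditions, and 3000 droplets, the inner loop handles 50 × 6 × 3000 = 900,000 droplet evaluations per stage. The reported 312 s therefore corresponds to roughly 0.35 ms per droplet and condition.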
For the droplet generator, droplet distribution data for other human activities can be added in the future. For the mask models, the droplet velocity can be added as a model input by accommodating the change in velocity or diameter before and after passing through various mask types. For the environmental models, the horizontal velocity of the falling droplets, humidity, and wind velocity can be added by including training data for the desired variables.
The proposed method has three potential uses: fast prediction, linking separate studies, and gaining insight into unknown phenomena together with numerical models. Because the model can reproduce the pattern of experimental or numerical data, the output or dependent variable can be predicted quickly for different inputs or independent variables without redoing the simulation or experiment. Therefore, a situation or variable under a particular condition can be predicted in a short time. The model can also be employed to integrate separate studies, as demonstrated in the current paper, where findings from two different studies are integrated to predict droplet flow from the mouth outlet to the environment for various mask types. Furthermore, the model can be employed to gain insight into unknown phenomena, particularly if reinforcement learning is applied in the future.

Conclusions
A systematic approach to predict the final-state droplet diameter and horizontal distance coverage is proposed. The approach is divided into three parts: a droplet generator, a mask model, and environmental models. The droplet generator is developed from data measured during coughing with an open mouth. The mask model is an FFNN trained by the ELM method. The environmental part has two models, predicting the final-state droplet diameter and the horizontal distance coverage, respectively; both are FFNNs trained by the ELM algorithm. The evaluation has shown promising results: the accuracy of the three machine learning models is relatively high, with R-values above 0.990. The evaporation condition is also predicted successfully even though it is not included in the training range. The model can also predict the penetration distance, which has been validated against previous work.
In summary, the model can be applied to the fast evaluation of prototypes and to social distancing considerations under various possible conditions. The model can also combine studies of seemingly different scope while a valid experimental study or a numerical solution is awaited. The model needs further improvement in the future, mainly to cover more data and cases, such as more human activities and environmental conditions.