Event-Specific Transmission Forecasting of SARS-CoV-2 in a Mixed-Mode Ventilated Office Room Using an ANN

The emerging novel variants and re-merging old variants of SARS-CoV-2 make it critical to study the transmission probability in mixed-mode ventilated office environments. Artificial neural network (ANN) and curve fitting (CF) models were created to forecast the R-Event. The R-Event is defined as the anticipated number of new infections that develop in particular events occurring over the course of time in any defined space. In the spring and summer of 2022, real-time data for an office environment were collected in India in a mixed-mode ventilated office space in a composite climate. The performances of the proposed CF and ANN models were compared with respect to traditional statistical indicators, such as the correlation coefficient, RMSE, MAE, MAPE, NS index, and a20-index, in order to determine the merit of the two approaches. Thirteen input features, namely the indoor temperature (TIn), indoor relative humidity (RHIn), area of opening (AO), number of occupants (O), area per person (AP), volume per person (VP), CO2 concentration (CO2), air quality index (AQI), outer wind speed (WS), outdoor temperature (TOut), outdoor humidity (RHOut), fan air speed (FS), and air conditioning (AC), were selected to forecast the R-Event as the target. The main objective was to determine the relationship between the CO2 level and R-Event, ultimately producing a model for forecasting infections in office building environments. The correlation coefficients for the CF and ANN models in this case study were 0.7439 and 0.9999, respectively. This demonstrates that the ANN model is more accurate in R-Event prediction than the curve fitting model. The results show that the proposed ANN model is reliable and significantly accurate in forecasting the R-Event values for mixed-mode ventilated offices.


Introduction
Since prehistoric times, infectious illnesses have impacted several civilizations. There have been several devastating viral outbreaks throughout the past century. Severe acute respiratory syndrome coronavirus" (SARS-CoV) (2002), influenza A virus subtype H1N1 (A/H1N1) (2009), Middle East respiratory syndrome coronavirus (MERS-CoV) (2012), Ebola virus (2013), and severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) are a few of the deadliest viral pandemics that have had widespread effects in the past 20 years (2002-present) [1]. Millions of people have lost their lives as a result of these pandemics in the last two decades, and many more are currently suffering. Many more viral outbreaks forces and the use of mechanical equipment such as fans, air conditioners, and coolers produce airflow through the building with MM ventilation. The buoyant forces produced by temperature fluctuations inside the building, wind from the outside, and the fan speed inside the building, as well as all other operating ventilation and air conditioning equipment are necessary for MM ventilation. Buildings may be designed and operated to benefit from the interactions of these natural forces with the aid of mechanical equipment. According to a previous study [23], the period of aerosol dispersion is multiplied ten-fold in a room with poor ventilation.
The indoor ventilation and carbon dioxide (CO 2 ) concentration have a particular connection in a steady-state situation [24]. As a result, the indoor concentration of CO 2 might be used as a proxy indicator if an appropriate instrument to monitor indoor ventilation is unavailable. The interior CO 2 levels indicate the ventilation effectiveness of a building in connection to air changes, mechanical devices, air flow patterns (depending on internal and external environments, human activities indoors), opening areas, and the occupant density, as well as the spatial volume of occupied space [25]. Traditional CO 2 monitoring methods do not capture spatiotemporal fluctuations in CO 2 concentration levels. Recent technical advancements in CO 2 monitoring and prediction have allowed for reliable predictions of CO 2 concentration levels within buildings. These advancements further provide information on individuals' CO 2 exposure. Furthermore, the number of accurate air changes per hour (ACH) is difficult to discover in an actual office environment, since the circumstances in mixed-mode ventilated building spaces are very dynamic. The CO 2 concentration in standard indoor environments must be maintained below 1000 ppm. This should be prioritized for improvement if the concentration exceeds 1500 ppm or above. As per the SAGE-EMG recommendations for COVID-19 circumstances [26], the CO 2 levels in buildings where a lot of aerosols are created should be kept below 800 ppm. CO 2 might be utilized as a surrogate to identify the spread of infection caused by SC-2, since both the COVID-19 spread and CO 2 concentration are affected by the occupant density in any enclosed environment.
Based on the carbon dioxide content, Rudnick and Milton [27] assessed the chances in viral transmission through air indoors. They created a model that uses the CO 2 content as a gauge for respired breath exposure, as well as determining how much of the air being breathed has previously been expelled by a dweller within the structure. The scientists developed a CO 2 -based risk model without considering the outer air supply rate or assuming that it remains constant over time. They also avoided assuming that the concentration of an infectious agent had reached the steady state. Likewise, Peng and Jimenez [28] suggested evaluating the CO 2 levels as a SC-2 infection risk proxy for various interior locations and activities in 2021. In a range of typical indoor environments, the authors developed analytical formulas for CO 2 -based risk proxies. The authors cited a restriction whereby the estimations of infection risk, which are mostly dependent on viral exhalation rates, have significant uncertainties. Nonetheless, they advised installing low-cost risk monitoring systems based on CO 2 sensors to check for indoor infection in order to advance public safety and health. To reduce COVID-19 transmission, Bazant and Bush [29] advised limiting the amount of time spent in mutual places with an infected person. Later, Bazant et al. [30] recast the safety advice in terms of the mean exhaled CO 2 concentration and occupancy time in an indoor setting, permitting the use of CO 2 sensors in the risk evaluation of aerial viral transmission. Based on monitored CO 2 data, the authors created a mathematical model to estimate the likelihood of aerial transmission. The authors came to the firm conclusion that while CO 2 levels are regarded as a direct signal for ventilation and air mixing, they may also be utilized to evaluate the risk of SC-2 viral aerial transmission. A review on the aerial transmission of respiratory viruses was provided by Wang et al. [31]. By serving as indicators of the buildup of exhaled air, CO 2 sensors may be used to track ventilation and enhance it. It has been suggested that CO 2 levels be kept between 700 and 800 ppm, however the method of ventilation should also be taken into account. "ArchABM" is a brand-new agent-based simulator that Martinez et al. [32] created. By estimating ventilation characteristics and adequate room sizes, as well as evaluating the impacts of policies while accounting for IAQ as a consequence of complex building-human interaction patterns, the system is made to help with either the creation of brand new structures or the modification of already existing ones. In this study, an aerosol model that had already been published was updated to calculate the time-dependent CO 2 and viral quanta concentrations in each room and the inhaled CO 2 and viral quanta concentrations for every dweller for a day as a gauge of the physiological response. Recently, Kapoor et al. [33,34] published a couple of studies to predict AI-based event-specific airborne viral transmission in a naturally ventilated office room. In [33], the authors connected IEQ parameters (CO 2 levels, temperature, and humidity) with the occupancy rate, occupant behavior of door and window opening, outdoor pollution level, air conditioning with an operable fan, and event-specific viral transmission probability via ANN and curve fitting methods. In another study [34], the abovementioned parameters were connected using various other machine learning (ML) methods. This study advances existing studies by incorporating AI to forecast the aerial transmission by means of data on the indoor CO 2 concentration, occupancy rate, occupant behavior, and other indoor and outdoor environmental factors that are acquired in real-time mixed-mode ventilated office environments.
The temperature and humidity are also crucial indoor environmental factors that influence viral survival. The temperature and humidity both have a significant impact on indoor human comfort and have a direct impact on how people feel. Mecenas et al. [35] conducted a review of the literature on the impacts of temperature and humidity on COVID-19 transmission and concluded that future research should take these factors into consideration because they may have an impact on the disease's spread. The relative humidity and temperature were also identified as essential factors in a number of international guidelines. Many of these guidelines suggested boundary conditions for indoor temperature as well as relative humidity, which took into account both indoor human comfort and the transmission of SC-2. Figure 1 represents the boundary conditions recommended by several guidelines for the indoor air temperature and relative humidity to prevent the transmission of SC-2 while maintaining comfort [20,22,[36][37][38][39][40][41]. The common safe-comfortable temperature range among the mentioned guidelines is from 24 • C to 26 • C and the safe-comfortable indoor relative humidity range is from 40% to 60%. Other crucial factors for minimizing SC-2 transmission in a closed building include the occupant behavior, outside environmental conditions, and indoor occupant density [42,43]. The guidelines on COVID-19-appropriate behavior in office contexts have been released by a number of organizations and governments from across the world [44,45]. Other crucial factors for minimizing SC-2 transmission in a closed building include the occupant behavior, outside environmental conditions, and indoor occupant density [42,43]. The guidelines on COVID-19-appropriate behavior in office contexts have been released by a number of organizations and governments from across the world [44,45]. Some of the key suggestions made to employers for a safe workplace were maintaining reduced occupancy, good hygiene, sufficient aeration, appropriate safety measures, timely training schedules, accurate information flows, quick responses to any kind of illness or symptom, and worker health monitoring.
The chance of SC-2 spreading in an office setting can be significantly reduced by using an appropriate door and window opening strategy in accordance with spatiotemporal variation and occupant activity behavior patterns, as well as comfort. Almost all of the recommendations call for opening windows and doors to improve ventilation and reduce stale air generated within by exhalation. Opening windows or doors, however, is not always a smart option. Particularly in situations where there is unwanted excessive light, high levels of outside pollution entering the building (based on AQI measurements), visual privacy concerns, fly and insect invasion risks, uncomfortable air movements or drafts, or excessive outside noise, which can affect user concentration an in turn increase errors, leading to productivity losses. However, using mechanical devices combined with altered ratios of door and window opening operations can solve these problems up to a certain level. MM ventilation can be an effective solution in the abovementioned situations against viral transmission problems. Ceiling fans, exhaust fans, air conditioners, coolers, and table fans can be used effectively to ventilate the indoor spaces containing operable doors and windows. The additional mechanical components merged with natural ventilation help in reducing the viral transmission if used properly.
The most significant components of the air quality index, or AQI, are the PM 10 and PM 2.5 , which indicate the levels of pollutants [46]. Several researchers have tried to find the relation between the particulate matter in buildings and SC-2 transmission [47][48][49][50][51]. Since the size of the virus is very small, it can easily settle on dust and particulate matter and then be transmitted by the resuspension of those particles inside closed buildings due to human activities or natural forces. As the gravitational force is negligible on the dry nucleus form of virus, the SC-2 virus can survive longer in the air and on surfaces; this makes it a dangerous viral form. The chances of inhaling the free floating viral load in the air is much higher in closed environments with higher occupancy rates. In addition to this, the minuscule pathogens can penetrate deeply in our bodies and create severe problems. Most of the features presented above are correlated, with some having strong relationships while others have weaker linkages. The diurnal trends and associated variations among indoor environmental parameters have been studied by researchers globally [52][53][54].
In connection to SC-2, Tupper et al. [55] developed the idea of "Event-R". The Event-R rate is the "expected number of new infections due to the presence of a single infectious individual at an event", according to Tupper et al. The authors discovered a basic association between the "Event-R" rate and four different characteristics, namely the spread intensity, degree of mixing, individual proximity, and exposure time. The R-Event is another term for the Event-R, and either can be used in real time. In addition to the infection probability, the R-Event was also employed by REHVA in their revised (Version 2.1) airborne COVID-19 prediction tool [56].
The precise infection probability is now being determined by scientific research on the antecedent transmission of several SC-2 virus strains in indoor environments around the world. In bounded spaces, the scientific community examines two distinct sorts of infection probability: (i) based on the CO 2 exhaled by the enclosed space occupiers and (ii) based on the air change rate. The first type is coupled with on-site dynamic spatiotemporal interior and outside environmental data to forecast the R-Event. The R-Event is a more advanced and informative method for infection probability. The R-Event value is the multiplication of the infection probability by the ratio of susceptible individuals to the infected person; alternative values of the R-Event can be provided for different occupancy rates, whereas the infection probability remains the same regardless of the occupancy rate. The REHVA calculator (v2.1) is used in determining the R-Event under mixed-mode settings and recording it synchronously with other real-time information to create a relationship between indoor CO 2 levels and R-Event values. To determine the likelihood of airborne transmission, an unknown solitary contaminated person (among four static individuals inside the case office room) is taken as an infection source (through aerosols only). The purpose of this study is to assist readers in reliably predicting the R-Event value via indoor CO 2 levels in an office area. In addition to CO 2 , the interior and exterior environmental characteristics, occupancy rate, and occupant information are essential for the prediction results.
If an SC-2-contaminated sick occupant is occupying a closed indoor space with other susceptible occupants and all are inhaling and exhaling simultaneously, then high occupant density and lower ventilation rates can lead to an accumulation of CO 2 with freely suspended SC-2 virus (in air) particles that are liable for the aerial transmission of SC-2. To forecast the probability of infection using objective and subjective data, many numerical models have been established, and there have been several simulation studies as well. As noted in the introduction, only a few scientists have tried to connect the dots between indoor CO 2 concentration and SC-2 transmission rates. A model based on sophisticated computing techniques such as artificial intelligence (AI) has yet to be presented in a realtime research study.
There is less research on the relationship between indoor CO 2 levels and viral transmission [27][28][29][30][31][32][33][34]. However, as CO 2 may be utilized scientifically as a proxy for indoor viral transmission, similarly the indoor CO 2 concentration can be used to predict the probable number of infected people as well. In practical dynamic environments such as workplaces and classrooms, the CO 2 levels are not linked to the amount of infectious pathogens present in the air.
In this study, as a proxy for the chance of SC-2 transmission, the R-Event value was predicted using a supervised ML-based ANN model, as well as a curve fitting (CF) model. Another advantage of this study was the development of the CO 2 concentration database and investigation of the occupants' exposure to indoor CO 2 by mapping the office staff members' (subjects) activities within the MM ventilated office room and combining them with the R-Event. The three research objectives of this work were as follows:

1.
Gathering real-time spatiotemporal data for both subjective (occupancy-related) and objective (environmental) variables in office environment to use machine learning (ML) techniques to create links among the parameters; 2.
Creating a unique relationship between the R-Event (event-specific infection probability) value and CO 2 concentration in mixed-mode ventilated office environments; 3.
Comparing novel CF and ANN models (developed for a mixed-mode office environment) for the prediction of R-Event values.
The rest of this study is organized as follows. Section 2 explains the novelty and objectives of the study. Section 3 deals with the materials and methods used in this research, including the collection, standardization, and filtration of the data. Section 4 defines the CF method and ANN technique in detail. Section 5 discusses the results of the ANN models and the CF model with the ANN formulation. The last section, Section 6, concludes the work done in this study with the limitations of this research and potential scope for future work.

Materials and Methods
CO 2 was continually monitored in an office using an EXTECH Indoor Air Quality Meter/Data-Logger Model EA80. The EA80 instrument measures CO 2 concentrations using a maintenance-free CO 2 sensor with a 0 to 6000 ppm range. The equipment was maintained away from the static occupants because CO 2 being directly emitted over the instrument degrades the measurement accuracy. The EXTECH Heat Stress WBGT Meter Model HT200 was used to detect the temperature and humidity. Both devices were installed at 0.8 m above the tile floor. Tables 1 and 2 give the technical specifications for both devices. Both factory-calibrated instruments were examined before usage. Alternative equipment models with higher accuracy are also available for the same purpose; however, owing to our financial restrictions, these models were utilized throughout this investigation. The AQI was recorded from [46]. The calculations were carried out on a desktop computer using MATLAB R2021a.

Veracity Factor Air Temperature Relative Humidity
Range The prediction model was built using 2207 datasets out of a total of 2640 datasets. This study included measurements of the indoor temperature (T In ), indoor relative humidity (RH In ), area of opening (A O ), number of occupants (O), area per person (A P ), volume per person (V P ), CO 2 concentration (CO 2 ), air quality index (AQI), outer wind speed (W S ), outdoor temperature (T Out ), outdoor humidity (RH Out ), fan air speed (F S ), air conditioning (AC), and one target variable, the Event-R (R-Event). The datasets were chosen based on the likelihood of having at least two occupants present in the workplace. This study did not take into account single-occupancy datasets. Furthermore, datasets in which no mechanical device was engaged were not considered. As a result, 433 datasets were removed from the total collected datasets for mixed-mode ventilated office environments.
The occupancy rate of the office space has the most impact on the CO 2 concentration, since the occupants are the major source of emissions. The output R-Event is the anticipated number of new infections that arise in any event occurring during a time period "T" in any enclosed environment. Equation (1) presents the mathematical form of the R-Event [55]. However, the R-Event is calculated mathematically using the SC-2 infection probability calculator in REHVA version 2.1. Equation (2) is the REHVA-calculator-based mathematical formulation for the R-Event [56]. The surroundings of the MM ventilated office building have an impact on the conditions inside as well. Office buildings close to busy roads, markets, and densely populated areas are frequently seen to be affected by environmental factors. As a result, this study also takes external environmental variables into account. "T" represents the length of the entire event; "t" represents the length of time that a solitary susceptible occupant made contact with an infected occupant; "β" denotes the probability of transmission, which is constant with respect to time; "1 − e −βt " denotes the likelihood that any susceptible occupant will become infected; "k" denotes the susceptible occupants (total) who made contact with the infectious person, which may be expressed simply as Equation (2): The research approach used for this investigation is depicted in Figure 2. The compilation of a database is the first step in developing a unique ANN-based model to forecast SC-2 transmission in a MM ventilated office setting using exhaled CO 2 as a proxy indicator. The database preparation process consists of three major steps: (a) data collection; (b) data normalization; (c) data splitting for training, testing, and validation.

Data Collection
The data used in this study was collected in an MM ventilated office in India, utilizing a CO2 measurement instrument. In the year 2022, the measurements were made in the months of March, April, and May. The test site coordinates are NL 29°51′54″ and EL 77°54′10″. The office space has a floor area of 24 m 2 and it has an 84 m 3 volume. About 40% of the floor surface was taken up by furniture such as tables, chairs, and storage cabinets. The office is on the ground level and has a large window with an aluminum and glass frame that is 1.5 m in height and 2.5 m wide on the south wall. The window is split into three sections: the center section is fixed, while the sections at each end are operable and regarded as movable elements of the office. The main door and a side door of the office space are both made of wood. The front door, which measures 1.2 m × 2.4 m and leads to the passageway, is located on the north wall, right in the front of the window. The coupled ventilation above the main entrance door measures 1.2 m × 0.5 m in size. The office room's east side wall contains the side door, which is 0.9 m × 2.0 m in size, and is regarded as a movable element of the office. A double door is used for the front door, while a single door is used for the side door. The room contains one ceiling fan. One window air conditioner is fitted under the fixed window pane at the height of 0.3 m from the ground floor. Table 3 outlines the additional room's components in particular that are not mentioned in [33].

Data Collection
The data used in this study was collected in an MM ventilated office in India, utilizing a CO 2 measurement instrument. In the year 2022, the measurements were made in the months of March, April, and May. The test site coordinates are NL 29 • 51 54 and EL 77 • 54 10 . The office space has a floor area of 24 m 2 and it has an 84 m 3 volume. About 40% of the floor surface was taken up by furniture such as tables, chairs, and storage cabinets. The office is on the ground level and has a large window with an aluminum and glass frame that is 1.5 m in height and 2.5 m wide on the south wall. The window is split into three sections: the center section is fixed, while the sections at each end are operable and regarded as movable elements of the office. The main door and a side door of the office space are both made of wood. The front door, which measures 1.2 m × 2.4 m and leads to the passageway, is located on the north wall, right in the front of the window. The coupled ventilation above the main entrance door measures 1.2 m × 0.5 m in size. The office room's east side wall contains the side door, which is 0.9 m × 2.0 m in size, and is regarded as a movable element of the office. A double door is used for the front door, while a single door is used for the side door. The room contains one ceiling fan. One window air conditioner is fitted under the fixed window pane at the height of 0.3 m from the ground floor. Table 3 outlines the additional room's components in particular that are not mentioned in [33]. Table 3. Additional components of the office room not listed elsewhere [33].

Component Area/Volume Fixed/Variable Material
Because the Hawthorne effect [57] would have an adverse influence on the participants' typical behavior (talking, working, activity level, free motion, etc.), the study's objective was kept confidential from both the dynamic and static occupants (subjects). If subjects knew they were being watched, their actions may have changed. If they were aware of the study objectives, they also may have altered their breathing patterns, movements, and other behaviors. However, only one person (the first author) was aware of the research goal, since they were collecting the information for this study. A previous [33] contained information on the subjects. As indicated in Figure 3, all of the static participants sat in the 'L' shape (longitudinal direction) at a space of around one meter.  Figure 4 depicts a two-dimensional representation of the monitored office room. The different operational scenarios were based on the status of the ceiling fan, the status of the window air conditioner, and the operability of two doors and two windows. From Monday through Friday, data were collected 44 times each day at ten-minute intervals. Figure 5 depicts the data gathering schedule.  Figure 4 depicts a two-dimensional representation of the monitored office room. The different operational scenarios were based on the status of the ceiling fan, the status of the window air conditioner, and the operability of two doors and two windows. From Monday through Friday, data were collected 44 times each day at ten-minute intervals. Figure 5 depicts the data gathering schedule.  The dataset plot matrix is presented in Figure 6. The two most crucial elements of a scatter plot diagram are the 'R' and 'P'. The correlation coefficient between two values shows the fitting relationship for the selected datasets. The p value is the probability, which shows the relationship between the two values, which is equal to zero. Strong   The dataset plot matrix is presented in Figure 6. The two most crucial elements of a scatter plot diagram are the 'R' and 'P'. The correlation coefficient between two values shows the fitting relationship for the selected datasets. The p value is the probability, which shows the relationship between the two values, which is equal to zero. Strong correlations have low p-values because the probability of the relationship parameters is very low. For 'P', the highest value is 0.82 and the lowest value is 0. 'R' is highest between the input features AP and VP, both of which have a value of one. However, the 'R' is minimal at points AP and O, as well as VP and O. At the aforementioned points, the minimal value of R is −1, as represented by the dark blue circles in the middle of Figure 6. The dataset plot matrix is presented in Figure 6. The two most crucial elements of a scatter plot diagram are the 'R' and 'P'. The correlation coefficient between two values shows the fitting relationship for the selected datasets. The p value is the probability, which shows the relationship between the two values, which is equal to zero. Strong correlations have low p-values because the probability of the relationship parameters is very low. For 'P', the highest value is 0.82 and the lowest value is 0. 'R' is highest between the input features A P and V P , both of which have a value of one. However, the 'R' is minimal at points A P and O, as well as V P and O. At the aforementioned points, the minimal value of R is −1, as represented by the dark blue circles in the middle of Figure 6. The indoor temperature (TIn), indoor humidity (RHIn), area of opening (AO), number of occupants (O), area per person (AP), volume per person (VP), CO2 concentration (CO2), air quality index (AQI), outer wind speed (WS), outdoor temperature (TOut), outdoor humidity (RHOut), fan air speed (FS), and air conditioning (AC) were the thirteen input parameters that were collected. According to the input data, the CO2 concentration levels within the office room varied from 354 to 1398 ppm. The R-Event valued had a range of 0.04-1.00. Table 4 presents the data for all of the input parameters and output parameters along with their minimum, maximum, and mean values. Additionally, the standard deviation, kurtosis, and skewness are also presented in the table. The mean indoor temperature was 29.84 °C, with a mean indoor relative humidity of 38.54 percent. The area of opening was 2.42 m 2 . The mean occupancy rate was 2.94. The mean AP and mean VP were 8.28 m 2 and 30.90 m 3 , respectively. The mean CO2 concentration inside the office room was 648.42 ppm during the measurement period. The mean AQI was 124.94 and the outer mean wind speed was 13.97 kmph. The outdoor mean temperature and mean relative humidity were 37.43 °C and 11.95 percent. The mean values for the fan speed and air conditioning were 1.79 m/s and 0.40, respectively. The mean R-Event value for the MM case was 0.128, as presented in Table 4.  The indoor temperature (T In ), indoor humidity (RH In ), area of opening (A O ), number of occupants (O), area per person (A P ), volume per person (V P ), CO 2 concentration (CO 2 ), air quality index (AQI), outer wind speed (W S ), outdoor temperature (T Out ), outdoor humidity (RH Out ), fan air speed (F S ), and air conditioning (AC) were the thirteen input parameters that were collected. According to the input data, the CO 2 concentration levels within the office room varied from 354 to 1398 ppm. The R-Event valued had a range of 0.04-1.00. Table 4 presents the data for all of the input parameters and output parameters along with their minimum, maximum, and mean values. Additionally, the standard deviation, kurtosis, and skewness are also presented in the table. The mean indoor temperature was 29.84 • C, with a mean indoor relative humidity of 38.54 percent. The area of opening was 2.42 m 2 . The mean occupancy rate was 2.94. The mean A P and mean V P were 8.28 m 2 and 30.90 m 3 , respectively. The mean CO 2 concentration inside the office room was 648.42 ppm during the measurement period. The mean AQI was 124.94 and the outer mean wind speed was 13.97 kmph. The outdoor mean temperature and mean relative humidity were 37.43 • C and 11.95 percent. The mean values for the fan speed and air conditioning were 1.79 m/s and 0.40, respectively. The mean R-Event value for the MM case was 0.128, as presented in Table 4.

Standardization of Selected Data
After the collection of the datasets, the selected database for this study was standardized. The most crucial stage in defining data into definite ranges, such as 0 to 1, −1 to +1, 0 to +0.9, or 0 to +0.8, is the standardization stage [33,34,58]. The chosen input parameters' values could change over a certain range. In order to make the computations simpler, data standardization was used. Equation (3) was used to standardize the data in this research study in the range of 0 to +0.8: Here, N is the random value from the database, N standardized is the value that has to be normalized, N min is the parameter's smallest value, and N max is its highest value.

Filtration of Data
In order to construct a multi-layered, feed-forward, back-propagation learning method, numerous researchers proposed using 70% of the data for training, 15% for validation, and 15% for testing. The data was divided into three portions with the 15:15:70 ratio of the entire dataset for testing, validation, and training purposes, respectively, following a short assessment of the ANN literature. After dividing the 2207 datasets, 331 datasets were utilized in this study for testing, 331 datasets for validation, and 1545 datasets for training.

R-Event Prognosis
In order to create the relationship and forecast the R-Event, two techniques were used in this study. The first technique was simply applying the first-degree order equation to the output and input parameters and was based on the CF method. The second approach was a type of artificial intelligence method, whereby supervised machine learning with multiple-variables ANN is applied to predict the R-Event for the mixed-mode ventilated office room.

Curve Fitting
The coefficient of correlation (Cc) may now be expressed computationally for all parameters as follows: Thus, the final equation can be expressed as follows:

Artificial Neural Networks
Frank Rosenblatt, a psychologist, developed the first ANN, also called a perceptron, in 1958, with the intention of simulating the human brain's interpretations of visual information and object recognition process. Artificial neural networks have significantly aided in the implementation of several intelligent information processing techniques and the study of the fundamental functions of real neurons and brain activity, as well as in numerous industrial applications during the past forty years [59]. The ANN modeling method uses computers to mimic certain important features of the human nervous system, such as the ability to address problems by utilizing information from prior experiences in new situations.
Learning models such as ANNs construct a network of key connections between the selected target and considered features, which are connected by connections recognized as neurons and concealed behind layers. Each neuron contains a weight parameter as well as hidden or input connections to neurons from the layer above, each of which is weighted differently. The performance of the learning model depends significantly on the number of neurons in the hidden layer of the ANN. If the hidden layer's population of neurons is too small, the error function will exhibit oscillating behavior and the network learning function will not be able to converge to an ideal value, making it difficult for the network to learn the relationships between the input-output configurations. When the number of neurons is too high, the input-output list is merely stored, resulting in poor generalization performance. The primary cause of overfitting is overtraining [60]. To achieve the best possible outcome, a trained ANN assembly can detect linkages between inputs and outputs by comparing measured and anticipated outputs [61].
Neural networks can be used to make models of difficult natural systems that have many inputs, making the models more accurate and simpler to use [62]. The most important feature in training is to minimize errors while increasing the value of R. This is achieved by modifying the weights during the course of the learning phase until the error function is achieved. In ANN, to evaluate the network performance, the 'R' and 'MSE' values are used [63]. Equations (7) and (8) express the 'R' and 'MSE' equations, respectively: R is the Pearson correlation coefficient, x i represents the measured values in the datasets, the mean value of the measured values is x, y i represents the predicted values in the datasets, and the mean value of the predicted values is y.
The above is an iterative approach to changing the value of w by approximating y i and calculating the associated MSE. The errors are too large at first since the weights are chosen at random. The aim of network learning is to find the weights with the lowest level of error across all datasets. Approximating the weights using "trial and error" approaches would require a significant amount of effort as well as time. The gradient descent method is an effective way to quickly track the minimum sets of errors in a network training process. The gradient descent slopes down the error using the error gradient. In this work, an ANN model was suggested to forecast the R-Event value for the MM ventilated office room based on the observed conditions of the thirteen input features.

ANN Modeling
Modeling is the procedure of describing a practical element or phenomenon as a series of computer assertions. It is critical to identify the network's best architecture, providing both high precision and a well-fitted dataset. Figure 8 depicts the ANN architecture for this scenario, which includes thirteen input features and the R-Event as a target parameter. There is no method for calculating the precise number of hidden layers and neurons on every layer; thus, the number of ideal neurons and hidden layers was established via "trial and error". After experimenting with several architectures and different counts of neurons in each layer, the ideal structure was discovered. The MSE-value-based network performance result is displayed in Figure 9. The model based on ANN started training from 3 neurons and trained until reaching 17 neurons. At the 3rd neuron the R of the training was 0.99916, while at the 17th neuron the R of training reached 0.99993. Each of the ANN models underwent at least 20 iterations. The suggested ANN model's optimum design is illustrated in Figure 10. Each neuron's MSE and R values were taken into consideration while determining its rank. In Table 5, the rankings of all neurons for their R and MSE values for training, testing, and validation are presented to show the overall best neurons. The 13th neuron trial had the lowest overall rating of all of the trials. The network that predicted the best performance among all neurons had a single hidden layer containing 13 neurons. The database features were linearly standardized in the range of 0 to 0.8 to speed up the learning process and promote quicker convergence. The ANN toolbox was used in the MATLAB (R2021a) environment to perform the simulations. The training techniques offered by MATLAB include Levenberg-Marquardt (LM), scaled conjugate gradient, and Bayesian regularization methods. The LM training method was chosen from among them due to its excellent convergence, high accuracy, and quick training of the network. This method often requires more memory but is quicker. The training ends when the generalization stops becoming better, as shown by an increase in the MSE of the validation samples. The same strategy was utilized in this study to randomly split the data into three portions: 15% for validation (331 datasets), 15% for testing (331 datasets), and 70% for training (1545 datasets). The activation functions for the hidden and output layers were selected to be TANSIG (Equation (9)) and PURELIN (Equation (10)), respectively: 0.00003469, respectively. Table 6 presents the neuron ranking based on their MSE and R values.   0.00003469, respectively. Table 6 presents the neuron ranking based on their MSE and R values.

Performance Indices
Basic performance indicators including the R, RMSE, MAPE, MAE, NS, and a20-index were taken into account while evaluating the effectiveness of the CF and ANN models in order to determine the models' reliability [64][65][66][67][68]. It is clear that COVID-19 has affected many areas of our life, including our environment, activities, and other factors, and may require intelligent solutions using AI techniques, medical images, and clinical data to control the pandemic [69][70][71][72][73][74][75][76][77]. Formulas (11) to (16) below give the equations for the performance indices listed above. The following acronyms stand for the following terms: R is the Pearson correlation coefficient, MAE stands for the mean absolute error, RMSE for the root mean square error, MAPE for the mean absolute percentage error, and NS for the Nash-Sutcliffe efficiency index. Higher values of R and NS approaching 1 are used to measure a model's accuracy. The most accurate model had values for the MAE, RMSE, and MAPE that were closest to zero. The most accurate model had the lowest values that were closest to zero for the RMSE, MAE, and MAPE. These performance indicators were used to assess how well the model could forecast R-Event values. In Section 3.2, Equations (7) and (8), corresponding to the R and MSE equations, are mentioned.   Figure 9 shows the MSE values for variable neurons for training, testing, and validation. The minimum MSE for the training is at the 14th neuron with a value of 0.00000070. The maximum MSE values for the training, testing, and validation are at the 3rd neuron, 8th neuron, and 9th neuron, with values of 0.00001223, 0.00303513, and 0.00003469, respectively. Table 6 presents the neuron ranking based on their MSE and R values.  8  10  8  1  1  8  3  1  22  3   9  11  13  10  10  13  11  10  67  12   10  12  10  5  2  6  4  2  29  5   11  13  2  2  6  2  2  6  20  1   12  14  1  3  7  1  1  9  22  2   13  15  5  6  5  4  6  5  31  7   14  16  3  4  3  5  5  3  23  4   15  17  6  13  12  7  13  12  63  9 For the MM ventilation datasets, Figure 10 shows the error and performance plots with the learning process. Figure 10b portrays the validation MSE as a green line, the training MSE as a blue line, and the testing MSE as a red line.

Performance Indices
Basic performance indicators including the R, RMSE, MAPE, MAE, NS, and a20-index were taken into account while evaluating the effectiveness of the CF and ANN models in order to determine the models' reliability [64][65][66][67][68]. It is clear that COVID-19 has affected many areas of our life, including our environment, activities, and other factors, and may require intelligent solutions using AI techniques, medical images, and clinical data to control the pandemic [69][70][71][72][73][74][75][76][77]. Formulas (11) to (16) below give the equations for the performance indices listed above. The following acronyms stand for the following terms: R is the Pearson correlation coefficient, MAE stands for the mean absolute error, RMSE for the root mean square error, MAPE for the mean absolute percentage error, and NS for the Nash-Sutcliffe efficiency index. Higher values of R and NS approaching 1 are used to measure a model's accuracy. The most accurate model had values for the MAE, RMSE, and MAPE that were closest to zero. The most accurate model had the lowest values that were closest to zero for the RMSE, MAE, and MAPE. These performance indicators were used to assess how well the model could forecast R-Event values. In Section 3.2, Equations (7) and (8), corresponding to the R and MSE equations, are mentioned.
The total number of values in the experimental dataset is denoted by N; x i and y i are the measured value and predicted value at ith level, respectively; y i denotes the mean value of the predicted results.

Results and Discussion
The R values for the CF and ANN models were 0.7439 and 0.9999, as shown in Figure 11a,b, respectively. The R value of the ANN model was 25.60% higher than the CF model. The other performance index values for the CF model, such as the MAE, MAPE, RMSE, NS, and a20-index values, were 0.0243, 23.2429, 0.0640, 0.5535, and 0.6611, respectively. Similarly, the performance index values for the ANN model, such as the MAE, MAPE, RMSE, NS, and a20-index values, were 0.0002, 0.1939, 0.0000, 0.9999, and 0.9991, respectively. The MAE, MAPE, and RMSE values for the CF model were 99.18%, 99.17%, and 100% higher than for the ANN model. The other performance factors such as the NS and a20-index values for the ANN model were 44.64% and 33.83% higher than for the CF model. As shown in Figure 11a, the plot of the CF model is more scattered as compared the ANN model. A correlation plot of the selected ANN model for training, validation, testing, and all datasets is presented in Figure 12. The results for the CF and ANN models are tabulated in Table 7.
The total number of values in the experimental dataset is denoted by N; and are the measured value and predicted value at ith level, respectively; denotes the mean value of the predicted results.

Results and Discussion
The R values for the CF and ANN models were 0.7439 and 0.9999, as shown in Figure 11a,b, respectively. The R value of the ANN model was 25 Figure 11a, the plot of the CF model is more scattered as compared the ANN model. A correlation plot of the selected ANN model for training, validation, testing, and all datasets is presented in Figure 12. The results for the CF and ANN models are tabulated in Table 7.   The variation in the predicted results, frequency distribution of the errors, and error plot with respect to the number of datasets for the CF model are shown Figure 13i-iii, respectively. Similarly, in the ANN model, the aforementioned results are shown in Figure 13iv-vi, respectively. It is also clearly visible from the Figure 13 the more scattered results and errors observed for the CF model as compared to the ANN model. The predicted datasets in Figure 13vi are directly aligned with the actual values of the R-Event, and the blue line that depicts the errors is also straight. However, the CF model's projected values and actual values are not quite in line, and the green line used to show this is similarly crooked. The histogram plot also confirms the reliability and accuracy of the ANN model. The constructed ANN model is more accurate and dependable than the CF model, which can be said after considering all of the performance parameters.  The variation in the predicted results, frequency distribution of the errors, and error plot with respect to the number of datasets for the CF model are shown Figure 13i-iii, respectively. Similarly, in the ANN model, the aforementioned results are shown in Figure 13iv-vi, respectively. It is also clearly visible from the Figure 13 the more scattered results and errors observed for the CF model as compared to the ANN model. The predicted datasets in Figure 13vi are directly aligned with the actual values of the R-Event, and the blue line that depicts the errors is also straight. However, the CF model's projected values and actual values are not quite in line, and the green line used to show this is similarly crooked. The histogram plot also confirms the reliability and accuracy of the ANN model. The constructed ANN model is more accurate and dependable than the CF model, which can be said after considering all of the performance parameters.

ANN Formulation
The suggested ANN model displayed great accuracy and interpretability, as was already mentioned. The ANN model may directly yield the explicit formula for the R-Event. Equation (17) is the final equation to forecast the R-Event: , Equation (18) illustrates the generalized formulation for the input to the hidden layer Yi.
where Ni,normalized represents the normalized inputs, BIH represents the biases between the input and hidden layers, WIH is the weight of the matrix in between the input and hidden layers, and BHO and WHO are the bias and weights of the hidden-to-output layer, respectively. The proposed model is capable of predicting the R-Event for inputs with standardized projected data within the specified ranges. To forecast the indoor R-Event values for a mixed-mode ventilated office room scenario, the suggested model has

ANN Formulation
The suggested ANN model displayed great accuracy and interpretability, as was already mentioned. The ANN model may directly yield the explicit formula for the R-Event. Equation (17) is the final equation to forecast the R-Event: Equation (18) illustrates the generalized formulation for the input to the hidden layer Yi.
where N i , normalized represents the normalized inputs, B IH represents the biases between the input and hidden layers, W IH is the weight of the matrix in between the input and hidden layers, and B HO and W HO are the bias and weights of the hidden-to-output layer, respectively. The proposed model is capable of predicting the R-Event for inputs with standardized projected data within the specified ranges. To forecast the indoor R-Event values for a mixed-mode ventilated office room scenario, the suggested model has thirteen parameters. The value of Y i can be obtained from Equation (20). The R-Event final equation is expressed in Equation (19): In addition to creating CF-and ANN-based models, some additional broad observations were made throughout the investigation. A higher degree of human activity increases the likelihood of illness transmission and directly affects the CO 2 content. For a safe working environment, the office size and volume are crucial factors. The occupancy rate, on the other hand, is the main variable that has a high correlation with the R-Event. Therefore, the occupancy level must be decreased to a safe level for a safe working environment. It is crucial to ventilate indoor settings to maintain low CO 2 levels in order to stop the spread of SC-2. The CO 2 level and indoor airborne transmission rate can be substantially correlated, demonstrating the need for ventilation. For improved operational control, maintaining all doors, windows, fans, and other ventilation systems is advised. Effective maintenance improves the effectiveness during full-scale operation. Nets are further advised for usage in order to stop insects from entering the office. Regular conversation is not encouraged in the workplace, and when it occurs without a mask, the risk of an infection spreading is increased. Installing CO 2 sensors in offices to keep track of the CO 2 levels is advised. The danger of viral transmission is considerably decreased via training and enforcing rules.

Limitations of the Study
This study has several restrictions. This study's focus is only on the interior spaces of the office buildings. Under an office setting working under a mixed-mode ventilation mode in a composite climatic environment, a singular source of infection is considered, with pathogen proliferation occurring exclusively through airborne transmission channels. A range of technological, financial, social, and chronological limitations affect this research. However, the procedure is consistent and may be used for a wide range of buildings in a wide range of climatic regions with a wide range of environmental conditions. During this investigation, the directions and patterns of air flow were not taken into account. However, they may significantly affect the transmission of airborne viruses, limiting the study's applicability. Since office hours prevent the monitoring of nocturnal oscillations, only diurnal environmental fluctuations were taken into consideration. The participants were healthy adults with a normal breathing rate and no respiratory illnesses. The information acquired was case-based and reliant on several other environmental factors, such as the wind direction and pressure, which were not included in this study. This study considered all occupants occupying the office space without face masks and who were not immune to SC-2, thereby remaining susceptible. This study considered the delta variant of SC-2 for prediction purposes, which is estimated to be twice as infectious as the original virus. The numerous new and ancient SC-2 versions exhibit extremely inconsistent behavior. The interactions between humans and viruses are, therefore, not covered by this research.
Human behaviors vary widely depending on a wide range of circumstances, which is a limitation in and of itself because each person has a certain behavioral pattern based on their experiences and brain prints. There were a few additional restrictions to this study, as the model could only predict results within the boundaries of the dataset. Furthermore, sneezing, coughing, and other respiratory activities were not considered in this study. People's outer mobility (near office doors and windows) is also a significant component; however, owing to the extremely dynamic nature of such situations, this aspect was not taken into account in this study. The interference of outdoor CO 2 concentrations was one more limitation of this study. The developed model predicts the average rate of event reproduction for an office room with diurnal variations, while other case-based alterations were limitations for the developed model. Close contact and fomite transmission were not considered. Full mixing with equal concentrations in the whole office room was considered, so the lack of non-uniform concentrations was one more limitation of the developed model. The models developed are single-zone models. Several parameters were uncertain, and were taken from the literature or estimated based on the current best available knowledge. The developed model was sensitive to the quanta emission rate values. The authors will address some of these limitations in future studies. More realistic and complex models can be built; however, the parametric uncertainty may still dominate the total uncertainty. Futuristic models can consider parameters based on the findings of new research to incorporate advanced knowledge.

Conclusions
AI-based modeling is usually a long and complex process. This study provides two models, one being an analytical model, i.e., the curve fitting (CF) model, and the second model being an AI-based model, i.e., the artificial neural network (ANN) model. Both the predictive models, namely the ANN and CF models, were tested and compared to forecast the R-Event values. The real-time data for the office environment were gathered in March, April, and May of 2022 in a mixed-mode (MM) ventilated office room situated in a composite climate. Thirteen features were used as inputs when developing the models. The R-Event was the target that was utilized as a proxy for the likelihood of SARS-CoV-2 infection transmission. The data were statistically examined first, then the CF model was used and the analyzed data were trained, tested, and validated with the ANN model. Only the ANN model did well in predicting the R-Event results when compared to the CF model. The ANN model showed a stronger correlation and lower error s(R = 0.9999, MSE = 0.0000, RMSE = 0.0000, MAE = 0.0002, MAPE = 0.1939, NS = 0.9999, and a20-index = 0.9991) with the measured results. When the data for the required input parameters are available, the ANN model may be utilized to reliably forecast the R-Event. The models created in this work may be used to forecast the R-Event, which serves as a proxy for estimating the likelihood that SARS-CoV-2 will be transmitted in MM office settings. As a result, time, effort, and human lives will be saved.