Sensor-Based Machine Learning Approach for Indoor Air Quality Monitoring in an Automobile Manufacturing

Abstract: An alternative control concept using emissions from the machines has the potential to reduce the energy consumption of HVAC systems. This paper reports on a study of alternative inputs for an HVAC control system using machine learning algorithms, based on data gathered in a welding area of an automotive factory. A data set of CO₂, fine dust, temperatures and air velocity was logged using continuous and gravimetric measurements during two typical production weeks. The air exchange rate of the HVAC system was reduced gradually each day to trigger fluctuations in emissions. The data were used to train and test various machine learning models, which were compared using different statistical indices to choose the best-fit model. Among the models tested, the Long Short-Term Memory model showed the best result, with an R² of 0.821. The gravimetric samples proved that a reduction of the air exchange rate does not correlate linearly with an escalation of fine dust, which means one cannot rely on gravimetric samples alone for HVAC system optimization. Furthermore, this study shows that, by applying machine learning algorithms to commonly available low-cost sensors in a production hall, it is possible to estimate fine dust concentrations cost-effectively and reduce the electricity consumption of the HVAC system.


Introduction
The digitalization of manufacturing is being driven forward by the decreasing costs of information and communication technology. Cyber-physical systems (CPS), or in the manufacturing context Cyber-Physical Production Systems (CPPS), are an important technological element in the realization of the 4th industrial revolution [1]. CPPS and their applicability in industrial environments are of increasing interest in current research and industry. A CPPS is clustered into subsystems: the physical world and the cyber world. These subsystems interact with each other through data acquisition and decision-making support and control functionalities, respectively [2].
In 2018, the world's energy consumption had increased by almost 40% compared to the consumption in 1990 [3]. The industrial share of total final consumption (TFC) stayed at about 30% [4], although the absolute consumption of the industrial sector increased by about 57% compared to 1990 [5]. In times of increasing energy demand and decreasing energy resources, a further optimization and revaluation of the industrial sector is imperative.
The Heating, Ventilating and Air-Conditioning (HVAC) system is responsible for a large part of the energy consumption in industry. According to internal Volkswagen data [6], it accounts on average for about 40% of the total energy consumption in industrial buildings. Automobile production can be classified into press shop, body shop, paint shop and assembly.

Particulate Matter (PM)
Fine dust or Particulate matter (PM) is the term used to describe particles in the air that do not sink to the ground immediately, but remain in the atmosphere for a certain period of time. Particulate matter can come either from natural sources or from human activities. Natural contributions include emissions from volcanoes, oceans, soil erosion, forest and bush fires, as well as dead and eroded (abraded) organic remains, pollen, spores, bacteria and viruses. Sources from human activities include transport, power plants, district heating plants, waste incineration plants, private and commercial heating systems and certain industrial processes such as metal production, especially steel production [15,16].
Airborne particles range in size from a few nanometers (nm) to about 100 micrometers (µm). This depends on the process and material used in the working environment. Madanchi [17] presented an experimental approach to evaluate the influence of process parameters on the formation of emissions and particle size. The work showed that the process parameters (cutting speed and cooling lubricant volume flow) have a linear impact not only on the material removal process, but also on the emissions and particle size.
The size of the dust particles, characterized by "aerodynamic diameter" as a model, is the decisive property for their classification [15,18]:

• The term 'Total Suspended Particulate Matter' (TSP) covers all suspended particles with an aerodynamic diameter smaller than ~60 micrometers (µm).
• PM10 refers to all particles with an aerodynamic diameter of less than 10 micrometers.
• PM2.5 refers to particles with an aerodynamic diameter of less than 2.5 micrometers.
• The smallest categorized particles are called ultrafine particles. Their aerodynamic diameters are less than 0.1 micrometers.
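These size classes are nested: every ultrafine particle also counts as PM2.5, every PM2.5 particle as PM10, and all of them fall under TSP. A minimal, illustrative sketch of this classification (the function name and thresholds follow the list above; it returns only the finest class a given aerodynamic diameter falls into):

```python
def finest_pm_class(d_um: float) -> str:
    """Return the finest size class for an aerodynamic diameter in micrometers.

    Note the classes are nested: ultrafine particles also count as PM2.5,
    PM2.5 particles also count as PM10, and all of these fall under TSP.
    """
    if d_um < 0.1:
        return "ultrafine"
    if d_um < 2.5:
        return "PM2.5"
    if d_um < 10:
        return "PM10"
    if d_um < 60:
        return "TSP"
    return "not suspended"
```

For example, a 0.5 µm welding fume particle falls into the PM2.5 class, which is why fume exposure is assessed against the alveolar dust fraction.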
The nose, mouth and throat can filter out particles larger than 10 micrometers. A small percentage of particles between 10 and 2.5 µm (PM10-2.5) are able to reach the bronchioles and alveoli. Particles that have a diameter smaller than 2.5 µm (PM2.5) have a higher probability of reaching and staying in the alveoli. These particles can trigger inflammatory processes in the lung tissue [19]. In the alveoli, respiration and blood circulation are functionally and anatomically very closely connected [15]. This is why abnormalities in one system, such as inflammations in the respiratory tract, can additionally affect other systems, for example, the heart or circulatory system [15].

National Regulations and Technical Rules concerning Particulate Matter (PM) and Industrial Welding
The main focus of welding processes lies on the materials (filler and base material). However, note that neither base materials nor filler materials (for example, welding consumables) are considered "hazardous substances". Hazardous substances are only released as a result of the welding process, and consist primarily of particulate and/or gaseous substances. These hazardous substances are classified within the meaning of the Regulation on Hazardous Substances (GefStoffV). An occupational exposure limit was included in TRGS 900 [9]. TRGS 900 lists quantity thresholds for substances with specific toxic properties, for example, for dusts containing heavy metals or for wood dusts and flour dusts. If the substance in use is not listed, then a general threshold for fine dust is applicable. These thresholds have been compulsory in Germany since the beginning of 2019, after a transitional period of five years (until 2018). However, welding fumes are not dusts in the sense of the definition, as dusts are produced during mechanical processes, for example grinding, whereas welding fumes are produced by thermal processes at very high temperatures, for example, arc welding. Welding fumes can therefore not be directly equated with dust. They consist mainly of particles that are alveolar. In comparison to fine dust, fume particles are smaller, approximately between 0.2 and 1 µm [20]. This means that, in unfavorable workplace situations, welding fume particles can impair or damage the respiratory tract of the person exposed. Furthermore, other negative health effects on the human organism are also to be considered [21]. A direct specific threshold for welding fumes has not yet been defined. The metal compounds found in welding fumes have a dust density higher than 0.0025 mg/m³, approximately between 0.005 mg/m³ and 0.007 mg/m³ [22].
In practice, this is how the risk assessment is carried out: the dust limit values are measured and converted mathematically to obtain the density of the metals from the fumes [22].
Basically, the dust limit value for alveolar dust (<2.5 µm) is 1.25 mg/m³ [9]. DIN EN 689 [23] defined the limit further, including the area analysis, control measurements, protocol and processing of measurement data. In the automotive industry, companies such as Volkswagen AG have to establish their safety measures and define the time interval between control measures based on their measurement data and national regulations. According to DIN EN 689 [23], the time interval between measurements should be determined taking into account the following factors:
• process cycles taking into account normal working conditions;
• effects of failure of control devices;
• proximity to the limit value;
• the effectiveness of process controls;
• time span until the condition is under control again;
• the variation over time of the measurement results.
Based on these factors and the upper limit defined by the national regulations, Volkswagen AG detailed their guidelines concerning the threshold and control measurements in welding area for alveolar dust. Using gravimetric analysis, the expositions in welding areas are measured and controlled.

Methodology and Approach
As the methodological framework, a Cyber-Physical Production System (CPPS) is used; Figure 1 shows an adapted CPPS framework for this application. The approach of integrating a CPPS in the body shop is driven by the goals of improving energy efficiency while maintaining air quality in the production environment and being cost efficient. The CPPS provides decision support by calculating the fine dust concentration in the production environment. This enables the HVAC operator in the physical world to use conventional sensors, such as temperature, relative humidity and CO₂, to operate the HVAC system. Further, the CPPS enables a model-based control in the future.

Data
With this cyber-physical HVAC system, energy efficiency, air quality and thermal conditions can be improved in the production environment by providing decision support, greater transparency and enabling model-predictive control in real time. With the proposed approach, one can estimate the fine dust in the air without installing an expensive fine dust sensor in each welding area of a body shop. This can lead to high cost savings when monitoring a large body shop hall in a factory. Low-cost IoT sensor technology combined with machine learning enables large-scale deployment at moderate cost compared to high-precision fine dust sensors on a factory scale.

Setup for Collecting Data in the Factory Hall
In order to find the correlation between the HVAC system, air quality and production activity, it is necessary to collect and analyze the variables on site. An experiment within the scope of a collaboration project between TU Braunschweig and Volkswagen AG was thus commenced. To investigate seasonal influences, experiments were conducted in summer and winter on consecutive working days, specifically from the 6th to the 8th of August 2018 and from the 26th to the 30th of November 2018. This allowed us to run the experiment under five different air exchange rates, which were set 8 h before the start of each measurement. For the first three days, the reductions were concentrated on the main exhaust air vents, whose openings are found directly under the ceiling of the production hall, as detailed in Table 1. For the last two days, we measured the effect of reductions on the technical exhaust vents, which are directly responsible for exhausting the welding emissions. Figure 2 shows the different positions of the two exhaust vents and the HVAC system used in the factory hall. The reduction limit was set to 50% because the technical exhaust vents are the main path for exhausting the byproducts of the welding. Table 1 lists the experimental parameters for the work days with the expected energy saving potential compared to the reference day (Monday), which is based on the assumption of constant fan efficiency. A change in volume flow rate affects the fan power requirement as follows [8]: for example, a 10% lower volume flow rate will lower the power demand by 33%. The supply air fan has a nominal power of 40 kW, while the exhaust air fan and the technical exhaust fan operate at nominal powers of 30 kW and 60 kW, respectively.
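The flow-power relation referenced from [8] can be sketched under the ideal fan affinity laws, in which fan power scales with the cube of the volume flow. This is a simplified illustration assuming constant fan efficiency; real savings depend on the fan curves and system characteristics, so the figures below are illustrative, not measured values:

```python
def fan_power_ratio(flow_ratio: float) -> float:
    # Ideal fan affinity law: P2/P1 = (Q2/Q1)**3, assuming constant efficiency
    return flow_ratio ** 3

# Example: reducing the volume flow to 90% of nominal
saving_fraction = 1.0 - fan_power_ratio(0.9)  # about 27% under the ideal cube law

# Applied to the nominal fan powers in the hall:
# 40 kW supply + 30 kW exhaust + 60 kW technical exhaust = 130 kW total
saved_kw = 130.0 * saving_fraction
```

Even this idealized estimate makes clear why reducing the air exchange rate is such an effective lever: a modest flow reduction yields a disproportionately large drop in fan power.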
The samples were taken during welding corrections: welding of floor elements (galvanized steel) with the welding torch SKS Power Feeder PF5 (MAG method), using copper-coated steel wire with a diameter of 1.0 mm and argon/CO₂ shielding gas, as well as sanding and cleaning of welding seams. An oscillating time of at least 6 h was taken into account, so that the air conditions in the welding area could settle before the measurements commenced.

Data Acquisition and Analysis
In order to ensure that the regulatory thresholds were well maintained, certified gravimetric measurements from the work safety department accompanied the experiment. These gravimetric measurements were executed according to PN EN 689:2018, the Polish version of EN 689 [24]. Tables 2 and 3 show the results of the gravimetric measurements in summer (6 to 8 August 2018) and winter (26 to 30 November 2018), respectively, measured over a time period of 8 h. The gravimetric measurement devices were hung directly on the workers, as shown in Figure 3. The expositions stayed under the regulatory limits (PM2.5 = 1.25 mg/m³ and PM10 = 10 mg/m³) during the experiments, except for PM2.5 on Tuesday, 7 August 2018. This outlier, however, was not backed by the PM10 result on the same day, which stayed far below the limit of 10 mg/m³. The measurements with the same air exchange rate settings were repeated in winter and are shown in Table 3. The measurement results depend more on the workers' behavior, movements and routines than on the reduction in air exchange rate shown in Table 1, which might explain the spike and the nonlinear results.

Correlation Study between Fine Dust and Other Parameters
In order to be able to design a system which can predict the values of PM10 and PM2.5 from a combination of other sensors, reference data sets were measured and collected. Figure 3 shows a representative schematic of the measurement setup in the hand welding area on-site. The places chosen for the sensor installations represent the spots where workers operate manually with a hand welder during production, which means that the resulting air velocity, temperature, relative humidity and CO₂ depend greatly on the workers' actions and activities. The fine dust fractions PM2.5 and PM10 were measured using optical sensors as well, providing a data set for validating the designed algorithm. The sensor used was the fine dust sensor FDS 15 from Dr. Födisch Umweltmesstechnik AG, which has an accuracy of ±5 µg/m³ and a measuring range from 2 µg/m³ to 3000 µg/m³ [10]. Table 4 shows the measurement data in winter on an 8-h average; the PM2.5 and PM10 data were measured using the optical sensor FDS 15. With the proper use of the data from these sensors combined with an algorithm, the amount of PM10 and PM2.5 in the air can be predicted to a certain extent. Which variables are to be selected as inputs can be decided based on a correlation analysis. The combination of inputs that gives the lowest prediction error can be chosen as the input variables for the model. This selection is known as feature selection [25].
From the correlation analysis, the dependency of PM10 and PM2.5 on the other available variables is noticeable. Figures 4 and 5 show the calculated coefficients for each variable using the Pearson and Spearman's rho correlations. The correlation between the input variables and the target variable provides the basis for the feature selection. A strong correlation means that changes in the chosen features (variables) cause larger changes in the target variable. The Pearson coefficient indicates how linearly two variables are correlated. The Spearman's rho coefficient, on the other hand, captures the nonlinear monotonic relation between two variables. The formula for the Spearman rank correlation coefficient is [26]:

ρ = 1 − (6 Σ dᵢ²) / (n(n² − 1)),

where dᵢ is the difference between the two ranks of each observation and n is the number of observations. Table 5 shows exemplary dᵢ as the difference between the ranks of each observation of PM2.5 and relative humidity (RH), where n is the total number of observations, i.e., the total number of samples, which in this study is approximately 350,000.
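The rank-based computation above can be sketched in a few lines. This is a minimal illustration of the formula that, unlike library implementations such as scipy.stats.spearmanr, does not handle tied values:

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman rank correlation: rho = 1 - 6*sum(d_i^2) / (n*(n^2 - 1)).

    d_i is the difference between the ranks of observation i in x and y.
    This minimal version assumes there are no tied values.
    """
    x, y = np.asarray(x), np.asarray(y)
    # argsort of argsort yields the (0-based) rank of each observation
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    d = rank_x - rank_y
    n = len(x)
    return 1.0 - 6.0 * np.sum(d.astype(float) ** 2) / (n * (n ** 2 - 1))
```

A perfectly decreasing monotonic relation, such as PM10 against air velocity in the ideal case, yields rho = -1, while a perfectly increasing one yields +1.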

Data Model
Spearman's rho coefficient is used to show the monotonic behavior of the variables towards each other. PM10 and PM2.5 show a strong increasing monotonic relation with each other and also have a sufficient monotonic relation with the current (CT), CO₂ and relative humidity (RH). The results also show that PM10 and PM2.5 have a decreasing monotonic behavior with air velocity (V). The combination of the Pearson and Spearman coefficients shows that the current (CT), CO₂, relative humidity (RH) and air velocity (V) are the most suitable input variables for the model. The aim is to exclude PM2.5 as an input and build a model that gives a better prediction for PM10.
In order to select the best-fit model, different machine learning models were designed, trained and tested using the available data. These include linear regression and feedforward neural networks for regression and for classification (Probabilistic Neural Network), where the whole interval of possible output values is separated into a large number of intervals; this is a particular case of classification. All models use supervised learning, which means the target output variable is already available for training. Different features, and accordingly variables, were considered for the models and their performances were compared. Current (CT), CO₂, relative humidity (RH) and air velocity (V) are the features chosen as inputs for the model. The splitting of the measured data is random; the ratio of the training and test data sets was set manually. Figure 6 shows the schematic of the Feedforward Neural Network (FFNN) model for regression, using current (CT), CO₂, relative humidity (RH) and air velocity (V) as inputs, feeding two hidden layers with 10 nodes each and PM10 as output.
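A model of this shape can be sketched with scikit-learn. The data below are synthetic stand-ins (the measured factory data are not reproduced here), the target function is hypothetical, and the hidden-layer sizes follow the 2 × 10 architecture described above:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Synthetic stand-ins for the four inputs: current (CT), CO2, RH, air velocity (V)
X = rng.random((1000, 4))
# Hypothetical target standing in for PM10; the real relation is learned from data
y = 2.0 * X[:, 0] + X[:, 1] - 0.5 * X[:, 3] + 0.1 * rng.standard_normal(1000)

# 40% training / 60% test split, matching the ratio used in the study
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.4, random_state=0)

# Two hidden layers with 10 nodes each, as in the Figure 6 architecture
model = MLPRegressor(hidden_layer_sizes=(10, 10), max_iter=2000, random_state=0)
model.fit(X_train, y_train)
r2 = model.score(X_test, y_test)  # coefficient of determination on the test set
```

The same training/testing pattern carries over to the other model types; only the estimator and its hyperparameters change.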

The schematic of the Feedforward Neural Network model for classification (Probabilistic Neural Network), using the same inputs, two pattern (hidden) layers and PM10 as output, is shown in Figure 7. The number of nodes in the pattern layers is equal to the number of training points in the dataset. In the first pattern layer, the training data set uses 40% of the data, while the second layer utilizes 28% of the data set. Last, Long Short-Term Memory (LSTM), a deep learning algorithm, is used in this study; the data from the aforementioned parameters are used as input variables, with one dense layer of 25 LSTM blocks and the particulate matter PM10 as the output variable. Figure 8 shows the schematic of the LSTM model for this study. The models were trained with different amounts of data, from 20% to 80%. The models did not overfit up to 40%, as the results continued to improve with the amount of training data, but showed little improvement above 40%. Since the computational time increases drastically above 40% training data for LSTM and FFNN, 40% training data was the ideal choice for comparing the models.
After building and training the model with the defined amount of data, the rest of the measured data was used to validate (test) the model. The models were evaluated with different statistical methods, such as the R² criterion for best fit, Mean Absolute Error (MAE), Mean Square Error (MSE) and Root Mean Square Error (RMSE). Visualization as well as statistical evaluation were carried out to compare the actual data and the predicted data. By comparing the different models using statistical as well as visual criteria, the best model for the application was chosen.
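The four indices can be computed directly from a measured and a predicted series; a minimal NumPy sketch using the standard definitions of MAE, MSE, RMSE and the coefficient of determination R²:

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Compute MAE, MSE, RMSE and R^2 between measured and predicted series."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))                        # mean absolute error
    mse = np.mean(err ** 2)                           # mean square error
    rmse = np.sqrt(mse)                               # root mean square error
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    r2 = 1.0 - ss_res / ss_tot                        # coefficient of determination
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}
```

A perfect prediction yields zero for all three error metrics and R² = 1; a model no better than predicting the mean yields R² = 0.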

Model Result of Linear Regression, Feedforward Neural Network and Probabilistic Neural Network
From the correlation study, it was sufficient to say that a linear relationship exists to some extent between the input variables and the desired output. Building a linear regression model using the variables chosen from the correlation study therefore seems a logical first step. After training the parameters using the training data, which consist of 40% of the data set, the model was tested on the rest of the data set. Figure 9 shows a direct comparison of the prediction from the linear regression model and the test data from the measured data set. An evident discrepancy can be seen from the comparison. Visually, Figure 10 shows a more satisfying result, which means the feedforward neural network model fits better than linear regression. However, both results are too inaccurate to be used. Both models use regression methods, in which the output layer consists of only one neuron giving a continuous output value.
Supervised learning classification methods also serve as an alternative; they are used to build the model based on a Probabilistic Neural Network. The comparison to the test data is shown in Figure 11, which displays a more satisfying result compared to Figures 9 and 10. However, an analysis relying merely on visual inspection would not be scientifically valid. For example, the results in Figure 12, which shows the LSTM model output against the actual measured data, cannot reveal the difference or benefit of the LSTM model compared to the PNN model in Figure 11. Therefore, the results also need to be compared using statistical methods.

Statistical Analysis of the Models
In order to perform statistical checks, the models were tested using different methods, such as the R² criterion for best fit, Mean Absolute Error (MAE), Mean Square Error (MSE) and Root Mean Square Error (RMSE); based on their results, the best-fit model can be chosen. Table 6 shows that the LSTM model has the lowest error in all error checks and the best squared correlation. In contrast to the linear regression model, which shows a poor fit on the R² correlation and the lowest performance in the error tests, the LSTM model shows good results, which matches the visual comparison of Figures 4 and 5, and thus supports the conclusion that LSTM is the best-fit model for this study. In order to obtain a better result, one could use a cumulative training method. A cumulative learning method uses an aggregation of data as it grows with time; consequently, it uses knowledge acquired in prior training to improve learning performance in subsequent training. By contrast, a static training method uses discrete data, which means that for each new training period the algorithm is reset and fed with new data. Fixed training data are applied to the machine learning algorithm, and it does not use any knowledge from prior training. Therefore, the cumulative method reuses learned knowledge to constrain new training, whereas the static method depends entirely upon new training data as external inputs [27]. Thor [28] gives a further description and definition of cumulative learning in the context of machine learning. Figure 13 shows the comparison of the static and cumulative training methods and the statistical result of this study according to the R² criterion. In static training, the algorithm was trained for five days each month with new data; cumulative training, on the other hand, uses the aggregated data of the current and previous months.
The cumulative training method showed a gradually increasing R² score, reaching 0.81, while the static method did not show consistent behavior and stayed under 0.78. This shows that, for training the LSTM model, the cumulative training method is preferable.
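The difference between the two schemes can be sketched as follows. Linear regression stands in for the LSTM here, and the monthly batches are synthetic, so only the training pattern mirrors the study: the static scheme fits a fresh model on each month alone, while the cumulative scheme fits on all data seen so far:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

def make_month(n=200):
    # Hypothetical monthly batch: 4 inputs, noisy linear target as a stand-in
    X = rng.random((n, 4))
    y = X @ np.array([2.0, 1.0, 0.0, -0.5]) + 0.3 * rng.standard_normal(n)
    return X, y

months = [make_month() for _ in range(4)]
X_test, y_test = make_month(500)

static_scores, cumulative_scores = [], []
seen_X, seen_y = [], []
for X, y in months:
    # Static: a fresh model is trained only on the current month's data
    static_scores.append(LinearRegression().fit(X, y).score(X_test, y_test))
    # Cumulative: the model is trained on all data gathered so far
    seen_X.append(X)
    seen_y.append(y)
    cumulative_scores.append(
        LinearRegression()
        .fit(np.vstack(seen_X), np.hstack(seen_y))
        .score(X_test, y_test))
```

With a growing training set, the cumulative scores tend to stabilize at a higher level, which matches the behavior observed for the LSTM in Figure 13.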


Conclusions and Future Scope
This paper presented a proof of concept of an emission-based control for the Heating, Ventilation and Air-Conditioning (HVAC) system of a welding area. The proof of concept was modeled using four different machine learning algorithms and their performances were compared. The algorithms implemented were linear regression, a feedforward neural network for regression, a probabilistic neural network (Bayesian neural network) for classification and Long Short-Term Memory (a deep learning algorithm). Long Short-Term Memory showed the best result and the greatest potential for the control system.
A complete cyber-physical HVAC system in a body shop use case for the substitution of cost-intensive fine dust sensors was presented. The CPPS was applied and validated in a real-world environment in one of the welding areas of Volkswagen's production hall in Wrzesnia, Poland. A setup was configured and established based on the regulatory thresholds to collect experimental data during production hours, in order to have a solid validation ground. The data were collected in two phases (summer and winter), each extending over five production days. The data set concerned consists of air temperature; relative humidity; air velocity; electric current used for welding; carbon dioxide; inhalable coarse particles PM10, which are dust particles with a diameter of 10 micrometers (10 µm) or less; and fine particles PM2.5, which have a diameter of 2.5 µm or less.
These were measured in the welding area during manual welding corrections, welding of floor elements (galvanized steel) with the welding torch (MAG method), and sanding and cleaning of welding seams. As a double check, 8-hour gravimetric measurements commissioned by the Volkswagen work safety department were taken, in order to ensure that the reductions of the air exchange rate kept the expositions under the regulated limits. Gravimetric measurement (or analysis) describes a method to quantify a substance or chemical constituent in a mixture based on its mass; in our case, the relevant substance is fine dust and the mixture is air. The gravimetric measurements were executed according to PN EN 689:2018. They showed that the reduction of the air exchange rate does not correlate linearly with the change in the amount of fine dust in the working area. The correlation study according to Spearman's rho and Pearson showed that only four variables have a direct correlation with the outputs (PM10 and PM2.5): relative humidity, CO₂ concentration, electric current and air velocity in the area. Because PM10 and PM2.5 show a strong increasing correlation with each other, PM2.5 could be excluded as an output variable, which saves a considerable amount of computing time and cost. As supervised machine learning and deep learning algorithms are used in this study, the data from the aforementioned parameters are used as input variables and the particulate matter as the output variable. The data were then split for training and testing of the models. The models were then tested statistically using different methods, such as the R² criterion for best fit, Mean Absolute Error (MAE), Mean Square Error (MSE) and Root Mean Square Error (RMSE). LSTM dominated the tests with an R² of 0.821, an MAE of 0.01020, an MSE of 0.00122 and an RMSE of 0.0257.
As a comparison, the linear regression model showed a poor result with an R² of 0.008, an MAE of 0.03563, an MSE of 0.00377 and an RMSE of 0.05689. The results were also examined graphically. The LSTM (Long Short-Term Memory) model showed the best result and is therefore best suited for the HVAC control concept.
In the future, it is possible to enhance the HVAC control using the LSTM model. Several microcontrollers could be installed in the factory hall to transmit data that are inexpensive to measure, such as relative humidity, CO₂, electric current and air velocity. These data can then be used to train the LSTM model continuously. The models can send the computed fine dust values in real time to the HVAC system, where they would be used as a control parameter alongside typical control parameters such as desired temperature, relative humidity and CO₂. Especially on a factory scale, the cost savings would be significant and are very interesting for factory operators and planners.