Using the IBM SPSS SW Tool with Wavelet Transformation for CO2 Prediction within IoT in Smart Home Care

Standard solutions for handling a large amount of measured data obtained from intelligent buildings are currently available as software tools in IoT platforms. These solutions optimize the operational and technical functions managing the quality of the indoor environment and factor in the real needs of residents. The paper examines the possibilities of increasing the accuracy of CO2 predictions in Smart Home Care (SHC) using the IBM SPSS software tools in the IoT to determine the occupancy times of a monitored SHC room. The processed data were compared at daily, weekly and monthly intervals for the spring and autumn periods. The Radial Basis Function (RBF) method was applied to predict CO2 levels from the measured indoor and outdoor temperatures and relative humidity. The most accurately predicted results were obtained from data processed at a daily interval. To increase the accuracy of CO2 predictions, a wavelet transform was applied to remove additive noise from the predicted signal. The prediction accuracy achieved in the selected experiments was greater than 95%.


Introduction
An intelligent building is one that is responsive to the requirements of occupants, organizations, and society. An intelligent building requires real-time information about its occupants so that it can continually adapt and respond [1]. Intelligent buildings respond to the needs of occupants and society, promoting the well-being of those living and working in them [2]. Researchers point out that the Adaptive House (the concept of a home which programs itself), Learning Homes and Attentive Homes must be programmed for a particular family and home and updated in line with changes in their lifestyle. The system monitors actions taken by the residents and looks for patterns in the environment which reliably predict these actions; a neural network learns these patterns, and the system then performs the learned actions automatically to improve the Quality of Life (QoL) [3]. Privacy, reliability and false alarms are the main challenges to be considered in the development of efficient systems to detect and classify the Activities of Daily Living (ADL) and falls [4]. In order to provide a user-friendly environment for the management of operational and technical functions, along with support for the independent housing of senior citizens and disabled persons in buildings designated as Smart Home Care (SHC), it is necessary to provide an appropriate visualization of the technological process as required by the users, with the possibility of indirect monitoring through a software tool that allows several types of bus systems, standards, and communication and data protocols to be consolidated into a single monitoring application. In our case, we took advantage of the opportunities offered by the PI System software tool (hereinafter referred to as PI) produced by OSIsoft (Figure 1) [50]. The PI System includes SW tools such as PI ProcessBook for user-friendly data readout, with the ability to create an application for the visualization and monitoring of SHC resident activities (Figures 2 and 3) [51][52][53].

Visualization for the SHC Created in the PI ProcessBook Tool
Visualization of the wooden house in the PI ProcessBook tool is divided into several screens, which can be accessed continuously from the main screen (Figure 2). The SHC control technology is integrated on each visualization screen in accordance with the Building Management System (BMS). These screens further comprise buttons for entry into individual rooms. After clicking on the relevant room button, a new window appears, containing a detailed description of the technology used in the specific room, shown in individual charts. Each technological element is illustrated in a chart, and the individual charts are sorted according to the groups of elements used (Figure 3). The individual technology units are then displayed on separate screens and can also be viewed from the main screen. This solution was chosen because it is not possible to place all the information about the implemented technologies on one screen while keeping the screen well-arranged. Distributing the technologies across individual screens gives the user better insight into which elements belong to each technology and which do not. Thanks to this solution, orientation in the enclosed charts is easier, as is the analysis of the individual quantities and actions in the building.
The visualization, monitoring, and processing of the measured values of non-electric variables, such as temperature, humidity, and CO2, for monitoring the quality of the indoor environment of the selected room in the described building, are implemented using the PI System software application and the IBM SPSS SW tool (Figure 1).

Implementation of Predictive Analysis Using the IBM SPSS Modeler
The IBM SPSS Modeler allows users to build models using a simplified, easy-to-use, object-oriented user interface. The user is provided with modeling algorithms for prediction, classification, segmentation, and association detection. The model results can easily be deployed and read into databases, IBM SPSS Statistics, and a wide variety of other applications. Working with IBM SPSS Modeler can be divided into three basic steps:


1. Importing the data into IBM SPSS Modeler
2. Performing a series of analyses on the imported data
3. Evaluating and exporting the data

This sequence is also known as a data stream, because the data flow from the source to each analysis node and then to the output. The IBM SPSS Modeler allows users to work with multiple data streams at once. These data streams can be built and modified using the stream canvas area of the application and are created by drawing diagrams of the relevant data operations. The Node Palette area of IBM SPSS Modeler displays most of the available data and modeling tools. The user may simply drag and drop items from the node palette to add them to the current stream. The node palette items are divided into a few main categories [54].

Neural networks are one of the many ways to perform predictive analysis, and IBM SPSS Modeler offers multiple types of neural networks for this purpose. The text further describes the procedure for determining an appropriate method of predicting the course of the CO2 concentration from the values measured by the indoor temperature sensor Ti (°C) (QPA 2062; range 0 to 50 °C / −35 to 35 °C, accuracy ±1 K), the relative humidity sensor rH (%) (QPA 2062; range 0 to 100%, accuracy ±5%) and the outdoor temperature sensor To (°C) (AP 257/22; range −30 to +80 °C, resolution 0.1 °C) using the RBF. The RBF was selected due to its higher speed of training [55]. The RBF network is a feed-forward network that requires supervised learning. Unlike the multilayer perceptron (MLP), this network contains only one hidden layer. Overall, there are three layers in the RBF network: the input layer, the RBF layer, and the output layer. The IBM SPSS algorithm guide describes the mathematical models of these layers [56] (Figure 4) as follows:

Input layer: $J_0 = P$ units, $a_{0:1}, \dots, a_{0:J_0}$, with $a_{0:j} = x_j$;
RBF layer: $J_1$ units, $a_{1:1}, \dots, a_{1:J_1}$, with $a_{1:j} = \phi_j(\mathbf{x})$;
Output layer: $J_2 = R$ units, $a_{I:1}, \dots, a_{I:J_2}$, with $a_{I:r} = w_{I:r:0} + \sum_{j=1}^{J_1} w_{I:r:j}\, a_{1:j}$.

The training of the RBF can be divided into two stages. The first stage determines the basis functions by clustering methods, and the second stage determines the weights given to the basis functions. SPSS measures the accuracy of neural networks by calculating the percentage of records for which the predicted value matches the observed value. For continuous values, the accuracy is calculated as 1 minus the average of the absolute differences between the predicted and observed values, divided by the difference between the maximum and minimum predicted values [56]:

$$\text{accuracy} = 1 - \frac{\frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right|}{\max_i \hat{y}_i - \min_i \hat{y}_i}$$
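The two-stage RBF training described above can be sketched outside of SPSS as well. The following is a minimal illustration, not SPSS's actual implementation: the centers are assumed to be given (stage one would normally obtain them by clustering), stage two solves the linear output weights by least squares, and the accuracy is computed with the SPSS-style formula quoted above. All function names are our own.

```python
import numpy as np

def rbf_design(X, centers, width):
    """Hidden-layer activations: Gaussian basis functions phi_j(x)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

def fit_rbf_weights(X, y, centers, width):
    """Stage two: solve the output-layer weights (bias + weights) by least squares."""
    phi = rbf_design(X, centers, width)
    A = np.hstack([np.ones((len(X), 1)), phi])  # column of ones = bias term
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef                                  # [w_0, w_1, ..., w_J1]

def rbf_predict(X, centers, width, coef):
    """Output layer: y_r = w_0 + sum_j w_j * phi_j(x)."""
    return coef[0] + rbf_design(X, centers, width) @ coef[1:]

def spss_style_accuracy(pred, obs):
    """1 - mean|pred - obs| / (max(pred) - min(pred)), as in the SPSS guide."""
    return 1.0 - np.mean(np.abs(pred - obs)) / (pred.max() - pred.min())
```

A synthetic three-input example (stand-ins for Ti, rH and To) trained this way typically reaches an SPSS-style accuracy well above 0.9 on smooth targets, which mirrors why the RBF's fast two-stage training was attractive here.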


Materials and Methods
In this section, we introduce a method for the CO 2 concentration prediction optimization based on the Wavelet transformation additive noise canceling. Based on the experimental results, the predicted CO 2 trend contains glitches representing the fast change part of the signal. Such signal segments may significantly deteriorate the quality of the prediction. We propose an optimized scheme of the neural network prediction based on the Wavelet filtration appearing as a robust method due to a wide variability of the filtration settings. Such a system significantly improves the prediction system based on the neural network.
In signal processing, we assume that each signal y(t) is composed of two essential parts: the signal trend T(t) and a component with a stochastic character, X(t), which is perceived as the signal noise and details. Based on this definition, we can use the following signal formulation (3):

$$y(t) = T(t) + X(t) \qquad (3)$$

The major problem when the signal trend is being extracted is noise detection. There are many applications of trend detection, including CO2 measurement. Such a signal may be influenced by glitches, which should be removed to obtain a smooth signal for further processing. The wavelet analysis represents a transformation of the signal y(t) into two types of coefficients, namely the wavelet and scaling coefficients. These coefficients are completely equivalent to the original CO2 signal. It is supposed that the wavelet coefficients are related to changes along a specifically defined scale. The main idea of signal trend detection is to associate the scaling coefficients with the signal trend T(t). On the other hand, the wavelet coefficients are supposed to be associated with the signal noise, which is mainly represented by the glitches when processing the CO2 signal. In our analysis, we considered uncorrelated noise, adapting the wavelet estimator to work as a kernel estimator. The advantage of such an approach is that the estimator can be formulated on irregularly sampled data. In this method, we used the scaling coefficients as estimators of the signal trend. Supposing that the sampled CO2 observations are represented by Y(t_n), the CO2 trend estimator is given by Equation (4):

$$\hat{T}(t) = \sum_{n} Y(t_n) \int_{A_n} E_J(t, s)\, ds \qquad (4)$$

The integration is performed over a set of intervals A_n whose union forms a partition of the interval covering all the observations t_n, where t_n ∈ A_n. Consequently, E_J is defined as Equation (5):

$$E_J(t, s) = 2^J \sum_{k} \theta\!\left(2^J t - k\right) \theta\!\left(2^J s - k\right) \qquad (5)$$

In this expression, θ(t) represents the scaling function. This function is defined by the two-scale relation (Equation (6)):

$$\theta(t) = \sqrt{2} \sum_{k} h_k\, \theta(2t - k) \qquad (6)$$

The wavelet function is defined by Equation (7):

$$\psi(t) = \sqrt{2} \sum_{k} g_k\, \theta(2t - k) \qquad (7)$$

where h_k and g_k are the low-pass and high-pass filter coefficients, respectively. The first crucial task is an appropriate selection of the mother wavelet for the predicted CO2 signal filtration. Since the Daubechies wavelets reflect the morphological structure of the signal well, this family was used for our model. In particular, we used the Daubechies wavelet Db6, with the D6 scaling function utilizing the orthogonal Daubechies coefficients.
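In practice, the scaling-coefficient trend estimate can be approximated with a discrete wavelet transform: decompose the signal, zero the wavelet (detail) coefficients, and reconstruct from the scaling (approximation) coefficients alone. A minimal sketch using the PyWavelets library (an assumption on our part; the paper does not name its implementation tool), with the Db6 wavelet as in the text:

```python
import numpy as np
import pywt  # PyWavelets

def wavelet_trend(y, wavelet="db6", level=6):
    """Estimate the trend T(t): keep only the scaling (approximation)
    coefficients and reconstruct, discarding the detail coefficients
    that carry the noise and glitches."""
    coeffs = pywt.wavedec(y, wavelet, level=level)
    coeffs = [coeffs[0]] + [np.zeros_like(d) for d in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(y)]
```

On a slowly varying signal corrupted by additive noise, the reconstruction from the approximation band alone tracks the underlying trend far more closely than the raw observations do.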

Validation Ratings Used
In order to carry out an objective comparison, the following parameters were considered.

The Mean Absolute Error (MAE) is an estimator measuring the difference between two continuous variables:

$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |x_i - y_i|$$

The Mean Squared Error (MSE) is an estimator measuring the average of the squared errors between two signals. The MSE represents a risk function corresponding to the expected value of the squared (quadratic) error loss:

$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (x_i - y_i)^2$$

The Euclidean distance (ED) represents the ordinary straight-line distance between two points in Euclidean space. Based on this distance, the Euclidean space becomes a metric space. The lower the Euclidean distance, the more similar the two signal samples are. In our analysis, we considered the mean of the ED:

$$\text{ED} = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

The City Block distance (CB) represents the distance between two signals x_1, x_2 in a space with a Cartesian coordinate system. This parameter can be interpreted as the sum of the lengths of the projections of the line segments between the points onto the coordinate axes:

$$\text{CB} = \sum_{i=1}^{n} |x_{1,i} - x_{2,i}|$$

The correlation coefficient (R) measures the level of linear dependency between two signals: the more linearly dependent the signals are, the higher the correlation coefficient. In comparison with the previous parameters, the correlation coefficient is a normalized parameter. Zero correlation stands for total dissimilarity between two signals in the sense of their linear dependency, whereas 1 and −1 stand for full positive and full negative correlation, respectively.
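For reference, the five validation parameters above can be computed directly; a small sketch (our own helper, with NumPy's sample correlation standing in for R):

```python
import numpy as np

def validation_metrics(x, y):
    """MAE, MSE, Euclidean distance, City Block distance and correlation R
    between a signal x and a reference y."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return {
        "MAE": float(np.mean(np.abs(x - y))),
        "MSE": float(np.mean((x - y) ** 2)),
        "ED":  float(np.sqrt(np.sum((x - y) ** 2))),
        "CB":  float(np.sum(np.abs(x - y))),
        "R":   float(np.corrcoef(x, y)[0, 1]),
    }
```

Identical signals yield zero for all four distance-based metrics and R = 1, which is the reference point against which the predicted signals are judged.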
As we have already stated above, in our work we analyze two months of CO2 predictions. In each measurement, we have a prediction from the neural network with 10, 50, 100, 150, 200, 250, 300, 350 and 400 neurons; thus, we analyzed a total of 9 predicted signals for each measurement. These signals are compared against the reference using the evaluation parameters stated above. In terms of the Euclidean distance and MSE, lower values indicate a higher agreement between the signal and the reference and, thus, a better result. Conversely, a higher correlation coefficient indicates a better result. In the following part of the analysis, we report the results of the quantitative comparison. All the testing was done with the wavelet Db6 and 6-level decomposition, with the following wavelet settings: Stein's Unbiased Risk Estimate as the threshold selection rule, with soft thresholding of the detail coefficients.
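The filtration settings above (Db6, 6-level decomposition, SURE threshold selection, soft thresholding of the detail coefficients) can be sketched as follows. This is our own re-implementation for illustration, assuming the PyWavelets library and the standard rigorous-SURE threshold rule; it is not the authors' exact code.

```python
import numpy as np
import pywt  # PyWavelets

def sure_threshold(c):
    """Threshold minimizing Stein's Unbiased Risk Estimate for
    coefficients c with (approximately) unit noise variance."""
    n = len(c)
    s = np.sort(c ** 2)
    k = np.arange(1, n + 1)
    risk = (n - 2 * k + np.cumsum(s) + (n - k) * s) / n
    return np.sqrt(s[np.argmin(risk)])

def wavelet_denoise(y, wavelet="db6", level=6):
    """Decompose, soft-threshold every detail band with a SURE-selected
    threshold, reconstruct. The noise scale is estimated robustly from
    the finest detail band."""
    coeffs = pywt.wavedec(y, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745  # robust noise estimate
    out = [coeffs[0]]
    for d in coeffs[1:]:
        t = sigma * sure_threshold(d / sigma)
        out.append(pywt.threshold(d, t, mode="soft"))
    return pywt.waverec(out, wavelet)[: len(y)]
```

Applied to the predicted CO2 signals, this removes the glitch-like additive noise while preserving the slow concentration trend, which is what the quantitative comparison below evaluates.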

First Part of the ADL Information in SHC from the CO2 Concentration Course, Blinds, Slats and On/Off Control of Lights
The first experimental part of the study addressed the real needs of seniors who live in their own flats despite advanced age and mental and physical disabilities. These people strive to maintain maximum self-sufficiency and thus remove as much burden from their relatives, neighbors, friends or surroundings as possible. An example is a married couple, one of whom is mentally impaired, the other being the caregiver. They stay in touch with their family (their children) by SMS to keep them informed about how they are. In situations of acute need, the children are ready to come and help. In this example, the indirect ADL (Activities of Daily Living) in Room R203 (Figure 2) in the SHC can be detected by monitoring operational and technical functions such as the CO2 concentration course and the control of the blinds, slats and on/off switching of the lights.

Discussion of the First Experimental Part
In the aforementioned results of Experimental Part 1 (Figures 5–8), the presence of persons in the monitored SHC area is clearly time-localized according to the ADL. ADL information in the SHC can thus be reliably forwarded to close relatives. The information obtained at daily, weekly and monthly intervals can also be used alongside SHC automation technologies in a so-called "smart building". This type of building records house activities and uses the accumulated data to automatically control technologies according to the predictable needs of users, such as controlling lights, blinds, heating, forced ventilation and cooling based on the usual patterns of use, which facilitates cost savings when programming and configuring the intelligent house control system [2].


Second Experimental Part: ADL Monitoring Information from the Predicted CO2 Concentration Course within the IoT SPSS SW Tool
As described earlier, the main goal is to predict the CO2 concentration based on data collected by the indoor humidity, indoor temperature and outdoor temperature sensors. The sample data were collected from the SHC. The procedure of this implementation can be divided into the following steps:

1. Pre-processing the data
2. Developing a data stream using IBM SPSS Modeler
3. Testing various training data from different times of the year
4. Analyzing the results and selecting the best model
5. Uploading the selected model to Watson Studio for IoT implementation

Pre-Processing
Data normalization using the min-max method (often known as feature scaling) was used as a pre-processing method. This method scales the parameters into the range between 0 and 1. Since the experimental data were stored in data files, the pre-processing stage was performed by calculating the minimum and maximum values of each parameter. The normalized values were then calculated using Equation (12):

$$x_{\text{norm}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \qquad (12)$$
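Equation (12) is the standard min-max scaling; a brief sketch, applied per parameter (indoor temperature, humidity, outdoor temperature, CO2):

```python
import numpy as np

def min_max_normalize(x):
    """Equation (12): x_norm = (x - min(x)) / (max(x) - min(x)),
    mapping the parameter into [0, 1]."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())
```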
Developing a Data Stream Using IBM SPSS Modeler

Figure 9 shows the data stream developed in IBM SPSS Modeler. In the first stage, the data were fed to IBM SPSS from Excel files by adding an Excel node from the source category of the node palette. In the next stage, a data type selection operator was added from the field operator category. This node carried out the task of setting the default target to the CO2 concentration and the default inputs (in this case used as predictors) to the humidity, indoor temperature and outdoor temperature values.

There are a few common validation methods used in IBM SPSS Modeler, such as K-fold, V-fold, N-fold and partitioning. The IBM SPSS Modeler user manual recommends using the partitioning method for large datasets due to its faster processing time. The partitioning method randomly divides the data set into three parts for training, testing and validation. The ratio of this division can be selected in the software as percentages of the data set. A partition node from the field operator category was added to the stream in order to divide the data into three parts: the first part (40% of the data) for training the neural network, using the target values (CO2) and predictors (humidity, indoor temperature and outdoor temperature), and the second part (30% of the data) for testing the neural network, using the target and predictor values. The testing partition is used for selecting the most suitable model and preventing overfitting. The last partition (30% of the data) was dedicated to validating the developed model using only the predictors and comparing the prediction with the reference signal. In other words, the validation partition is used to determine how well the model truly performs [19].

In the next step, an automatic data preparation field operator was used in order to transform the data for better predictive accuracy. The transformed data were fed to a neural network for training a model. The neural network uses the RBF model with various numbers of neurons across multiple runs (Figure 10). The resulting model from the trained neural network is represented by a nugget gem. Various nodes were connected to this nugget gem for additional analysis of the model, such as an Excel node for exporting the reference data and predicted values to an Excel file (a filter node was used to select which values to store in the output Excel file), a Time Plot node to display the time plot of the reference versus predicted values, a Multiplot node for displaying the plots of the partitioned data, and an Analysis node for displaying details such as the linear correlation, mean absolute error, etc.
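The 40/30/30 random partition described above can be reproduced outside of SPSS Modeler; a minimal sketch (our own helper, not the SPSS partition node itself):

```python
import numpy as np

def partition_indices(n_samples, ratios=(0.4, 0.3, 0.3), seed=0):
    """Randomly split sample indices into training / testing / validation
    partitions according to the given percentage ratios."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    n_train = int(round(ratios[0] * n_samples))
    n_test = int(round(ratios[1] * n_samples))
    train = idx[:n_train]
    test = idx[n_train:n_train + n_test]
    validation = idx[n_train + n_test:]  # remainder of the data set
    return train, test, validation
```

The three index sets are disjoint and together cover the whole data set, so each sample contributes to exactly one of training, testing or validation.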

Testing Various Training Data from Different Times of the Year
For this stage of the experiment, seven different data sets from the spring and fall of 2018 were selected. The data collection was performed at a rate of one sample per minute. The first selected data interval was the whole month of May (Table 1). The validation results indicate that model number 7, with 76.6% accuracy and a relatively small error (MAE = 0.006, MSE = 9.75 × 10−5), represents the best prediction. By repeating the experiment with data from November 2018 (Table 2), a slight improvement in the accuracy of all models can be observed. Additionally, by observing Table 3, it is apparent that model number 9 holds the best results in terms of accuracy (80.6%), linear correlation (0.898), MAE (0.011) and MSE (1.553 × 10−3).

The training process was repeated by replacing the intervals with a week in May 2018 (Table 3) and a week in November 2018 (Table 4). With a reduction in the data size, the chances of overfitting were reduced, resulting in better generalization and higher accuracy (Figure 11). These improvements can be observed in Tables 3 and 4. For the last few experiments, the size of the data sets was reduced to one day. The 15th and 12th of May (Tables 5 and 6) and the 15th of November (Table 7, Figure 12) were selected for this experiment.
Once again, a significant improvement in the accuracies due to the reduction in the interval lengths can be observed. The result from the 12th of May (Table 5) implies that model number 5 shows the maximum accuracy (98.1%). By considering the linear correlation and error values, it can be concluded that model number 3 shows better overall characteristics (Figure 13). In the case of the 15th of May, model number 3 provides the most accurate result with an impressive 99.9% accuracy, closely followed by model number 9 with 99.8% accuracy. In the next step, the experiment was repeated with data from November 15. Similar to a few of the previous cases, model number 9 shows the highest accuracy (99.7%), a relatively high linear correlation value (0.895) and the lowest errors (MAE: 0.003, MSE: 6.45 × 10−5). Figure 11 clearly demonstrates the relationship between the accuracy, the experiment interval length and the number of neurons. By summing up all of the obtained results, it is clear that model number 3 (Table 6) trained with the data from the 15th of May holds the highest accuracy (99.8%), linear correlation (0.996) and relatively small error values. Model number 9 shows the best overall average accuracy (Table 9) and, in the case of the experiments with the intervals of the 12th (see Figure 12) and 15th of May, the accuracy difference between this model and the most accurate model (model number 3) is negligible. Therefore, model number 9 trained with the May 15th interval was selected as the most suitable model for the next stages (Figure 12). Table 8 shows the averages of the accuracy, linear correlation, MAE and MSE in each experiment. By observing this table, it is apparent that the experiments with an interval length of one month hold the lowest average accuracies (63.1% and 72%).
It can also be observed that the experiment with the 15th of May as the period holds the highest average accuracy (99.5%), the highest linear correlation (0.995) and relatively low error values (MAE: 1.78 × 10−3, MSE: 2.44 × 10−5). Additionally, the experiment with an interval length of a week in May shows a slightly smaller average accuracy (94.7%) and linear correlation (0.967), but it has overall lower error values (Figure 14).

IoT Implementation with Watson Studio SW Tool
The data streams created in IBM SPSS Modeler can be stored as a file (".str" format). IBM Watson Studio allows the user to import a developed data stream simply by uploading the stored file. This allows data streams originally developed in IBM SPSS Modeler to take advantage of cloud computing, cloud storage and near-real-time streaming. As explained earlier, model number 9 (with 400 neurons) trained with the data from the 15th of May was selected as the best overall result of this experiment; specifically, this model showed high accuracy, high linear correlation, and low MAE and MSE errors. Therefore, it was uploaded to Watson Studio. Figure 15 shows the stream developed in SPSS in Watson Studio for near-real-time training (Excel files were replaced with assets on the cloud). Figure 16 shows a data flow stream in Watson Studio that includes model 9 trained with data from the 15th of May for near-real-time prediction.


Discussion of the Second Experimental Part
Evaluating the results obtained from the implementation with IBM SPSS Modeler, it is apparent that, as expected, the experiments with an interval length of one day showed the best overall accuracy (average value up to 99.5%) and the experiments with sample periods of one month showed the lowest overall accuracy (average value up to 72.2%). Additionally, in four out of six experiments, model number 9 held the highest accuracy, and in the other two cases its difference from the most accurate models was insignificant. Therefore, it was selected as the overall most accurate model. As mentioned earlier, in terms of the training interval, the 15th of May showed the most accurate results. Therefore, model number 9 with the 15th of May training interval (Table 9) was selected and exported to an IBM Cloud data stream. Furthermore, the results demanded additional filtering in order to reduce the noise and provide smoother results.

Testing and Quantitative Comparison
In our research, we analyzed signals representing CO2 concentrations. We had a set of estimated (predicted) signals compared against the real measured CO2 signal, which is perceived as a reference. In our analysis, we compare two months of CO2 predictions: one-day, one-week and one-month predictions for May and November 2018.
Based on the observations, it is apparent that the predicted CO2 signals are not smooth. They are frequently influenced by rapid oscillations, so-called glitches, and signal fluctuations (Figures 17 and 18). Such variations represent signal noise that impairs the real trend of the CO2 prediction and should be reduced. In our analysis, we used wavelet filtration to eliminate such components and obtain the signal trend for further processing.
As stated above, we used the Db6 mother wavelet for CO2 signal trend detection. First, we take advantage of the fact that different decomposition levels allow perceiving more or fewer signal details, represented by the detail coefficients. Since we need to capture the signal trend by eliminating steep fluctuations, an appropriate decomposition level must be chosen. An experimental comparison of individual wavelet settings is reported in Figures 17 and 18. Based on the experimental results, we used the 6-level decomposition for CO2 signal trend detection. The filtration procedure further uses the following settings: Stein's Unbiased Risk threshold selection rule and soft thresholding of the detail coefficients.
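The smoothing step described above can be sketched in Python with the PyWavelets package. This is a minimal illustration under one stated substitution: PyWavelets does not ship Stein's Unbiased Risk threshold rule, so the sketch uses the universal (VisuShrink) threshold in its place; the Db6 wavelet, 6-level decomposition, soft thresholding of the detail coefficients, and the function name `wavelet_smooth` and toy data are our own choices, not taken from the paper's implementation.

```python
import numpy as np
import pywt

def wavelet_smooth(signal, wavelet="db6", level=6):
    """Extract the trend of a noisy 1-D signal by wavelet shrinkage:
    decompose, soft-threshold the detail coefficients, reconstruct."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    # Noise level estimated from the finest details (median absolute deviation).
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    # Universal (VisuShrink) threshold; a stand-in for the SURE rule.
    thresh = sigma * np.sqrt(2.0 * np.log(len(signal)))
    # Threshold only the detail coefficients; keep the approximation (trend).
    coeffs[1:] = [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(signal)]

# Toy example: a slow CO2-like trend (ppm scale) corrupted by glitch-like noise.
np.random.seed(0)
t = np.linspace(0.0, 1.0, 1024)
clean = 800.0 + 200.0 * np.sin(2.0 * np.pi * t)
noisy = clean + 30.0 * np.random.randn(t.size)
smooth = wavelet_smooth(noisy)
```

The approximation coefficients carry the trend; soft thresholding shrinks the details that encode the rapid glitches, which is why the reconstruction is smoother while the slow CO2 dynamics survive.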

Wavelet filtration was used for the extraction of the CO2 signal trend while rapid changes in the signal were simultaneously removed. Figures 19 and 20 compare the reference signal with the wavelet-filtered predicted signals for day and month predictions from May and November 2018. Based on the results, wavelet filtration is capable of filtering rapid signal changes whilst preserving the signal trend.
To illustrate this, we report selected situations showing glitches that deteriorate a smooth signal trend, and the respective wavelet approximation largely reducing such signal parts (Figures 21-23). In these cases, the most significant glitches are marked in green in the originally predicted signals to highlight the effectiveness of the wavelet smoothing.
As is evident, the CO2 prediction contains many significant glitches and spikes that deteriorate the smoothness of the analyzed signal. The wavelet appears to be a reliable alternative for reducing those parts of the signal. On the other hand, we are aware that trend detection in some cases reduces the peaks, and thus the original signal's amplitude is reduced. Such situations are reported in Figures 21-23. In the last part of our analysis, an objective comparison is carried out.
As already stated, we compare the predicted CO2 signals with signals filtered by the wavelet transformation. All signals are compared against the reference CO2 signals for day and week predictions.
As stated above, in our work we analyzed the two-month CO2 prediction. In each measurement, we have a prediction from a neural network with 10, 50, 100, 150, 200, 250, 300, 350 and 400 neurons; thus, we analyzed the predicted signals of 9 models for each measurement. These signals are compared against the reference using the evaluation parameters stated above. In terms of the Euclidean distance and MSE, lower values indicate a higher agreement between the signal and the reference and thus a better result; contrarily, a higher correlation coefficient indicates a better result. In the following part of the analysis, we report the results of the quantitative comparison. All testing is done with the Db6 wavelet, 6-level decomposition, Stein's Unbiased Risk threshold selection rule and soft thresholding of the detail coefficients. Figure 24 shows the MSE evaluation for the CO2 predictions.
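The three evaluation parameters just described can be computed directly with NumPy. The helper below (`compare_to_reference` is our own name for illustration, not part of the SPSS stream) mirrors the comparison: lower MSE and Euclidean distance mean a closer match to the reference, and a higher correlation coefficient means a better result.

```python
import numpy as np

def compare_to_reference(pred, ref):
    """Quantitative comparison of a predicted signal against the reference:
    Mean Square Error (MSE), Pearson correlation (Corr), Euclidean distance (ED)."""
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    return {
        "MSE": float(np.mean((pred - ref) ** 2)),
        "Corr": float(np.corrcoef(pred, ref)[0, 1]),
        "ED": float(np.linalg.norm(pred - ref)),
    }
```

Applying this to each of the 9 predicted signals and their wavelet-filtered versions yields exactly the per-model rows that the tables summarize.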
Lastly, we summarize the achieved results for all the predicted signals.
In Tables 10 and 11, we compare the individual parameters, Mean Square Error (MSE), correlation index (Corr) and Euclidean distance (ED), for individual CO2 predictions from May and November 2018. Each parameter is averaged over all the predictions and the difference (diff) between the original prediction and the wavelet smoothing is evaluated (Table 12). As is evident, the predicted CO2 signals contain many significant glitches and spikes that deteriorate the smoothness of the analyzed signals, and such steep fluctuations may have a significant impact on CO2 accuracy. The wavelet appears to be a reliable alternative for reducing those parts of the signal. On the other hand, we are aware that trend detection in some cases reduces the peaks, and thus the original signal's amplitude is reduced. In our work, we studied the Daubechies wavelet family. These wavelets, as is known, reflect the morphological structure of signals well; we particularly use the Db6 wavelet for trend detection.

Alternatively, we mention the comparison in Reference [50] of CO2 filtration based on the LMS algorithm, in which the authors employed adaptive filtration. The main limitations of this method are the necessity of a reference signal, the slow adaptation of the filtration procedure, its dependence on the accuracy of the step-size parameter µ calculation, and inaccurate determination of the arrival and departure times of a person in the monitored area. Furthermore, the wavelet filtration presented in this study achieves better results in the objective comparison against the LMS filtration. Wavelet filtration has a much stronger potential for CO2 filtration due to the possibility of applying a variety of wavelets, allowing the extraction of specific morphological signal features at various decomposition levels and thus better optimizing the CO2 prediction. These facts predetermine wavelets to be a robust system for CO2 prediction enhancement.
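To make the contrast concrete, the following is a minimal sketch of an LMS adaptive filter of the kind used in Reference [50]; this is our own illustrative implementation on toy data, not the authors' code. It shows the two limitations named above in action: the filter needs a desired (reference) signal at every sample, and its convergence hinges on the step-size parameter µ.

```python
import numpy as np

def lms_filter(x, d, mu=0.01, taps=8):
    """Least-mean-squares adaptive filter.

    x : input signal, d : desired (reference) signal, mu : step size.
    The weights w are nudged at every sample toward minimizing (d - w.x)^2,
    so a reference d must be available, and too large a mu diverges."""
    n = len(x)
    w = np.zeros(taps)
    y = np.zeros(n)
    for i in range(taps - 1, n):
        window = x[i - taps + 1 : i + 1][::-1]  # x[i], x[i-1], ..., most recent first
        y[i] = w @ window                        # filter output
        e = d[i] - y[i]                          # instantaneous error
        w += 2.0 * mu * e * window               # gradient-descent weight update
    return y

# Toy system identification: the filter gradually learns an "unknown" FIR response.
rng = np.random.default_rng(0)
x = rng.standard_normal(4000)
h = np.array([0.5, -0.3, 0.1])                   # unknown system to identify
d = np.convolve(x, h)[: len(x)]                  # reference the filter must track
y = lms_filter(x, d)
```

Because the weights adapt one sample at a time, the early output is poor and only slowly approaches the reference, which is the "slow adaptation" drawback noted above; the wavelet approach needs no such reference or adaptation phase.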
In the last part of our analysis, the objective comparison was carried out. As already stated, we compared the originally measured CO2 signals with predicted signals filtered by the wavelets. Considering the MSE, we obtain better results for the wavelet trend detection, meaning that the difference between the gold standard and the filtered signals was minimized. The correlation coefficient gives higher values for the unfiltered predicted CO2 signals, although the differences are only slight. The reason may be that trend detection largely omits higher peaks; therefore, the linear dependence for wavelet filtration is smaller when compared with the predicted signals. Using wavelet filtration leads to more accurate results than the raw predicted signals, and the filtered signals are much smoother, containing no steep fluctuations. On the other hand, we are aware of a certain loss of amplitude. Therefore, in the future, it would be worth investigating the frequency features of the CO2 signals to objectively determine the frequency modifications introduced by wavelet filtering.

Conclusions
The authors of the paper focused on designing a methodology that specifies a procedure for processing data measured by sensors in an SHC environment for the purpose of indirectly monitoring the presence of people in an SHC area through KNX and BACnet technologies commonly applied in building automation. This paper explores the possibilities of improving the accuracy of CO2 predictions in SHC using IBM SPSS software tools in the IoT to determine the occupancy times of a monitored SHC room. The RBF method was applied to predict CO2 levels from the measured indoor and outdoor temperatures and relative humidity. The accuracy of CO2 predictions from the processed data was compared and evaluated at daily, weekly and monthly intervals for the spring and autumn periods. As expected, the most accurate results were provided by the experiments with daily intervals (accuracy of about 99%), while the monthly intervals produced the least accurate results (accuracy of about 80%). Overall, the stream developed in IBM SPSS Modeler is capable of predicting CO2 concentration values from the values of humidity and indoor and outdoor temperature. By providing a live data asset to the IBM Cloud, the uploaded model can achieve near-real-time prediction of CO2 concentration values. Using a wavelet transform to cancel additive noise led to more accurate predicted signals. The signals were also much smoother and did not contain sharp fluctuations, although there was a certain loss of amplitude, resulting in inaccuracies when the maximum achieved CO2 value was determined. Future work should, therefore, focus on finding an optimal method for canceling additive noise in real time, which would help increase the overall accuracy of CO2 predictions [57][58][59][60]. Additionally, the real-life and live performance of this implementation should be examined.