Towards Characterization of Indoor Environment in Smart Buildings: Modelling PMV Index Using Neural Network with One Hidden Layer

Abstract: Modelling of comfort with the use of neural networks has become extremely popular in recent years, largely because of the satisfactory accuracy of these methods. The article proposes a method of modelling feedforward neural networks that makes it possible to obtain the most efficient network with one hidden layer in terms of a given quality criterion. The article also presents a methodology for modelling the PMV index, on the basis of which it can be demonstrated whether the network will work properly not only on paper but in reality as well. The objective of this work is to develop a performance model allowing the effective improvement of all electrical and mechanical devices affecting the energy efficiency and indoor environment in smart buildings. To achieve this, several attributes of the indoor environment are included, namely: air leakage as a connection to the outdoor environment, but also as an uncontrolled component of energy; ventilation as delivery and distribution of fresh air in the building space; individual ventilation on demand; indoor air quality (IAQ) in the dwelling or as a personal IAQ control; source control of pollutants in the building; thermal comfort; temperature, air movement and humidity control (humidity modifiers, i.e., buffers different from the air conditioning); and radiation from cold and hot surfaces. This brings forward a question about the strategy of process control. One may either develop a series of control models to be synthesized later, or use one over-arching characteristic and use its components for operating the control system. The paper addresses the second strategy and uses the concept of PMV as a criterion of broadly defined thermal comfort (including ventilation and air quality).


Introduction
A team of collaborating authors in the US and Poland addresses the issue of energy efficiency of modern buildings. The scope of work includes preheating of ventilation air [1][2][3][4], multi-criteria analysis [5,6], and the development of an integrated approach, called Environmental Quality Management [7][8][9], in the context of building physics [10,11], fuzzy logic [12] or artificial neural networks [13,14]. The greater focus on neural networks (NNs) is justified by the observation that currently used energy and hygrothermal models are parametric and may not be suitable for application in real-time automatic building controls [14,15]. This statement is even more justified when considering their application to thermal comfort evaluation.
The evaluation of thermal comfort is important because indoor environments have become humans' dominant habitat. The research in [16] shows that more than 90% of our time is spent indoors. The productivity, health and well-being of building occupants depend on four aspects of indoor environmental quality (IEQ) [17]: acoustics, indoor air quality, visual comfort [18,19] and thermal comfort.
[Figure caption (truncated): ... (5)) of feedforward type with one hidden layer. Results obtained for the case without learning data selection.]

The Structure of the Paper
The article consists of five sections. In Section 1, the general research issues and the novelty and purpose of the work are described. Section 2 includes the research methodology and the data used. In Section 3, the results of the research that made it possible to identify the best NN are presented according to the methodology from Section 2.
In Section 4, the obtained results are discussed and compared with those obtained in other studies. The implications of the study results are also discussed.
In Section 5, the six most important conclusions of the presented research are included, along with a description of the future research program.

Personalized Thermal Comfort Models
Predicted mean vote (PMV) [43] and the comfort zone as defined by the American Society of Heating, Refrigerating and Air Conditioning Engineers (ASHRAE55) [44] are the most popular methods of assessing thermal comfort. The first of these was presented by Fanger, P.O. [43] and then incorporated in the ISO7730 standard [22].
In addition, it is worth highlighting the two main standards of adaptive models. The first is the already mentioned ASHRAE55 adaptive model [45], and the second is the European Standard 15251 (EN15251) adaptive model [46]. Although these models are used quite widely in international standards, the need to improve their forecasting efficiency is noticeable [21], especially in the case of individual comfort [47,48]. This is because these models are not very flexible in terms of the comfort characteristics of individual people and cannot be updated according to them [21]. In addition, it is worth mentioning that they cannot currently be reformulated [49]. An important reason for this is that the two main models [45,46] are designed to predict the average level of comfort for a particular large representative group. As a consequence, they are at odds with an analysis of comfort that takes into account the individual aspects of a user living in a given area and working in a specific environment. Instead, they aim at showing some significant generalizations.
Sustainability 2020, 12, 6749

The Present Research Gap, Literature Review
There is a wide variety of studies on thermal comfort and building energy consumption that use artificial intelligence (AI). For example, in [50] an artificial neural network (ANN) and EnergyPlus were used to collect data on the Administration Building of Sao Paulo University in order to predict building energy consumption. In [51], an ANN was designed to simulate energy consumption for different exterior wall materials. The use of AI to predict building energy consumption is also presented in [52][53][54][55]. In [56], an ANN energy consumption prediction model for HVAC systems in office buildings was developed, but the model has fewer input parameters. A general description of how AI techniques may help in designing energy-efficient HVAC systems is given in [57]. From the point of view of using AI for individual thermal comfort, it may be beneficial to consult [38,58,59], where neural networks are used to evaluate this issue and its regulation in a building. In addition, AI techniques are currently combined with IoT [60] in thermal comfort control systems for buildings. A comprehensive literature review of artificial intelligence for efficient thermal comfort systems, dated April 2020, was conducted in [61].
In light of the above, there is a gap in comfort studies (described in more detail in [22]), which consists of the lack of flexibility of the models describing comfort. Therefore, there is a need to propose a comfort modelling method that can be adapted to specific data received in a specific region of the world for a specific group of people. This makes it possible to speak of a kind of individual approach to the topic, while taking into account a set of features representing a given community. The described gap was noticed some time ago, and as a result there is currently a flood of publications on this subject [1-3,5,7,10-12,22,27-35,45-47]. It is worth quoting the work in [22], in which it is stated that the establishment of an individual thermal comfort evaluation is essential to achieving personalized thermal environment management.
Thanks to scientists from all over the world, research on comfort has been developing gradually. Notably, researchers are searching for solutions to the problem described above by using machine learning and deep learning [12,22,26,30,38-40]. In most cases, this trend is based on PMV index modelling with the use of feedforward neural networks.
For example, in one of the latest publications of this type [22] (2019), a system based on the Building Information Model (BIM) and artificial neural networks (ANN) is used to improve energy saving efficiency under the premise of increasing human comfort. This system consists of, among others, an ANN predictive model considering the PMV index.
Another work of this type is [26], where the authors use "Artificial Neural Network (ANN) due to its ability to approximate any nonlinear mapping." In this work, the authors model the PMV index and state that "using ANN to train, we can get the input-output mapping of HVAC control system (...); we can propose a practical approach to identify thermal comfort of a building" [26].
Depending on the research team dealing with the described subject, neural networks of different levels of complexity are proposed. For example, in [38] a neural network (NN) thermal comfort evaluation model is proposed with only four environment variables as the input values. In this work, the model was based on the backpropagation algorithm, which ignores the differences of individual thermal sensation. On the other hand, [41] presents the use of six different algorithms, including those related to NNs (CTree, GPC, GBM, kSVM, RF, and regLR), to develop personal comfort models. In that work, environment data and the behavior of users of the Personal Comfort System (PCS) act as input variables. It is also worth mentioning that the authors of this work use boxplots to present certain features such as prediction accuracy or variable combinations, which is a very good solution.
The use of boxplots as a tool for visualizing and assessing the accuracy of calculations correlated with comfort modelling using ANN can also be seen in [40]. The author of the publication proposes a complex ANN model for predicting thermal comfort, taking into account three variables of current climatic conditions, four indoor environmental variables, and two individual variables, building types, as well as a body variable. The essence of this paper is the fact that it demonstrates the high potential of using ANN in evaluating individual thermal comfort.
It is important to note that, so far, a quite popular network architecture for the described issue has been the classic one [42], consisting of at most one hidden layer and one output layer [22,40,41]. Recently, however, an approach using deep learning with different levels of network structure complexity has been used [26,30,38]. In view of the above, it is worth looking at these works, bearing in mind that it is time to propose tools that will help to model the already used structures better.

Classic PMV Thermal Comfort Evaluation Model
Studies on thermal comfort carried out over the last fifty years or more show that many factors influence it. Although the knowledge about these factors is currently recognized to a satisfactory degree [62][63][64][65], the formulation of a mathematical model involving all of them is an issue that has not yet been solved.
The first studies conducted on such a model have already yielded reliable results. These studies were conducted by Fanger, P.O. [43], who proposed an equation to estimate the average vote of a large group of persons on the thermal sensation scale. In this equation, Fanger proposed that the PMV index be described using six factors: air temperature, air velocity, clothing insulation, humidity, mean radiant temperature and metabolic rate.

Deep Learning or Classic Network Structure
In the world literature, there is a noticeable trend in which, along with modelling of the comfort index with the use of NN, a special case of a neural network with a specific structure [22,26,30,38-40,42] is presented. Usually, the authors of these works propose a network that functions properly under certain physical conditions. However, these conditions may not be met for another social group or building type. Therefore, the proposed networks are suitable for a fairly narrow group of cases in which they work with satisfactory performance. The important thing here is that the authors of the works dealing with the topic of comfort modelling use various learning techniques and methods to achieve the intended goal. Two dominant paths can be distinguished: the first using classic NN structures with one hidden layer [22,40,41] and the other using deep learning [26,30,38].
At this point, however, the question arises: why and in which case is it worth using the classic structure of a neural network with one hidden layer, and when to model the issue using deep learning methods? The answer to this question is quite clear and it results from the very functioning of neural networks.
While using classic neural networks with one hidden layer, one usually employs mapping methods that find the relationship between the network's input arguments ("inputs") and its reference output value ("target"). In this case, the network learning process is designed to find, for example, a function mapping that will best combine the mentioned input arguments with the output value.
This means that if the authors use neural networks with one hidden layer, they need to prepare and properly process the data. In this case, it is the responsibility of the author of the network model to properly select the learning data. During such selection, the author of the network should independently choose the learning data so that they fully characterize the most important features of the studied social group, taking into account the nature of the building and other conditions. To do this, authors are expected to have expert knowledge, without which the task becomes practically impossible. Lack of this knowledge means that, despite the application of network structure optimization, the models will be burdened with unacceptable errors, which is shown in Figure 1. In addition, poorly selected data in this case distort the process of mapping the phenomenon.
In view of the above, if the authors of the network use a classic structure with one hidden layer, then the correctness of the modelled features depends on their expert knowledge, since such a structure cannot identify them.
Unlike classic neural networks with one hidden layer, deep learning methods introduce more hidden layers into the network structure. As a consequence, the network gains a kind of awareness of the learning data: the additional hidden layers make it possible to identify common features occurring among the input arguments of the network. This identification occurs during the network learning process, and the output value is estimated on the basis of the identified features.
The main difference between classic feedforward networks with one hidden layer and networks using deep learning is that the latter are able to identify the characteristics of the data assigned to the learning process, while the former cannot. In addition, due to the greater complexity of the networks' structure and their expanded capabilities, when using deep learning methods, much greater expert knowledge in the field of modelling is required from the network designer. During the modelling process, the designer focuses on choosing structure, teaching technique and network validation methods so that the characteristics of the learning data are detected in the best possible way. Unlike the classic structures of neural networks with one hidden layer, using "deep" structures does not require expert knowledge related to the modelled object. This does not mean, however, that the knowledge is superfluous, because some awareness of phenomena correlated to the modelled object is necessary when choosing the number of hidden layers and the number of neurons in a given layer. This selection is extremely important, as it is responsible for the number of identified features and the quality of modelling.
In conclusion, in order to use feedforward neural networks with one hidden layer, expert knowledge associated with the modelled object and correlated phenomena is obligatory. It is used to properly select data so that they characterize the features of the object. On the other hand, while using "deep" networks, this expert knowledge is not required. Nevertheless, certain awareness of the functioning of the modelled object is necessary. This is due to the ability to identify the features of the object by using more hidden layers. Therefore, expert knowledge in this case is shifted towards issues related to neural networks, and not the modelled object itself or the phenomena correlated with it.

Data Processing, Network Structure, General Equation, Structure Identification Method
Proper preparation of training data is crucial for modelling using a neural network with one hidden layer. Due to their diverse specificity, this chapter will not describe how to select them. This is because each case for modelling the PMV index for a specific group of people can have completely different features. As already mentioned, the identification of these features requires expert knowledge, preferably combined with knowledge in the field of feature engineering [40].
However, there are some data processing techniques that, as confirmed by studies [66][67][68][69][70][71], increase the mapping quality of a feedforward neural network with one hidden layer. These techniques include normalization methods, for example the "mapminmax" method, which provides the satisfactory convergence needed for PMV index modelling. It should be noted that the use of the "mapminmax" method is not mandatory and that the effectiveness of normalization depends on the specific case of selected learning data. Data normalization in artificial intelligence methods is mainly used to process the data so that the values of all arguments of the network input matrix X have influence at the same maximum level [40]. Thanks to this procedure, the individual arguments $x_i$ of the input matrix can be treated as equivalent in terms of their impact on the network output value y. The use of such a technique involves pre- and postprocessing of data: if the data are transformed into another form, then after being processed by the network they must be brought back to values corresponding to the originals. The introduction of pre- and postprocessing is usually a deliberate procedure because, as research shows [66][67][68]72], it allows better convergence of the network results (outputs) with the data obtained from the measurements (targets). However, one should bear in mind that this depends on the choice of activation function. If a sigmoid function is chosen as the activation function [42], then it is recommended to introduce normalization and denormalization of data. On the other hand, if the ReLU function is selected as the activation function, this is not required. It should be noted that the ReLU function is commonly used in "deep" networks rather than in those with one hidden layer described in the article.
Therefore, for modelling the PMV index, the author recommends using the "mapminmax" normalization method. The research results presented below take into account the use of this method.
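The mapminmax function is a linear transformation into the interval of given boundaries [70]:
$$\mathrm{Val} = \left(\mathrm{Val}'_{\max} - \mathrm{Val}'_{\min}\right)\frac{\mathrm{Val}_{org} - \mathrm{Val}_{\min}}{\mathrm{Val}_{\max} - \mathrm{Val}_{\min}} + \mathrm{Val}'_{\min} \qquad (1)$$
where $\mathrm{Val}_{org}$ is the original value, $\mathrm{Val}$ is the transformed value, $\mathrm{Val}_{\max}$ and $\mathrm{Val}_{\min}$ are the original interval boundaries, and $\mathrm{Val}'_{\max}$ and $\mathrm{Val}'_{\min}$ are the desired range boundaries, here from −1 to 1.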
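The linear mapping described above, together with its inverse for postprocessing, can be sketched in a few lines. This is only an illustrative Python sketch, not the MATLAB function itself: the function names, the dictionary of settings and the sample temperatures are assumptions made for the example, and raising an error for a constant column is a simplification chosen here.

```python
def mapminmax_fit(values, lo=-1.0, hi=1.0):
    """Collect the settings needed to map `values` linearly onto [lo, hi]."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # Simplification of this sketch: a constant column has no linear map.
        raise ValueError("constant column: the linear map is undefined")
    return {"v_min": v_min, "v_max": v_max, "lo": lo, "hi": hi}


def mapminmax_apply(values, s):
    """Preprocessing: original values -> desired range."""
    scale = (s["hi"] - s["lo"]) / (s["v_max"] - s["v_min"])
    return [(v - s["v_min"]) * scale + s["lo"] for v in values]


def mapminmax_reverse(values, s):
    """Postprocessing: values in the desired range -> original units."""
    scale = (s["v_max"] - s["v_min"]) / (s["hi"] - s["lo"])
    return [(v - s["lo"]) * scale + s["v_min"] for v in values]


temps = [18.0, 21.0, 24.0, 30.0]          # hypothetical air temperatures
settings = mapminmax_fit(temps)           # settings come from the learning data
scaled = mapminmax_apply(temps, settings)
restored = mapminmax_reverse(scaled, settings)
```

The settings are computed once and reused, so the same transformation can be applied to the inputs before the network and reversed on its output.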

The Network Structure and its General Equation
The structure of the feedforward neural network with one hidden layer is shown in Figure 2. In this figure, the blocks of pre- and postprocessing data are shown in green. In turn, the particular layers of the network were drawn against a yellow background. These layers were marked according to the recommendations in item [73]. Namely, the hidden layer is marked with the index {1}, while the output layer is marked with the index {2}. Activation functions were marked with the "activ" subscript. The network weights are designated as $W_{neur\_id}$, where neur_id represents the designation of a neuron. A similar procedure was used for the bias $b_{neur\_id}$.
The structure from Figure 2 represents Equation (2) in the general form:
$$y_{NN} = \mathrm{denorm}_{mapminmax}\left( f^{\{2\}}_{activ}\left( W^{\{2\}}\, f^{\{1\}}_{activ}\left( W^{\{1\}}\, \mathrm{norm}_{mapminmax}(X) + B^{\{1\}} \right) + B^{\{2\}} \right) \right) \qquad (2)$$
where: $\mathrm{norm}_{mapminmax}$ is the data preprocessing operation, $\mathrm{denorm}_{mapminmax}$ is the data postprocessing operation, $X$ is the input data vector, and $y_{NN}$ is the output value of the NN.
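The layered structure described above can be illustrated with a short forward-pass sketch. It is a sketch under stated assumptions rather than the network from the article: the hidden activation is taken to be tanh and the output activation linear, the weights are arbitrary toy values, and the input is assumed to be already normalized (pre- and postprocessing would wrap this function).

```python
import math


def forward(x, W1, b1, W2, b2):
    """One-hidden-layer feedforward pass for a single normalized input vector.

    W1: list of rows, one per hidden neuron; b1: hidden biases;
    W2: output weights, one per hidden neuron; b2: output bias.
    Assumed activations: tanh in the hidden layer, linear at the output.
    """
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W1, b1)
    ]
    return sum(w * h for w, h in zip(W2, hidden)) + b2


# Toy network: 2 inputs, 2 hidden neurons, 1 output (values are illustrative).
W1 = [[0.5, -0.5], [1.0, 1.0]]
b1 = [0.0, 0.0]
W2 = [1.0, -1.0]
b2 = 0.25

y_zero = forward([0.0, 0.0], W1, b1, W2, b2)
y_ones = forward([1.0, 1.0], W1, b1, W2, b2)
```

The number of rows in `W1` plays the role of the number of neurons in the hidden layer, so changing the structure complexity only changes the sizes of `W1`, `b1` and `W2`.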
The matrices and vectors of Equation (2) are as follows:

The Method Identifying Structures with the Best Number of Neurons in a Hidden Layer and the Best Neural Network
The complexity of the structure of a neural network with one hidden layer in the analyzed case depends on the number of neurons in the hidden layer. The parameter that characterizes this complexity is the value of $s^{\{1\}}$. It determines the size of the matrices $W^{\{1\}}$, $B^{\{1\}}$ and $W^{\{2\}}$ and it characterizes the modelled phenomenon together with the data assigned to the learning process.
The essence of proper selection of $s^{\{1\}}$ is that its value is neither too small nor too large. An incorrect selection of the value of $s^{\{1\}}$ leads to undesirable phenomena such as overfitting or underfitting [70], and others [42].
Therefore, it is recommended to choose the number $s^{\{1\}}$ so that it meets a certain assumed criterion of the quality of the mapping. For this purpose, the author recommends the use of the algorithm shown in Figure 3. After completing the procedure presented in this figure, it is necessary to assess whether the selected structure is characterized by repeatability of the obtained results and whether the impact of overfitting or underfitting is negligible. An example of such an assessment will be presented in further sections of the article. In the algorithm, the range of $s^{\{1\}}$ from 1 to 50 was selected due to the specificity of the data for NN. This range was chosen in order to show that above a certain number of neurons in the hidden layer, the network is not suitable for use.
The algorithm shown in Figure 3 aims to identify the best possible complexity of the neural network structure in terms of the chosen evaluation criterion. It also indicates the best network obtained for this criterion. The algorithm consists of two procedures, P1 and P2. In crude terms, it trains all the considered structures of the feedforward two-layer neural network in sequence, validates them and tests the correctness of their functioning. Identifying the best possible matrix dimension involves checking fifty network structures that differ in the number of neurons $s^{\{1\}}$ in the hidden layer; in the algorithm, this number increases by 1 over the range from 1 to 50. Because the initial values of weights and biases in the neurons are assigned randomly in the first iteration of the network learning process, the calculations for each structure with a specific number of neurons in the hidden layer are repeated several times. In the algorithm, each such calculation is indexed as approach and numbered consecutively from 1 to 10. These repetitions are necessary because they increase the probability of reaching the global minimum of the performance function [66,74]. Without them, the learning process may stop at a local minimum of the performance function, which would lead to erroneous conclusions regarding the usefulness of a particular network structure. Repeating the complete learning process several times, including network training, validation and testing, has another advantage: it enables the study of the robustness of the considered neural network structures (Sections 2.6.1 and 3.1).
The author recommends that this test be carried out on the basis of specially created boxplots [75], as shown in the case of [39] and as it will be shown in Section 3.1.
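The search over structures and repeated approaches described above can be sketched as a pair of nested loops. This is a minimal Python sketch, not the algorithm of Figure 3 itself: `train_and_test` is a hypothetical callable standing in for one complete training, validation and testing run, and `toy_train_and_test` is an invented stand-in used only to make the example runnable.

```python
import random


def search_structures(train_and_test, max_neurons=50, approaches=10, seed=0):
    """Try every hidden-layer size 1..max_neurons, `approaches` times each.

    `train_and_test(s, rng)` is a hypothetical callable: it trains one network
    with s hidden neurons from random initial weights and returns its test error.
    Returns the best (error, s, approach) and all errors grouped by s.
    """
    rng = random.Random(seed)
    results = {}   # s -> list of test errors, one per approach
    best = None    # "the best results so far"
    for s in range(1, max_neurons + 1):
        results[s] = []
        for approach in range(1, approaches + 1):
            err = train_and_test(s, rng)
            results[s].append(err)
            if best is None or err < best[0]:
                best = (err, s, approach)
    return best, results


def toy_train_and_test(s, rng):
    # Invented stand-in for real training: the error is lowest near s = 8,
    # and the random term mimics the effect of random initial weights.
    return abs(s - 8) * 0.5 + 1.0 + rng.random()


best, results = search_structures(toy_train_and_test)
```

Keeping every error in `results`, rather than only the best one, is what later allows the spread across the ten approaches to be inspected for each structure, for example with boxplots.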
As already indicated, the described algorithm identifies the best possible structure in terms of a certain quality criterion. This identification takes place in the block: Are the currently obtained results better than "The best results so far"?
This criterion should be selected depending on what is expected from the designed neural network. Usually, expectations come down to the choice between two criteria:
1. The network designer wants to achieve the best possible quality of function mapping, both to the data assigned to the learning process and to independent data that may occur when the network is used. In this case, it is recommended to select the criterion of the minimum value of the Maximum Absolute Relative Error [76] calculated for the network testing stage (2) obtained for the given approach (Equation (5)), where MainCrit is the main criterion for choosing the best neural network structure, $s^{\{1\}}$ is the number of neurons in the hidden layer, and $MaxARE_{TEST}$ is the Maximum Absolute Relative Error obtained for the testing stage, with $y_{iTest}$ the target and $y_{NNiTest}$ the output of the network in the testing stage.
2. The network designer cares mostly about the speed of the network, and only then about the quality of its function mapping to the data assigned to the learning process and to independent data that may occur when the network is used. In this case, the decisive factor is the minimum network complexity that ensures satisfactory results in terms of compliance of the network results (outputs) with its target results (targets). This factor is dominant because the speed of network operation depends mainly on three conditions: the computing performance of the calculating machine, the precision of significant numbers depending on the data format, and the computational complexity depending on the size of the network structure. Because the network designer usually has no influence on the first two conditions, the speed of the network is determined by the size of the hidden layer. The requirements given in point 2 can be met with an identification criterion in which MainCrit is the main criterion for choosing the best neural network structure, $s^{\{1\}}_{min}$ is the smallest number of neurons in the hidden layer, and $MaxARE_{TEST}$ is the Maximum Absolute Relative Error obtained for the testing stage.
The criteria listed above are examples and may change depending on the modelled phenomenon. The decision to propose a minimum value of the Maximum Absolute Relative Error (2) results from the specifics of this indicator. For the absolute value of each network output ($|y_{NNi}|$), the indicator guarantees that the Absolute Relative Error lies in the range from zero to the Maximum Absolute Relative Error [77,78]. Consequently, it gives information about the absolute value of the maximum possible error of the mathematical model [79][80][81] represented by the already trained neural network. This criterion only includes data for the testing stage.
This is a deliberate procedure, because if the trained neural network produces the minimum relative error for data that have never been taken into account at the stage of network training and validation, then its results should differ least from the actual values. It should also be noted that the results obtained from the testing stage indicate the likelihood of "overfitting", which decreases as the difference between the network response ($y_{NNiTest}$) and the target values $y_{iTest}$ decreases.
Analyzing the criteria (14) and (16), one can see that in the mathematical description the main difference between them is the replacement of $s^{\{1\}}$ by $s^{\{1\}}_{min}$. This replacement shifts the main emphasis of the calculations from achieving the best quality of function mapping to the speed of the neural network.
It is worth noting that meeting the criteria (14) or (16) initially identifies the best network structure. However, the final assessment of the applicability of the network structure should only take into account those structures that:
I. show robustness to changes in the initial values of weights and biases in the network neurons,
II. are characterized by a negligible impact of overfitting or underfitting.
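The difference between the two criteria can be illustrated with a small sketch. It rests on several assumptions that are not taken from the article: the relative error is computed here with respect to the target value and expressed in percent, the "satisfactory" level in the second criterion is represented by a hypothetical `threshold` argument, and the error values in `errors_by_s` are invented for the example.

```python
def max_are(targets, outputs):
    """Maximum absolute relative error in percent.

    Assumption of this sketch: the error is taken relative to the target,
    so targets equal to zero are not supported.
    """
    return max(abs((t - o) / t) for t, o in zip(targets, outputs)) * 100.0


def pick_best_quality(errors_by_s):
    """Criterion 1: the structure whose best approach has the lowest test error."""
    return min(errors_by_s, key=lambda s: min(errors_by_s[s]))


def pick_smallest_adequate(errors_by_s, threshold):
    """Criterion 2: the smallest structure whose best approach is good enough.

    `threshold` is a hypothetical acceptance level chosen by the designer.
    """
    adequate = [s for s in errors_by_s if min(errors_by_s[s]) <= threshold]
    return min(adequate) if adequate else None


# Invented test errors (percent) for three structures, two approaches each.
errors_by_s = {2: [12.0, 9.5], 5: [4.0, 3.2], 9: [2.9, 3.5]}

example_error = max_are([1.0, -2.0, 0.5], [1.1, -1.9, 0.45])
by_quality = pick_best_quality(errors_by_s)
by_speed = pick_smallest_adequate(errors_by_s, threshold=5.0)
```

With these invented numbers the first criterion selects the largest structure, while the second settles for a smaller one that still meets the accepted error level, which mirrors the shift of emphasis from mapping quality to network speed.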
Condition I concerns the network's robustness to the initial values of weights and bias and it is introduced because of the desire to indicate the features of the mathematical model of the neural network that ensure the highest possible repeatability of the results of the presented research issue [14,78].
Condition II shows whether the network has memorized the data assigned to the learning process or whether it is not too simple.
Both conditions will be checked later in the article using measurement data. Here, however, it should be emphasized that, from a mathematical point of view, meeting the criteria (14) or (16) is a necessary condition, whereas the fulfilment of Conditions I and II is a necessary and sufficient condition leading to full network applicability.

Data for NN and Chosen Learning Specification
This subsection presents a description of chosen examples of learning data and the selection of parameters characterizing the size and structure of feedforward networks with one hidden layer. The subsection is the final part of the paper that describes the theoretical stage of PMV index modelling and begins the practical stage, in which the assessment of modelling quality and the possibility of using the taught neural network is presented. The calculations were made using the MATLAB environment version 9.8.0.1417392 (R2020a), update 4, license number: 214849, with Deep Learning Toolbox, Version 14.0, (R2020a).

Data for NN
Measurement data titled "Langevin Data Legend" by Jared Langevin (jared.langevin@lbl.gov) from Drexel University, Department of Civil, Architectural, Environmental Engineering was used to model the PMV index. The source of the data was [82]. These data were chosen for the example because a case study on tracking human-building interaction, described in [23], had already been prepared on their basis. The research object was a medium-size office with an area of approximately 58,000 square feet with a variable air volume (VAV) system, operable windows and adjustable thermostats, located in Philadelphia, PA (Center City). The building was described in detail in [23], which also contains a description of the measurements, their results and analyses. In [23], there are also graphs and histograms describing data distributions and their specificity. The data used for the NN were collected between July 2012 and July 2013. They were registered at fifteen-minute intervals and cover local thermal conditions, related behaviors, and comfort of twenty-four occupants. The summary of the data was created on 20 July 2015 and contains a total of 840,984 measurement samples along with the results of their analyses. The data cover a total of seven categories of results (a to g), including personal characteristics, comfort/productivity/satisfaction (d), personal values and model (g).
The total number of parameters included in the aforementioned categories is 118, and their detailed descriptions can be found in [82]. From the 118 parameters mentioned above, those representative for PMV index modelling were selected. The choice of these parameters was associated with the specificity of data and the scientific experience of various researchers, which were described in [22,28,38]. Finally, for the needs of modelling, 20 parameters were selected, which are arguments for the PMV index model for the said building. These arguments were written in the X matrix (Equation (4)). The elements of this matrix, along with a description of the parameters, are given in Table 1. The data described also contained the PMV index for this building [23,82]. The values of this index were the reference output values of the y i network ( Table 2). The histogram of all input data and reference output values (targets) assigned to the learning process with the division into learning stages is shown in Appendix A- Figures A1 and A2.
In order to build an exemplary PMV index model from the data in [82], a vector of the first 30,000 sets of samples was taken. It was then checked whether all the elements listed in Tables 1 and 2 are complete in these sets and are different from "not a number" (NaN). This yielded a vector of 11,309 sample sets in which each element was complete and was not NaN. Afterwards, the samples generating possible model noise were identified within these 11,309 sample sets. The number of such sets of samples equalled 663. As a result, a vector of 10,646 sets of samples was selected to model the PMV index. This vector is divided into two matrices:
• the matrix X (inputs) with dimensions of 10,646 × 20, which is a series of input sets of samples assigned to the network learning process,
• the matrix Y (targets) with dimensions of 10,646 × 1, which is a series of PMV index reference output values corresponding to the data included in the matrix X.
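The completeness check and the sample bookkeeping described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: the function name `drop_incomplete` and the toy rows are hypothetical, and the noise-identification step (which removed 663 sets) is not reproduced because its criterion is not detailed in this section.

```python
import math

def drop_incomplete(rows):
    """Keep only sample sets in which every element is present and is not NaN."""
    return [
        row for row in rows
        if all(v is not None and not math.isnan(v) for v in row)
    ]

# Toy data (hypothetical): each row is one sample set, inputs followed by the target.
rows = [
    [21.5, 0.45, 0.1],
    [22.0, float("nan"), 0.3],  # incomplete: contains NaN
    [23.1, 0.50, -0.2],
]
complete = drop_incomplete(rows)
print(len(complete))

# Bookkeeping reported in the text: 11,309 complete sets, 663 flagged as noise.
n_complete, n_noise = 11_309, 663
n_model = n_complete - n_noise
print(n_model)
```

The second part only reproduces the arithmetic of the text: removing 663 sets from 11,309 leaves the 10,646 sets that form the rows of X and Y.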
Matrices X (inputs) and Y (targets) were used to train 50 neural network structures (500 NNs in total) according to the algorithm shown in Figure 3. To do this, the sample sets (X i , y i ) were divided into 3 parts (training, validation, tests). The first part, with 60% of the data sets, was assigned to the network training stage (X iTr , y iTr ). The second and third parts, with 20% of the data sets each, were assigned respectively to the validation stage (X iVal , y iVal ) and the network testing stage (X iTest , y iTest ). Exemplary effects of assigning the data sets to a specific learning stage are shown in Figure 4.
The algorithm for assigning data sets was implemented as a loop that resets its iteration step after a data set has been assigned to the testing stage. The algorithm assigned data sets as follows: the first three data sets from a series of sets were assigned to the network training stage, the fourth set was assigned to the validation stage, and the fifth set was assigned to the testing stage. Then the loop iteration was reset. The process of assigning data to a specific learning stage was completed after all the prepared 10,646 data sets had been assigned.
Data prepared in this way (thanks to the invariable and manual assignment of data sets to specific learning stages) enabled limiting the randomness of neural network results for a given approach.
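The assignment loop described above can be sketched in a few lines. The function names `assign_stage` and `split_counts` are hypothetical; the sketch only assumes the three/one/one pattern stated in the text and a 0-based sample index.

```python
def assign_stage(index):
    """Return the learning stage for the sample at a 0-based index.

    The pattern repeats every five samples: three training sets,
    then one validation set, then one testing set.
    """
    position = index % 5
    if position < 3:
        return "training"
    if position == 3:
        return "validation"
    return "testing"

def split_counts(n_samples):
    """Count how many samples fall into each stage."""
    counts = {"training": 0, "validation": 0, "testing": 0}
    for i in range(n_samples):
        counts[assign_stage(i)] += 1
    return counts

counts = split_counts(10_646)
shares = {stage: n / 10_646 for stage, n in counts.items()}
print(counts)
print(shares)
```

Because the assignment depends only on the sample index, repeating the split always yields the same three subsets, which is what removes this source of randomness from the comparison of approaches. For 10,646 sets the pattern gives approximately the 60%/20%/20% division.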

Chosen Learning Specification
The specification presented in this chapter should be treated as selected for the data described in the previous chapter. The choice of such specification depends on the nature of the data assigned to the learning process. The author encourages specialists who model the PMV index using the method described in this article to select learning parameters based on the knowledge of the data assigned to the learning process. The learning parameters and activation functions presented below are characterized by satisfactory convergence of network results with reference data for the tested object. These parameters were selected on the basis of experiments carried out by the author. The chosen results on the basis of which the parameters were selected are presented in Appendix B. This selection can be made using the optimization of hyperparameters, e.g., Random Search, Grid Search, Bayesian optimization. The selected specification of learning and the networks are presented below in a way which enables further description of the selection and evaluation of the network for the method described in this article.
The learning parameter values given for the network learning process are presented in Table 3. The research was carried out using the Levenberg-Marquardt training algorithm [83]. This algorithm was chosen because it showed satisfactory performance in preliminary calculations. The mean squared error (MSE) was chosen as the performance function (Equation (8)):
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - y_{NNi}\right)^2 \qquad (8)$$
where: n is the number of sets for each learning stage: training (X iTr , y iTr ), validation (X iVal , y iVal ), tests (X iTest , y iTest ); y i is the target for the network, i.e., the reference PMV index value; y NNi is the output of the network corresponding to the i-th target, i.e., the estimated PMV index value.
Implicitly, the Error was also defined (Equation (9)) as:
$$e_i = y_i - y_{NNi} \qquad (9)$$
As the activation function in the hidden layer, according to [42,66,74], a hyperbolic tangent sigmoid function (Equation (10)) was implemented. In the output layer, according to the recommendations from [74], a linear activation function, purelin (Equation (11)), was implemented. The activation functions were selected following the recommendations for the analysis of non-linear functions [66], because the modelled issue belongs to this type of phenomena.
The hyperbolic tangent sigmoid function was chosen because its output values, unlike the Sigmoid and ReLU functions, are in the range of positive and negative values. As a result, smaller network complexity in the hidden layer can be expected [42,66,74]. In addition, this function allows both continuous and discrete changes of input data to be taken into account [70], which is done in the present case. What is more, thanks to the choice of the linear activation function-purelin in the output layer, it is possible to evaluate the quality of the model fit (coefficient of determination R 2 ), as well as regression analysis with the Pearson coefficient (R) [74].
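The difference in output ranges mentioned above can be illustrated with a short sketch. It assumes the standard textbook definitions of the four functions; the names `tansig` and `purelin` follow the text, while `logsig` and `relu` are names chosen here for the Sigmoid and ReLU functions.

```python
import math

def tansig(n):
    """Hyperbolic tangent sigmoid, 2 / (1 + exp(-2n)) - 1; mathematically equal to tanh(n).

    For very large negative n, math.exp overflows; math.tanh is the safer equivalent.
    """
    return 2.0 / (1.0 + math.exp(-2.0 * n)) - 1.0

def purelin(n):
    """Linear activation: the output equals the input."""
    return n

def logsig(n):
    """Logistic sigmoid: output strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-n))

def relu(n):
    """Rectified linear unit: output is never negative."""
    return max(0.0, n)

for n in (-2.0, 0.0, 2.0):
    print(n, tansig(n), logsig(n), relu(n), purelin(n))
```

For a negative argument only `tansig` and `purelin` return a negative value, which is the property the text refers to when comparing the hyperbolic tangent sigmoid with the Sigmoid and ReLU functions.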

Robustness Study Methodology
According to the algorithm described in Section 2.4.3, the research involved checking 50 structures of neural networks differing in the number of neurons in the hidden layer. Each of these structures was trained, validated and tested ten times. Each such process is called an approach.
Each network structure is examined ten times because the initial values of weights and bias at the training stage are deliberately assigned at random; this procedure allows a robustness analysis of the network structure. It is particularly important from the point of view of work stability and repeatability of the use of the neural network (see Condition I, Section 2.4.3). If, despite changes in the initial weights and bias values, a given network structure shows similar values of network quality indicators, e.g., MARE (Mean Absolute Relative Error), R (Pearson's coefficient) [83] and others, then it can be stated that such a structure is insensitive, or not very sensitive, to the initial conditions. This study is similar to a system stability examination, which, in crude terms, checks whether a change in the initial state of the system has an impact on its results. Therefore, if the network structure shows a satisfactory insensitivity to changes in the initial values of weights and bias, it can be considered stable in terms of the studied issue [70]. The same applies to the repeatability of the use of the neural network structure: if the structure shows satisfactory insensitivity, and its quality indicators (e.g., MARE, R, MSE or other) along with its results are satisfactory in terms of the correctness of the whole system operation, then this repeatability occurs.
The decisive factor in choosing indicators is usually the order of magnitude of reference output quantities assigned to the learning process (targets). However, there are sometimes cases [86] in which, in addition to the order of magnitude, the nature and variability of targets are taken into account.
In general, the proposed rule of choosing indicators is to check the order of magnitude of targets and to apply a conditional function of y i max (Equation (17)), where y i max is the maximum absolute target value obtained for all data assigned to the learning process. This principle is based on the following logic of choice. If the maximum absolute target value y i max is:
• less than 1, then one should use the indicator that contains the Sum of Absolute Errors made by the network;
• less than 10, then one should select the indicator that contains the average of the Sum of Absolute Errors made by the network. In this case, it is recommended to check how the structure behaves in terms of the selected indicator from the case y i max ≤ 1;
• greater than 10, then one needs to choose an indicator that amplifies network Errors by using the exponentiation operation.
It is worth noting that in Equation (17) MaxARE is considered in each case. This is because this indicator informs whether a given structure is characterized by local minima for data assigned to the learning process.
Therefore, since the absolute values of the PMV index are less than or equal to 3, the case of y i max ≤ 10 should be used for robustness analysis. The results of this analysis for data from Section 2.5.1. are presented in Section 3.1.
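The selection logic for the robustness indicators can be sketched as below. Several points are assumptions made for illustration: the function names are hypothetical, the squared-error indicator for targets above 10 is taken to be MSE, the relative error is taken as the Error divided by the target, and samples with a zero target are skipped when computing MaxARE.

```python
def sae(targets, outputs):
    """Sum of Absolute Errors."""
    return sum(abs(t - o) for t, o in zip(targets, outputs))

def mae(targets, outputs):
    """Mean Absolute Error: the average of the Sum of Absolute Errors."""
    return sae(targets, outputs) / len(targets)

def mse(targets, outputs):
    """Mean squared error: amplifies Errors by exponentiation."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)

def max_are(targets, outputs):
    """Maximum absolute relative error; zero targets are skipped (assumption)."""
    return max(abs((t - o) / t) for t, o in zip(targets, outputs) if t != 0)

def select_indicators(y_max):
    """Pick the indicators from the maximum absolute target value."""
    if y_max <= 1:
        return ("SAE", "MaxARE")
    if y_max <= 10:
        return ("MAE", "MaxARE")  # the text also recommends checking SAE here
    return ("MSE", "MaxARE")      # assumption: MSE as the squared-error indicator

# PMV targets lie in [-3, 3], so the middle case applies.
print(select_indicators(3.0))
```

For the PMV index the function returns the MAE and MaxARE pair, with SAE kept as a supplementary check, which matches the indicators reported for the robustness study.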

Methodology of Overfitting and Underfitting Study
After performing the robustness analysis, one should conduct a compliance study of function mapping for data that were not considered in the network training or validation process. This study is carried out for the testing stage and it is aimed at checking whether a specific structure of the neural network, as well as a given approach, models the phenomenon in accordance with reality, or whether overfitting or underfitting take place [42]. The occurrence of overfitting means that the network memorizes data, thus losing its ability to correctly interpret the data, except for those assigned to the network training stage. In turn, the occurrence of the underfitting phenomenon informs that the network model is too simplified for the analyzed issue.
To verify whether there is overfitting or underfitting for a given structure or a specific approach, one should, just as in the robustness study, check the order of magnitude of y i (targets). Then, based on this verification, the indicators informing about the occurrence of these phenomena are selected. Generally, the proposed rule for selecting indicators (Equation (18)) is analogous to that of the robustness study. Essentially, it differs only in that MaxARE is not used in the overfitting and underfitting analysis, and that the indicators are calculated with the use of data that are not involved in the network training or validation stages.
Therefore, since the absolute values of the PMV index are less than or equal to 3, the case y i max ≤ 10 should be used for overfitting and underfitting. The results of this analysis for the data from Section 2.5.1 are presented in Section 3.2.

Results
The results shown in this chapter cover the examination of 500 neural networks. The purpose of this examination was to find the best network structure with one hidden layer and to select the best network for modelling the PMV index. This goal was achieved for Criterion 1 (Equation (5)). An identical analysis can be performed for Criterion 2 (Equation (7)). This chapter presents the complete procedure for assessing the applicability of the network (see Section 2.4.3, Conditions I and II and Section 2.6). The assessment of Condition I is described in Section 3.1, while the assessment of Condition II is covered by Section 3.2.
In Figure 5, it can be seen that the structures with s {1} ≤ 3 or s {1} ≥ 37 are sensitive to changes in the initial values of weights and bias, so they are not suitable for use.

Robustness Study of the Examined Neural Network Structures
Due to the high (compared to other structures) values of the robustness assessment indicators obtained for structures with s {1} ≤ 3 and s {1} ≥ 37, these structures are omitted from further analysis in order to present the results more clearly. Figures 6 and 7 present the results of the SAE and MAE indicators, respectively. In these figures, as before, the number s {1} is plotted on the abscissa axis.
Based on the boxplots (Figures 6 and 7), it can be stated that the structures with s {1} = 5, 10, 11, 16, 19, 21-23, 26, 36 show a satisfactory insensitivity to the initial conditions assigned during the learning process. The indicated structures are characterized by a narrow interquartile range (IQR) [87,88] compared to the other IQRs presented in the figures. The remaining structures are sensitive to initial conditions and have a too-wide IQR [87,88] or too-large discrepancies in the results with respect to the median [87,88]. It is especially visible for the structures with s {1} = 4, 6, 7, 8, 9, 15, 17, 18, 24, 27, ...
Structures with s {1} ≤ 3 are not drawn in this figure due to the significant impact of the underfitting phenomenon for these structures. This means that these structures cannot be used for the analyzed data. Therefore, the issue in question is analyzed without these structures to allow for a clearer demonstration of results. As proof of this fact, Appendix C contains a boxplot (Figure A5) illustrating the SAE results (22) obtained for the neural network testing stage for the range 1 ≤ s {1} ≤ 50.
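The IQR comparison that underlies the boxplot reading in this section can be sketched as follows. The indicator values are hypothetical and serve only to show the mechanics; the quantile convention (`method="inclusive"`) is a choice made here and may differ from the one used to draw the figures.

```python
import statistics

def iqr(values):
    """Interquartile range: third quartile minus first quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

# Hypothetical MAE values for the ten approaches of two structures,
# keyed by the number of neurons in the hidden layer.
results = {
    5: [0.010, 0.011, 0.010, 0.012, 0.011, 0.010, 0.011, 0.012, 0.010, 0.011],
    7: [0.010, 0.050, 0.020, 0.090, 0.015, 0.070, 0.030, 0.011, 0.120, 0.040],
}

spread = {s: iqr(v) for s, v in results.items()}
median = {s: statistics.median(v) for s, v in results.items()}
most_robust = min(spread, key=spread.get)
print(spread, median, most_robust)
```

A structure whose ten approaches produce a narrow IQR around the median is the one treated as insensitive to the random initial weights and bias; a wide IQR flags sensitivity to the initial conditions.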

Study of Overfitting and Underfitting of the Examined Neural Network Structures
Based on the results presented in Figure 8, it can be seen that the structures with s {1} ≥ 41 have a much wider IQR [87,88] than other IQR ranges. This means that the influence of the overfitting phenomenon for the above-mentioned structures is significant, so they cannot be used. The following figures for the present study were drawn without the involvement of these structures, in order to improve the visibility of the results. Figures 9 and 10 present, respectively, the SAE (Equation (13)) and MAE (Equation (12)) results obtained for the neural network testing stage without taking into account the structures with s {1} ≤ 3 and s {1} ≥ 41.
A much broader IQR [87,88] can be observed for the indicated structures than for the other structures. There are also large discrepancies between the maximum and minimum values of SAE and MAE, extending beyond the first and third quartiles. This means that in such a case we are dealing with the phenomenon of overfitting. If the aforementioned maximum and minimum values were not so differentiated, this would indicate the phenomenon of underfitting.
In conclusion, the structures for which the impact of overfitting and underfitting is at a satisfactory level [87] are the ones with s {1} = 5, 10-14, 16, 19-23, 25, 26, 28, 31, 32, 35, 36. These structures are taken into account in the identification of the best network structure according to the criterion described by Equation (5).

Identification of the Best Network Structure and the Best
The results obtained for the robustness study (Section 3.1) as well as the overfitting and underfitting study (Section 3.2) lead to the conclusion that, among the examined structures, the ones with s {1} = 5, 10, 11, 16, 19, 21-23, 26, 36 and their approaches model the PMV index with sufficient accuracy (Section 3.2) and stability (Section 3.1).
This statement results from the conjunction of the two sets of structures identified in Sections 3.1 and 3.2. In view of the above, the identification of the best possible network structure and the best approach for the selection criterion described by Equation (5) was made for structures with s {1} = 5, 10, 11, 16, 19, 21-23, 26, 36. Figure 11 presents a boxplot of the MaxARE TEST results (Equation (5)) obtained for the selected structures. It shows that the smallest values of MaxARE TEST were obtained for the structure with s {1} = 5; therefore this structure was indicated as the best for the analyzed selection criterion. This result was also confirmed by its comparison with the results of the other networks in Figure 11.
Table 4 shows the detailed values of MaxARE TEST obtained for the structure with s {1} = 5. The best approach, i.e., the one that meets the main criterion for selecting the best structure and network (Equation (5)), is marked in bold in the table. The values in Table 4 show that the network meeting the main selection criterion (Equation (5)) commits, in the worst case, a Relative Error smaller than or equal to 1.8% for the data assigned to the test stage. The result obtained thanks to the procedure presented in the article can be considered very good. However, one should bear in mind that the obtained MaxARE TEST [%] ≤ 1.8% does not apply to the full range of data assigned to the learning process. Therefore, when modelling the PMV index, one should check the results for the full range of data, as shown in the next section.
The learning process of the identified best neural network (Equation (5)) is presented in Figures 12 and 13. These figures show, respectively, the calculated values of the network performance function for individual learning stages (Equation (8)) and the network learning parameters, both plotted against the learning epochs.
From Figure 12 one can deduce that the neural network obtained the best results for the 356th learning epoch. Therefore, the results obtained for this epoch are the end result of the learning process of the network. It can be seen in the figure that no significant improvement in the value of the performance function MSE (Equation (8)) is observed for the network training stage above the 150th epoch. However, an improvement is still noticeable in this range for the validation and testing stages. This suggests that overfitting or underfitting does not occur or is negligible.
The "validation checks" chart in Figure 13 shows that the learning process was carried out without frequent occurrences of increases in the value of the network performance function MSE for the validation stage. As a result, usually a zero value is seen on the chart. The occurrence of non-zero values with an upward trend in this chart means that the data assigned to the learning process are characterized in the case of the analyzed structure by the local minima of network performance functions. Apparently, this is not the case for the best structure identified. This case is, among others, a consequence of the robustness test (Section 3.1).
In Figure 13, it can be seen that during the learning process, from around epoch 50, the Gradient was of the order of magnitude 10^−5. This means that slight changes in weights and bias during the training stage improved the learning process for the validation stage. This phenomenon can be observed in Figure 12 as well. Figure 13 also shows changes in the Levenberg-Marquardt damping parameter (Mu) for each successive learning epoch. The figure shows that the value of Mu decreased as the number of learning epochs increased, which means that the learning process was carried out correctly [66].
Figure 14 presents a histogram of the Errors e i made by the network (Equation (9)), calculated for data assigned to a particular stage of the network learning process (training, validation, test). In this figure, the abscissa axis shows the values of network Errors e i assigned to a given bin. The ordinate axis indicates the number of instances of e i covering the range of a given bin. This figure was created in accordance with the guidelines given in [74]. In the discussed figure, it can be seen that the data were assigned to a specific learning stage in a regular manner (Section 2.5.1). The figure shows that the vast majority of the data were characterized by an Error around 0.0015. From the point of view of the quality of the modelled phenomenon, this figure is purely illustrative since the values of e i (Errors) are not related to y i (targets).
Therefore, Figure 15 presents a histogram of Relative Errors RE calculated in accordance with Equation (19). The ordinate axis indicates the number of RE occurrences covering the range of a given bin. Figure 15 was drawn for 40 bins. Looking at Figures 14 and 15, one can state that both are characterized by the Gaussian distribution. In the first case, this distribution is right-sided, while in the second case it is left-sided.
This situation may occur if the range of data assigned to the learning process includes both negative and positive numbers, i.e., it can occur in PMV index modelling. It is worth noting that on the basis of the results from Figure 15, similarly to Figure 14, it can be stated that the data for a particular learning stage have been assigned regularly (Section 2.5.1). From the results in Figure 15, it can be stated that the Relative Error for most of the data was around −0.001 and that it fluctuated around this number within ±0.002. Figure 16 presents a chart illustrating for which of the data samples assigned to the NN the Relative Error (Equation (19)) occurred. This figure shows that the occurrence of RE was stochastic in nature. Particular attention in this figure should be given to two samples for which RE was obtained in the range 0.1 < RE < 0.15. This suggests that, for the full range of data assigned to the learning process, the neural network models the PMV index with MaxARE [%] < 15%. This is a consequence of using real measurement data to model the PMV index: in some situations the model needs more than 15 min (the sampling time of the measurement data) to adjust. An example of such a situation may be the presence of more people in the building, e.g., large employee meetings or the arrival of a school trip. It is worth noting that in each of these two cases, after its occurrence, the PMV index model stabilizes with MaxARE [%] < 4%. The occurrence of two samples for which MaxARE [%] > 4% is marginal and represents 0.0188% of the data assigned to the learning process. Therefore, from the point of view of the described NN model for the PMV index, the impact of samples generating errors with MaxARE [%] > 4% can be neglected, and the model itself can be treated as one that is characterized by MaxARE [%] < 4% (precisely: MaxARE [%] < 3.73%).
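The share of outlying samples quoted above can be checked with a short calculation. The helper names are hypothetical, and the relative error is assumed here to be the Error divided by the target (zero targets are skipped), since Equation (19) is not reproduced in this section.

```python
def relative_errors(targets, outputs):
    """Relative error per sample, assuming RE = (target - output) / target."""
    return [(t - o) / t for t, o in zip(targets, outputs) if t != 0]

def share_above(rel_errors, threshold):
    """Percentage of samples whose absolute relative error exceeds the threshold."""
    n_above = sum(1 for r in rel_errors if abs(r) > threshold)
    return 100.0 * n_above / len(rel_errors)

# Arithmetic from the text: 2 of the 10,646 samples exceed MaxARE = 4%.
share = 100.0 * 2 / 10_646
print(round(share, 4))          # 0.0188
print(round(100.0 - share, 4))  # 99.9812

# Toy example (hypothetical values): one of four samples exceeds the 4% threshold.
toy = share_above([0.12, 0.001, -0.002, 0.0], 0.04)
print(toy)                      # 25.0
```

The same two numbers reappear in the Discussion: 0.0188% of the data lie above the 4% threshold, so 99.9812% of the data are modelled within it.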
In addition to the analysis of Errors and Relative Errors when assessing the possibility of using a neural network, one ought to perform R (Pearson's correlation coefficient) regression calculations [74], based on which the quality of model fit to the data assigned to the network learning process is assessed. An indicator of this quality is the coefficient of determination R 2 [74]. Figure 17 presents the results of R regression calculations obtained for the discussed network. It shows four charts: Training, Validation, Testing, All data. The first three relate to the results of calculations obtained for data assigned to a specific stage of network learning, while the fourth presents the results obtained for all data assigned to the learning process. The charts plot y NNi (output) against y i (target). In this figure, it can be seen that R for all stages of the learning process was almost equal to 1. It can also be noticed that at the training stage there were two samples to which the network did not adjust with the same accuracy as for the other samples. These two samples were highlighted in the description of Figure 16.
Interpreting the results of R regression calculations [74], it can be stated that the output values y NNi are very strongly correlated with the targets y i . In the case of the validation and testing stages, the correlation between these values is almost perfect [74].
Based on the results of R regression (Figure 17), the values of the coefficient of determination R 2 were calculated. These values are shown in Table 5. It can be concluded from them that the quality of the PMV index model fit to the data assigned to the learning process, as well as to all its stages, is at a very good level. It is noteworthy that even in the worst case shown in the table, R 2 falls short of a perfect fit (R 2 = 1) by less than 0.00002. This is proof of well-conducted experiments, proper selection of network architecture and correct selection of the learning method. In order to present the results of the described mapping quality, Figure 18 shows an example of the network response y NNi (outputs) against the background of y i (targets), drawn for the arguments x 1 and x 2 (Table 1). In this figure, it can be seen that the response values y NNi (outputs) largely overlap with y i (targets). It can also be noticed that the differences between the network responses y NNi and the targets y i are marginal, which confirms the results described in this section.
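The regression quantities used here can be sketched as follows. This is an illustration with hypothetical target and output values; it assumes that the coefficient of determination is taken as the square of Pearson's R, which holds for a simple linear regression of outputs on targets.

```python
import math

def pearson_r(targets, outputs):
    """Pearson's correlation coefficient between targets and network outputs."""
    n = len(targets)
    mean_t = sum(targets) / n
    mean_o = sum(outputs) / n
    cov = sum((t - mean_t) * (o - mean_o) for t, o in zip(targets, outputs))
    var_t = sum((t - mean_t) ** 2 for t in targets)
    var_o = sum((o - mean_o) ** 2 for o in outputs)
    return cov / math.sqrt(var_t * var_o)

# Hypothetical PMV targets and network outputs (not measurement data).
targets = [-1.2, -0.4, 0.0, 0.5, 1.3]
outputs = [-1.19, -0.41, 0.01, 0.5, 1.29]

r = pearson_r(targets, outputs)
r2 = r ** 2  # coefficient of determination under the stated assumption
print(r, r2)
```

Because R 2 is the square of a number below 1, it is always slightly further from 1 than R itself, which is why R 2 is the stricter of the two figures to report.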
The results described in this section relate to the identified PMV index mathematical model, described by Equation (20). This equation is a special form of Equation (2). The difference between them is that in Equation (20):
• the dimensions of the matrices are specified; therefore, this information is noted in their subscripts;
• the elements of the matrices are identified numerical values.
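To make the role of the matrix dimensions concrete, the sketch below evaluates a network with 20 inputs, 5 hidden neurons and one output. It assumes the usual single-hidden-layer form, output = purelin(W2 · tansig(W1 · x + b1) + b2); the names W1, b1, W2, b2 and the random values are hypothetical and are not the identified coefficients, and any input or output normalization applied by a training toolbox is omitted.

```python
import math
import random

def count_parameters(n_inputs, n_hidden, n_outputs=1):
    """Number of weights and biases in a network with one hidden layer."""
    return n_hidden * n_inputs + n_hidden + n_outputs * n_hidden + n_outputs

def forward(x, W1, b1, W2, b2):
    """Evaluate the network for one input vector x (tanh hidden layer, linear output)."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W1, b1)
    ]
    return sum(w * h for w, h in zip(W2, hidden)) + b2

n_inputs, n_hidden = 20, 5
rng = random.Random(0)  # fixed seed so the sketch is repeatable
W1 = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]  # 5 x 20
b1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]                             # 5 x 1
W2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]                             # 1 x 5
b2 = rng.uniform(-1, 1)                                                        # 1 x 1

x = [0.0] * n_inputs
y = forward(x, W1, b1, W2, b2)
print(count_parameters(n_inputs, n_hidden), y)
```

Under these assumptions the structure with five hidden neurons has 111 adjustable values, while the largest examined structure (50 neurons) would have 1101, which gives a sense of how quickly model complexity grows with the hidden layer size.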

Discussion
The research and procedures described in the article show the full process of modelling and testing of PMV index using feedforward neural networks with one hidden layer. The paper describes when neural networks with one hidden layer can be used to model PMV index, and when deep learning approach should be used.
The article presents the procedure for identifying the best structure and the best neural network, which is the mathematical model of the PMV index in terms of the selected evaluation criterion (Section 2.4.3). It also indicates the situations determining the choice of this criterion.
Afterwards, an example of a study on identifying the best structure and best network was presented along with its detailed description. This example was shown in the most general form possible so that it could take the form of a tutorial. For the example mentioned, data from the previously examined real object (Section 2.5.1) were used (see case studies in [23,82]).
The procedure for identifying the best possible network structure is divided into three parts (Section 3.1, Section 3.2, Section 3.3). The first two parts: "robustness study . . . " (Section 3.1) and "overfitting and underfitting study . . . " (Section 3.2) were designed to check which structures are suitable for use. On the other hand, the third part (Section 3.3) describes the selection of the best structure and network in terms of the best compatibility of matching the PMV index model to its real equivalent (Section 2.4.3, Equation (5)). At the end of the modelling process, the best identified neural network was presented along with its results.
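The third part of the procedure, the selection among the admissible structures, can be sketched as a minimum search over the test-stage indicator. All numbers below are hypothetical and the function name `best_network` is introduced only for illustration; the sketch assumes that the criterion is the smallest MaxARE obtained for the testing stage.

```python
# Hypothetical MaxARE_TEST values [%], keyed by
# (number of neurons in the hidden layer, approach number).
max_are_test = {
    (5, 1): 2.4, (5, 2): 1.8, (5, 3): 2.1,
    (10, 1): 3.0, (10, 2): 2.6, (10, 3): 2.9,
    (41, 1): 1.2,  # ignored below: this structure is not admissible
}

# Structures that passed both applicability studies (hypothetical subset).
admissible = {5, 10}

def best_network(scores, admissible_structures):
    """Return the (structure, approach) pair with the smallest score among admissible structures."""
    candidates = {k: v for k, v in scores.items() if k[0] in admissible_structures}
    return min(candidates, key=candidates.get)

best = best_network(max_are_test, admissible)
print(best, max_are_test[best])
```

Filtering before the minimum search matters: in the toy data the overall smallest value belongs to a structure that failed the applicability checks, so it must not be selected.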
The PMV modelling procedure presented in this article enables filling the gap identified by the scientific community in comfort studies related to energy consumption in buildings [1-3,5,7,10-13,22,27-32,34,35,45-47]. The neural networks used in it introduce flexibility to the formulation of models describing comfort. This flexibility results from the possibility of choosing the input arguments of a mathematical model describing a specific comfort so that they are adapted to the data obtained in a given region of the world for a given group of people. Thanks to this approach, mathematical models using neural networks enable the improvement of individual comfort, resulting in energy savings in the construction sector. However, it should be noted that the savings in energy consumption for the solution described in the article are of primary importance for buildings that already exist. This is due to the necessity of obtaining the data that are assigned to the network learning process. These data are the most reliable in the case of real objects, hence the proposed solution is directed precisely at such objects. It should be noted that it is also possible to model a PMV index with neural networks using metadata. In this case, PMV models using NN should have better consistency of results due to the simplifying assumptions used in the models from which the metadata are derived. However, they will be less reliable than the models trained on data from a real object.
The procedure for identifying the best PMV index model described in the article for the analyzed example covered the range of the number of neurons in the hidden layer 1 ≤ s {1} ≤ 50. This range was chosen to show that above a certain number of neurons in the layer, the network is not suitable for use. In the analyzed example, it took place for s {1} ≥ 37.
The procedure is characterized by almost perfect quality of model fit (Section 3.3.1). This results from the interpretation of the value of the determination coefficient R 2 = 0.99998 [74], as well as from the value MSE = 0.0000017 (Equation (8)) obtained for all data assigned to the learning process. It is also worth mentioning that the identified model can be treated as one characterized by MaxARE [%] < 4% (precisely MaxARE [%] < 3.73%).
These results for the identified network confirm the correct choice of learning parameters, performance function and transfer functions implemented in the structure. The values of these parameters and the types of functions were selected based on the indications from [42,73]. When choosing these parameters and functions for the PMV index, one should be aware that the choice depends on the specifics of the data for the NN and that it is impossible to indicate them clearly before examining the data. This is due to the diverse functionality of buildings and external conditions. The described assessment of network applicability (Sections 2.6, 3.1 and 3.2) proved to be necessary for modelling the PMV index with the use of NN. The research carried out in terms of robustness (Section 3.1) showed that the vast majority of the analyzed network structures are sensitive to changes in the initial weights and bias values assigned to the training stage. A similar effect was observed for the studies carried out in terms of overfitting and underfitting (Section 3.2). As a consequence, it turned out that among the 50 analyzed structures (500 networks), only 20% were suitable for application (Section 3.3). It means that without such an assessment of the analyzed data there was as much as an 80% chance of choosing the wrong structure.
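The 20% figure follows directly from intersecting the two sets of structures reported in Sections 3.1 and 3.2, as the short check below shows. The variable names are chosen here for illustration; the numbers are those listed in the text.

```python
# Numbers of neurons in the hidden layer for structures judged satisfactory in each study.
robust = {5, 10, 11, 16, 19, 21, 22, 23, 26, 36}          # robustness study, Section 3.1
well_fitted = {5, 10, 11, 12, 13, 14, 16, 19, 20, 21,
               22, 23, 25, 26, 28, 31, 32, 35, 36}        # over/underfitting study, Section 3.2

# A structure is suitable only if it passes both studies.
suitable = sorted(robust & well_fitted)
share = 100 * len(suitable) / 50  # 50 structures were examined

print(suitable)
print(share)
```

In this case every robust structure also passed the overfitting and underfitting study, so the intersection equals the robust set: 10 of 50 structures, i.e., 20%.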
The results obtained for the best identified NN structure, in comparison with those obtained in other studies [22,40,41], show that this identification is needed. For example, in [22], satisfactory accuracy was obtained for 92.9% of data for an NN modelling the PMV index, while in the current study this accuracy is above 99.9% (precisely 99.9812%). Moreover, the best regression results presented in [40] were R = 0.976, so the goodness of fit of the model was at the level of R 2 = 0.952. Compared with this value, the R 2 = 0.99998 obtained with the best structure identification method (Sections 2.4.3 and 2.6) means that, thanks to the method proposed in the article, individual thermal comfort can be described more precisely than before. However, this statement is only true for NNs with one hidden layer. The same conclusion can be drawn for [41], where the differences in the R 2 value between the presented model and the NN obtained in this article were much greater. Therefore, the use of identification gave better results than the similar studies [22,40,41] in each case.
The above comparative results are relevant for countries' energy policies, energy saving strategies and sustainability in real estate. The fact is that the accuracy of the simulation of the building's thermal performance has a significant impact on energy costs, energy consumption and greenhouse gas emissions [89]. After all, it was found that "to overcome these issues, an appropriate thermal comfort model is needed to determine and measure the accurate and precise value of thermal performance" [89]. As the results obtained in the article show, such an appropriate thermal comfort model can be achieved thanks to the proposed method (Sections 2.4.3 and 2.6). Nowadays, thanks to the use of the appropriate thermal comfort model to predict a building's energy consumption, the time that is essential for cooling or heating can be decreased by almost 20% [89]. In view of the above, increasing the accuracy of the model to that presented in the article will improve this result even more. Other applications of the PMV model formulated thanks to NN that affect energy consumption are shown in [22,28,38,90,91].

Conclusions and Future Research Program
The main results of the research can be summarized as follows:
1. The method presented in the article enables filling the gap identified by the scientific community in comfort studies related to energy consumption in buildings.
2. There are two approaches to filling the identified gap in the case of NNs with one hidden layer (Section 2.4.3): the first for the best quality fit of the model (Equation (5)), and the second one takes into account the quality of the fit with the minimum complexity of the NN model (Equation (7)).
3. When designing the PMV index using NNs with one hidden layer, it is necessary to perform a robustness study (Section 3.1) along with an overfitting and underfitting study (Section 3.2). Otherwise, there is a high likelihood (in the analyzed case about 80%) that the NN will not be usable.
4. NNs with one hidden layer enable PMV index modelling with almost perfect quality of model fit as long as the best structure identification method is used (Sections 2.4.3 and 2.6).
5. The use of the identification method (Sections 2.4.3 and 2.6) compared to similar studies with NNs with one hidden layer gave better results in each case.
6. The method presented in the article (Sections 2.4.3 and 2.6) makes it possible to formulate the equation (Equations (20) and (A1)) characterizing individual thermal comfort for the object under study in terms of its basic functionality.
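The two selection approaches named in the second result can be illustrated with a short sketch. Equations (5) and (7) are not reproduced in this section, so the rules below are assumptions rather than the article's criteria: the first function simply picks the hidden-layer size with the lowest error, and the second picks the smallest size whose error lies within a hypothetical relative tolerance `rel_tol` of the best one. The error values are invented for illustration and are not results from the study.

```python
def best_fit_structure(mse_by_size):
    """Approach 1 (sketch): hidden-layer size with the lowest error."""
    return min(mse_by_size, key=mse_by_size.get)


def simplest_adequate_structure(mse_by_size, rel_tol=0.10):
    """Approach 2 (assumed rule): smallest hidden-layer size whose error
    is no more than rel_tol above the best error found."""
    best = min(mse_by_size.values())
    adequate = [size for size, mse in mse_by_size.items()
                if mse <= best * (1.0 + rel_tol)]
    return min(adequate)


# Hypothetical errors per number of hidden neurons (illustration only).
mse_by_size = {2: 0.050, 4: 0.012, 6: 0.0105, 8: 0.0100, 10: 0.0101}

print("Best fit:", best_fit_structure(mse_by_size))
print("Simplest adequate:", simplest_adequate_structure(mse_by_size))
```

With these invented numbers the two rules disagree: the lowest error occurs at 8 neurons, while 6 neurons already lie within the tolerance. This is the kind of trade-off between fit quality and model complexity that the second approach is meant to resolve.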
As for the future research program: The research presented in the article is intended to form part of a thematically coherent series of papers showing how to characterize the indoor environment in smart buildings. In this first part of the cycle, the PMV index modelling method was presented along with a description of how to identify the best neural network with one hidden layer. From the point of view of the complete cycle, the objective of this article is to show that, when the building is used for its intended purpose, an NN with one hidden layer can provide an almost perfect quality of model fit for the PMV index.
Therefore, for such use of the building, it is sufficient to apply classic NNs with one hidden layer. The advantage of this approach is that the model obtained from the presented method (Sections 2.4.3 and 2.6) does not take into account random cases of building use, such as repairs or renovation of rooms. The model characterizes individual thermal comfort for the object under study in terms of its basic functionality. Additionally, thanks to the identified equation (Appendix D) derived from the method of identifying the best network structure, it is possible to interpret the impact of the network input arguments on the PMV index describing individual thermal comfort. This interpretation is based on an assessment of the weight and bias values of this equation (Appendix D). This advantage derives from the fact that models formulated using NNs with one hidden layer are much simpler than those designed using DNNs.
On the other hand, the disadvantage of this method is the need to properly select data for the network.
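The interpretability argument above can be made concrete with a minimal sketch of a one-hidden-layer network written as a closed-form equation. Everything specific here is an assumption: the tanh hidden activation, the linear output, the six inputs, and the random weights are placeholders and not the identified model from Appendix D. Five hidden neurons are used, reading s{1} = 5 in Appendix B as the hidden-layer size.

```python
import math
import random


def forward(x, W1, b1, w2, b2):
    """Return (y, hidden) for y = w2 . tanh(W1 x + b1) + b2."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    y = sum(v * h for v, h in zip(w2, hidden)) + b2
    return y, hidden


def input_sensitivity(x, W1, b1, w2, b2):
    """Local effect of each input on the output:
    dy/dx_i = sum_j w2[j] * (1 - hidden[j]^2) * W1[j][i]."""
    _, hidden = forward(x, W1, b1, w2, b2)
    return [sum(w2[j] * (1.0 - hidden[j] ** 2) * W1[j][i]
                for j in range(len(w2)))
            for i in range(len(x))]


# Placeholder weights and biases (random, NOT the values from Appendix D).
rng = random.Random(0)
n_inputs, n_hidden = 6, 5   # 6 inputs is a hypothetical choice
W1 = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
b1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
b2 = rng.uniform(-1, 1)

x = [0.0] * n_inputs        # an arbitrary (normalized) operating point
y, _ = forward(x, W1, b1, w2, b2)
sens = input_sensitivity(x, W1, b1, w2, b2)
print("output:", y)
print("sensitivities:", sens)
```

Because the whole model is a single short expression, the sign and magnitude of each sensitivity can be traced back to individual weights and biases, which is much harder to do for a deep network.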
In the next parts of the publication cycle, the second approach using DNN for the described topic will be presented, as well as a comparative analysis of the use of NNs and DNNs.
Funding: The funding for research and publication was received from the Cracow University of Technology (Kraków, Poland).

Acknowledgments: The author would like to express his gratitude to A.M. Stręk from Cracow University of Technology for the given advice, and to Jared Langevin from Drexel University who made available the data for NN.

Conflicts of Interest: The author declares no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Figure A1. The histogram of all input data values assigned to the learning process with the division into learning stages.

Appendix B. Results of the Selection of Network Learning Parameters
Figures A3 and A4 were created for the network with s{1} = 5, that is, for the network structure identified as the best. Figure A3 shows the implemented performance function (MSE), calculated for all data assigned to the NN learning process, as a function of the learning rate (Table 3). The best result, MSE = 0.0000017, was obtained for a learning rate of 0.01.

Figure A3. The learning rate selection results for the NN learning process obtained for the range from 0.005 to 0.5. The best learning rate was obtained for the value 0.01.
Figure A4 shows the results for the implemented performance function (MSE) computed for all data assigned to the NN learning process as a function of momentum (Table 3). The best result: MSE = 0.0000017 was obtained for the momentum of 0.9.
Figure A4. The results of the momentum selection for the NN learning process obtained for the range from 0.1 to 3. The best momentum was obtained for the value of 0.9.
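The kind of learning-rate sweep summarized in this appendix can be sketched as follows. This is a toy illustration under stated assumptions and not the procedure used in the study: the data are synthetic (a sine curve), the network has one input and five tanh hidden neurons, training is full-batch gradient descent with momentum, and the grid of learning rates is a hypothetical sample from the 0.005 to 0.5 range. Runs that diverge are recorded with an infinite error.

```python
import math
import random


def predict(p, x, H):
    """One-input network; p = [w1 (H), b1 (H), w2 (H), b2] as a flat list."""
    hidden = [math.tanh(p[j] * x + p[H + j]) for j in range(H)]
    y = sum(p[2 * H + j] * hidden[j] for j in range(H)) + p[3 * H]
    return y, hidden


def mse_and_grad(p, data, H):
    """Mean squared error and its gradient with respect to p."""
    grad = [0.0] * len(p)
    total, n = 0.0, len(data)
    for x, t in data:
        y, hidden = predict(p, x, H)
        err = y - t
        total += err * err
        g = 2.0 * err / n
        for j in range(H):
            dh = g * p[2 * H + j] * (1.0 - hidden[j] * hidden[j])
            grad[j] += dh * x            # d/d w1[j]
            grad[H + j] += dh            # d/d b1[j]
            grad[2 * H + j] += g * hidden[j]  # d/d w2[j]
        grad[3 * H] += g                 # d/d b2
    return total / n, grad


def train(data, lr, momentum, H=5, epochs=300, seed=0):
    """Gradient descent with momentum; returns the final MSE (inf if diverged)."""
    rng = random.Random(seed)
    p = [rng.uniform(-0.5, 0.5) for _ in range(3 * H)] + [0.0]
    velocity = [0.0] * len(p)
    for _ in range(epochs):
        _, grad = mse_and_grad(p, data, H)
        velocity = [momentum * v - lr * g for v, g in zip(velocity, grad)]
        p = [w + v for w, v in zip(p, velocity)]
    final_mse, _ = mse_and_grad(p, data, H)
    return final_mse if math.isfinite(final_mse) else math.inf


# Synthetic data standing in for the real learning set (assumption).
data = [(-2.0 + 0.2 * i, math.sin(-2.0 + 0.2 * i)) for i in range(21)]

learning_rates = [0.005, 0.01, 0.05, 0.1, 0.5]   # hypothetical grid
results = {lr: train(data, lr, momentum=0.9) for lr in learning_rates}
best_lr = min(results, key=results.get)

for lr, mse in results.items():
    print(f"learning rate {lr}: MSE = {mse}")
print("selected learning rate:", best_lr)
```

A momentum sweep follows the same pattern with the learning rate held fixed and the `momentum` argument varied instead.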