Anticipating the Length of Employees’ Working Time

Asymmetry of employee productivity is an important issue when planning production work. The estimated time to complete the work depends on many factors, such as employee experience, qualifications, the efficiency of brigades and subcontractors, accidents, unreliable suppliers, material shortages, the need for correction work, fatigue, stress, etc. The paper presents a statistical method, the Multivariate Method of Statistical Models (MMSM). It enables the assessment of how the characteristic variables of the works affect their duration, and the prediction of the duration of individual orders. To carry out the MMSM analysis, the employees' working time was measured at a newly opened steel structure production plant in Kielce. The results of the analyses clearly show that, thanks to the method used, quick and accurate prediction of employees' work efficiency is possible. In the analyzed case, the best forecast was obtained using the method of automatic neural networks, with a MAPE error of 0.02%.


Introduction
Planning production work is an extremely important issue; it depends on the material delivery schedule, the transport schedule, the employment schedule, the proper use of available work stations and machines, and many other factors [1]. Production ventures are exposed to various risk factors that can disturb the pre-planned execution time of an element. The most common risk factors are [2,3]: differences in employee experience, qualifications, and the performance of brigades and subcontractors, accidents, unreliable suppliers, material shortages, the need for correction work, fatigue, stress, etc. Beck and Shen even studied the impact of presidential elections on employee productivity [4]. The variability of the environment in which a project is carried out is a source of uncertainty and risk. This risk relates mainly to time, cost, and quality.
Production plants set average working time standards for a worker, or the number of working hours needed to complete an item. However, these are averaged times that often do not reflect reality. Production schedules often turn out to diverge from production practice and are the reason for delays in deliveries to contractors [5], which is associated with contractual penalties for failure to meet deadlines.
Today's computing power creates opportunities for performing complex computational operations in a short time, and thus for creating completely new computational methods that are useful in scheduling. Many researchers deal with the subject of work efficiency and methods of estimating or planning it [6]. Research has been carried out to determine the factors affecting work efficiency, including: fatigue and breaks [7,8], the age of employees [9], accidents at work [3], motivating factors for work [10], and even the role of the employee's religion or sexuality [11].
This work presents a methodology for improving the planning of production works and determining their duration, taking into account the various factors that affect production.

Methods
The Multivariate Method of Statistical Models (MMSM) is a new methodology for forecasting project implementation time. It allows schedules to be created that take into account real, rather than averaged, conditions. The method is based on the computational analysis of several prognostic methods: multiple regression, multivariate adaptive regression splines, generalized additive models, artificial neural networks, support vector machines, and integrated autoregression. Conducting the calculations involves running the prognostic analyses and then selecting, on the basis of the smallest forecast error, the model that best fits the real data. The method was developed by M. Rogalska [12] and is successfully used to predict the times of construction works.
At the initial stage of the analysis, it is necessary to determine the type of distribution of the individual variables. The normal (Gaussian) distribution is the most common type of empirical distribution, because many phenomena, especially natural ones, follow such a distribution [13]. To determine the normality of a distribution, the Shapiro-Wilk test is performed. It is the preferred test of normality because of its high power compared to other tests. Using the Shapiro-Wilk test, it is possible to check whether the variables have distributions close to the normal distribution [14]. For 32 cases, the Shapiro-Wilk test statistic should be at least 0.930 for the distribution to be classified as normal. The next step is to check the correlations between the variables, so that strongly related variables, which could disturb or distort the analysis, can be eliminated.
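As an illustration, such a normality check can be run in Python with scipy; the sample below is randomly generated placeholder data, not the study's measurements:

```python
# Minimal sketch of the Shapiro-Wilk normality check (placeholder data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=50.0, scale=5.0, size=32)  # 32 cases, as in the study

w_stat, p_value = stats.shapiro(sample)
print(f"W = {w_stat:.3f}, p = {p_value:.3f}")

# Criterion cited above: for n = 32, W >= 0.930 suggests the distribution
# can be treated as normal.
if w_stat >= 0.930:
    print("Distribution may be treated as normal.")
```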
Based on the collected data (Table 1), analyses were performed using the following computational methods: multiple regression forecasting (MR, Multiple Regression), forecasting using multivariate adaptive regression splines (MARSplines), forecasting using generalized additive models (GAM), and the method of automatic neural networks (NN, Neural Networks).
Forecasting by means of the multiple regression method aims to quantify the relationships between many independent (explanatory) variables and a dependent (criterion, explained) variable [15,16]. The purpose of building and analyzing the multiple regression model is to examine the existence of relationships between the variables and the effect of the Xi variables on the Y variable (in this case, "employee performance"). The conditions for the correctness of the model are: no strong correlation between the independent variables, a greater number of observations than parameters to be estimated, and no independent variable being a linear combination of the other independent variables.
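A minimal sketch of such a model in Python, using statsmodels with placeholder data (the predictor layout is only illustrative):

```python
# Sketch of a multiple regression model Y ~ X1..X4 (placeholder data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 32                                     # matches the number of cases in the study
X = rng.normal(size=(n, 4))                # e.g., predictors such as v2, v5, v6, v7
y = X @ np.array([1.5, -0.8, 0.3, 0.6]) + rng.normal(scale=0.5, size=n)

X_const = sm.add_constant(X)               # add the intercept term
model = sm.OLS(y, X_const).fit()           # ordinary least squares fit
print(model.summary())                     # coefficients yield the regression formula
y_pred = model.predict(X_const)            # predicted values of the dependent variable
```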
Forecasting with the use of multivariate adaptive regression with spline functions is known as MARSplines. The MARS method is one of the many tools of statistical data mining [17], used, among others, to solve regression problems. As a non-parametric model, it does not require assumptions about the shape of the functional relationship between the dependent and independent variables. This relationship is determined as a linear combination of basis functions derived solely from the information contained in the input data. The data are divided into regions in which separate regression functions are independently defined. This tool is especially valuable with more dimensions and large data sets with complex relationships between them.
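The basis functions in question are hinge (truncated spline) functions of the form max(0, x − t). A toy numpy sketch of this idea follows; it only illustrates the basis, whereas the real algorithm also searches for knot locations and prunes terms, and the data are placeholders:

```python
# Toy sketch of the MARS idea: a model as a linear combination of hinge
# basis functions max(0, x - knot). Placeholder data; knot assumed known.
import numpy as np

def hinge(x, knot):
    """Hinge basis function max(0, x - knot)."""
    return np.maximum(0.0, x - knot)

rng = np.random.default_rng(7)
x = rng.uniform(0.0, 10.0, 200)
# Piecewise-linear target with a kink at x = 4:
y = np.where(x < 4.0, 2.0 + 0.5 * x, 4.0 - 1.2 * (x - 4.0))
y += rng.normal(scale=0.1, size=x.size)

# Basis matrix [1, x, max(0, x - 4)], then an ordinary least-squares fit:
B = np.column_stack([np.ones_like(x), x, hinge(x, 4.0)])
coef, *_ = np.linalg.lstsq(B, y, rcond=None)
y_hat = B @ coef                            # piecewise-linear approximation
```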
Forecasting by the method of generalized additive models is known as GAM. Generalized additive models were developed around 1990 by Hastie and Tibshirani [18-20], who proposed estimation for multidimensional variables by means of an additive approximation of the regression function, replacing the linear function of the explanatory variables with additive "nonparametric" functions, which can be estimated, for example, by smoothed cubic spline functions. In the case of generalized additive models, instead of a combination of linear predictors, a nonparametric function is used, obtained by applying smoothing to the scatter plot of partial residuals (for transformed values of the dependent variable).
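A minimal sketch of such a model in Python; the pygam package is assumed here purely for illustration (the study itself used other software), and the data are synthetic:

```python
# Sketch of a generalized additive model with one spline smoother per
# predictor (pygam assumed to be available; synthetic data).
import numpy as np
from pygam import LinearGAM, s

rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=(100, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.2, size=100)

# Normal distribution with Identity link (the GAM1 setup in the Results);
# pygam also supports e.g. a Gamma distribution with a Log link (cf. GAM2).
gam = LinearGAM(s(0) + s(1)).fit(X, y)
y_pred = gam.predict(X)
```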
Forecasting with the NN method allows very good forecast results to be obtained [12]. It involves the processing of data by neurons grouped into layers. Appropriate results are obtained thanks to the learning process, which consists of modifying the weights of those neurons that are responsible for the error. Neural networks are resistant to damage and to incorrect or incomplete information [12]. They can operate efficiently even when some of their elements are damaged and some of the data are lost. They are best suited for solving the class of tasks where writing a conventional program is very difficult or impossible, e.g., due to the lack of a known algorithm.
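A comparable feed-forward regression network can be sketched with scikit-learn; the tanh hidden activation, identity output, and (L-)BFGS optimizer loosely mirror the network configuration reported later in the Results, but this is an analogy, not the software used in the study, and the data are placeholders:

```python
# Sketch of a small feed-forward regression network (scikit-learn assumed).
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 6))                 # six predictors, e.g., v2..v7
y = X @ rng.normal(size=6) + rng.normal(scale=0.1, size=100)

net = MLPRegressor(hidden_layer_sizes=(10,),  # hidden-layer size is illustrative
                   activation="tanh",         # tanh hidden units
                   solver="lbfgs",            # quasi-Newton training, like BFGS
                   max_iter=1000,
                   random_state=0)
net.fit(X, y)                                 # weights adjusted to reduce the error
y_pred = net.predict(X)
```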
The regression line expresses the best prediction of the dependent variable for given independent variables. However, nature is rarely perfectly predictable, and we usually deal with deviations of the measurement points from the regression line. The deviation of a given point on the graph from the regression line (i.e., from its predicted value) is called the "residual" value. During the individual analyses, their correctness was checked and assessed by means of the autocorrelation and partial autocorrelation of the residuals, as well as fit charts. In the last phase of the work, all analyses were subjected to the MAPE error test. The MAPE (Mean Absolute Percentage Error) [12,21] is the average absolute percentage error that is made when forecasting based on a specific model. This parameter is determined by analyzing the accuracy of the forecasts that form the basis for building the estimation model. It informs about the average size of the errors made. It follows that the smaller the MAPE, the better. The error is expressed by the formula:

MAPE = (1 / (T − N)) · Σ_{i=N+1}^{T} |(Y_i − Y_ip) / Y_i| · 100%

where: T is the number of the last moment/period for which the forecast was checked; N is the number of the last known observation of the forecast variable; Y_i is the actual value of the variable in period i; Y_ip is the projected value of the variable in period i.
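A direct transcription of this formula in Python (the sample numbers are placeholders):

```python
# Direct transcription of the MAPE formula above (placeholder values).
import numpy as np

def mape(y_actual, y_forecast):
    """Mean absolute percentage error, in percent."""
    y_actual = np.asarray(y_actual, dtype=float)
    y_forecast = np.asarray(y_forecast, dtype=float)
    return np.mean(np.abs((y_actual - y_forecast) / y_actual)) * 100.0

# The smaller the MAPE, the better the forecast:
print(mape([100.0, 105.0, 98.0], [99.9, 105.2, 98.1]))
```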

Results
The variables were subjected to the normality test. The results (Table 2) show that most of the data have normal distributions, while temperature and well-being are variables with distributions only close to normal.

Correlations between the variables were examined. The results indicate relationships between the variables (Table 3). A significant correlation means that an independent variable has an impact on the dependent variable. In practice, this means that employee performance largely depends on the number of components made, employee experience, and the time of grinding. The most strongly correlated independent variables are "number of items made" and "employee experience". Some methods, such as multiple regression, require that the variables not be strongly correlated. Therefore, one of these variables needs to be excluded from the analysis, because it could generate a larger error or completely prevent the calculations.
Table 3. Summary of the correlation study.
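Such a correlation screen can be sketched with pandas; the column names and the 0.7 threshold below are illustrative assumptions, not the study's data:

```python
# Sketch of the correlation screen used to spot strongly related predictors.
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
experience = rng.uniform(1, 20, 32)
items_made = 3.0 * experience + rng.normal(scale=2.0, size=32)  # deliberately correlated
df = pd.DataFrame({"items_made": items_made,
                   "experience": experience,
                   "temperature": rng.normal(20, 3, 32)})

corr = df.corr()                  # Pearson correlation matrix (cf. Table 3)
print(corr.round(2))
print(corr.abs() > 0.7)           # flags candidate variables for exclusion
```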



Multiple Regression Method
In the multiple regression method, the independent variable "number of elements made" was excluded from the calculations, because it was very strongly correlated both with the dependent variable "employee performance" and with the variable "employee experience" (Figure 1). The use of two independent variables that are predictors and are strongly linearly correlated with each other may have a negative impact on the analysis. In such a situation, the independent variables would not introduce a significant dependence into the regression formula, only a distortion.

MR1
Table 4 is a summary of the MR1 analysis. The independent variables v2, v5, v6, and v7 were used for the analysis (Figures 2-4). Based on the generated calculations, it was possible to determine a regression formula that allows prediction. Based on the calculations and the analysis of their results, the correctness of the adopted model is confirmed.

MR2
To obtain a better prediction, the independent variable "number of elements made" was modified so that its distribution was normal. The variable v3 was logarithmized and then used in the analysis (Table 5). Despite the good fit on the plot of predicted and observed values, it turned out that the residual autocorrelation and partial residual autocorrelation values exceed the limits. Thus, the model is not correct (Figures 5-7), and its analysis and the calculation of the MAPE error were not continued.
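The preprocessing step described here can be sketched in Python with placeholder data (a skewed variable is log-transformed and its normality re-tested):

```python
# Sketch of the MR2 preprocessing step: log-transform a skewed variable
# and re-run the Shapiro-Wilk test (placeholder data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
v3 = rng.lognormal(mean=3.0, sigma=0.4, size=32)  # skewed "number of elements made"

print("raw W =", round(stats.shapiro(v3).statistic, 3))
v3_log = np.log(v3)                               # the "logarithmized" variable
print("log W =", round(stats.shapiro(v3_log).statistic, 3))
```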



Multivariate Adaptive Regression Splines Method
The MARSplines method itself rejects correlated variables during the analysis. Thus, in this case, all available independent variables were used for the calculations (Tables 6-8). The plot of forecast and observed values does not coincide perfectly, but it is satisfactory. No exceedances were detected in the values of the residual autocorrelation and partial residual autocorrelation functions, so the model, as well as the regression formula, is considered correct (Figures 8-10). The last step of the analysis was to determine the MAPE error, which also proved to be very good. MARS1 (v1; v2, v4, v5, v6, v7). Dependent variable: v1; independent variables: v2, v4, v5, v6, v7. Based on the calculations and the analysis of their results, the correctness of the adopted model is confirmed.

GAM1
The GAM analysis began with the selection of the appropriate variable distribution and link function. Because the dependent variable and most of the independent variables had normal distributions, the Normal distribution and the Identity link function were selected for the analysis. The independent variables with normal distributions were adopted for the calculations: v2, v3, v4, v6 (Table 9). GAM1 (v1; v2, v3, v4, v6). Dependent variable: v1; independent variables: v2, v3, v4, v6. Despite the good fit on the plot of predicted and observed values, it turned out that the residual autocorrelation and partial residual autocorrelation values exceed the limits (Figures 11-13). Thus, the model is not correct, and its analysis and the calculation of the MAPE error were not continued.


GAM2
Because not all variables had normal distributions, it was decided to carry out another GAM analysis, this time assuming a Gamma distribution and a Log link function. All independent variables were assumed for the calculation of the v1 variable. However, the program rejected some independent variables due to too few distinct data values in these cases (Table 10). GAM2 (v1; v2, v3, v4, v5, v6, v7). Dependent variable: v1; independent variables: v2, v3, v4, v5, v6, v7. Regression formula: v1 = exp(−0.193360 − 0.000019·v2 + 0.059500·v3 − 0.000024·v4 − 0.000096·v6). Despite the very good fit on the plot of predicted and observed values, it turned out that the residual autocorrelation and partial residual autocorrelation values exceed the limits (Figures 14-16). Thus, the model is not correct, and its analysis and the calculation of the MAPE error were not continued.
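For illustration only (the model itself was ultimately rejected), the reported formula is an exponentiated linear predictor under the Log link and can be evaluated directly; the input values below are placeholders:

```python
# Evaluating the reported GAM2 formula: a Log link means the linear
# predictor is exponentiated. Input values are placeholders, not study data.
import numpy as np

def gam2_v1(v2, v3, v4, v6):
    eta = (-0.193360 - 0.000019 * v2 + 0.059500 * v3
           - 0.000024 * v4 - 0.000096 * v6)
    return np.exp(eta)

print(gam2_v1(v2=100.0, v3=15.0, v4=500.0, v6=20.0))
```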


Neural Networks Method
The following tables (Tables 11 and 12) summarize the NN analysis carried out. The independent variables v2, v3, v4, v5, v6, and v7 were used for the analysis. Based on the calculations made, it was possible to assess the correctness of the prediction.

The best network summary reads "BFGS 17, SOS, Tanh, Identity": the BFGS training algorithm (17 training epochs), the SOS (sum of squares) error function, Tanh hidden-layer activation, and Identity output activation.
Regression formula: not generated (a neural network model does not provide an explicit regression formula).
Based on the calculations and analysis of their results (Figures 17-19), the correctness of the adopted model is confirmed.



Discussion
Five main directions of scheduling development can be distinguished: the first involving the introduction of fuzzy data [22,23], the second taking into account time buffers [24,25], the third introducing risk factors [26], the fourth concerning task scheduling using artificial intelligence tools [27,28], and the fifth involving pipeline task scheduling [29]. A common feature of all these directions is the use of labor input values as an element necessary to create a schedule. In construction, the amount of work [30] is given in the form of the total working time of all workers performing a specific task. Establishing the input standard is difficult because the input for the same production unit is not always the same. Inputs depend on many factors, such as: the method and type of construction, the type and grade of the materials used, the qualifications and abilities of the workers, working conditions, the technique and technology of execution, quality requirements, the number of works to be performed, and the size of the working team. The input standards obtained in this way are averaged values and may differ significantly from the time in which the work is carried out under specific conditions. Using average times for employees in the schedule differs from the actual values for individual employees and may be the reason for non-compliance. The methodology presented in the article is more accurate and allows better estimation of the time an employee needs to complete a given number of elements.
After conducting a number of analyses using the various calculation methods, it is possible to assess their usefulness for forecasting employee performance. Analyzing the collected data manually, it could already be seen that there is a relationship between the independent variables "employee experience" and "number of elements made" and the dependent variable "employee performance". In practice, it is often found that employee performance depends on seniority in similar positions. The correlation analysis only confirmed the authors' suppositions. It was necessary to exclude one of the correlated independent variables in order to improve the prediction model for some methods (as in the MR1 method).
The results of the individual analyses are summarized in the table below (Table 13). The correctness of the calculations was determined based on the residual autocorrelation and partial residual autocorrelation functions. The residual series, which arises as the difference between the measured values and the predicted values, should have a mean close to zero. At the same time, its standard deviation need not be zero. If the model is correct, no residual autocorrelation should occur. However, computational practice shows that if the limit values are not exceeded in the first 8 lags, the model can be assumed to be correct.
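The residual check described here can be sketched with statsmodels' autocorrelation utilities; the residual series below is randomly generated, and the approximate 95% limit is an assumption of this sketch:

```python
# Sketch of the residual check: the first 8 lags of the autocorrelation and
# partial autocorrelation should stay within the limits (placeholder data).
import numpy as np
from statsmodels.tsa.stattools import acf, pacf

rng = np.random.default_rng(6)
residuals = rng.normal(size=32)            # residual series, mean close to zero

r = acf(residuals, nlags=8)                # residual autocorrelation
rp = pacf(residuals, nlags=8)              # partial residual autocorrelation
limit = 1.96 / np.sqrt(len(residuals))     # approximate 95% confidence limit

ok = np.all(np.abs(r[1:]) < limit) and np.all(np.abs(rp[1:]) < limit)
print("model can be assumed correct" if ok else "model rejected")
```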
All the examined residual autocorrelation and partial residual autocorrelation plots were summarized for 15 lags together with their limit values. Table 13 shows that several methods exceeded the limit values within the first 8 lags. Therefore, it should be stated that the hypothesis of residual autocorrelation was confirmed in these cases, and the corresponding equations cannot serve as regression equations. Based on the residual autocorrelation and partial residual autocorrelation, the MR2, GAM1, and GAM2 models were rejected. The other models meet the criterion.
Of the many available measures for assessing the accuracy of predictions and calculating errors, it was decided to use the MAPE error, because it is the most reliable. It is the basic measure of the correctness of the calculations. In the assessment of the forecasts, a criterion was adopted in which a MAPE error below 1% is considered highly accurate, 1-3% good, and above 3% not good.
The MAPE error calculations were made only for the correct models, and all the forecasts can be considered highly accurate because their error values are less than 1%. The best prediction, with a MAPE of 0.02%, was obtained by the automatic neural networks method.

Conclusions
The presented work shows various computational models of the MMSM method, thanks to which it is possible to predict employees' work performance quickly and accurately. At present, anticipating employee performance and the length of their working time, or the time needed to perform a specific task, is time consuming and often complicated. The methodology presented in the article can be applied in various sectors of the economy. Any company with input data that wants to gain better control and predict the plant's performance can use the presented methodology. Based on the research conducted, the following conclusions were drawn: (1) Forecasting and prediction are possible based on the regression formula of employee performance determined by the MMSM method. The method shows how certain variables characteristic of the process affect the duration of works; (2) Half of the analyzed models are correct and reflect employee performance very accurately. It is possible to use the selected models to assess employee performance and the duration of work for a specific order; (3) The best forecast was obtained using the automatic neural network method, with MAPE = 0.02%.
The disadvantage of this method is the lack of a regression formula, which limits its universality. New calculations can only be made by a person who has all the previous data and the trained neural network.