Prediction of Municipal Waste Generation in Poland Using Neural Network Modeling
Sustainability 2020, 12, 10088

Planning is a crucial component of short- and long-term municipal waste management. Establishing the relationships between the factors that determine the amount of waste generated by municipalities and forecasting waste management needs play a fundamental role in developing effective planning strategies and implementing sustainable development. Artificial neural network (ANN) models employed for verifying the forecasts pertaining to the amount of waste generated in Poland are presented in this study. The proposed models included selected explanatory indices in order to reflect the impact of social, demographic and economic factors on the amount of generated waste. Mean squared error (MSE) and regression value (R) were used as indices of efficiency of the developed models. The ANN models produced accurate forecasts, with high R values (R = 0.914, R = 0.989) and low MSE values. Derived from the socioeconomic data for 2003–2019, the model predicts that waste generation will increase by 2% by 2024. The results indicate that the employed ANN models are effective in predicting the amount of waste and can be considered a cost-effective approach to planning integrated waste management systems.


Introduction
Environmental management is one of the most efficient instruments for achieving sustainable development. There are numerous methods for improving the condition of the environment that involve the implementation of integrated environmental management techniques [1]. The main task of environmental management is the implementation of solutions that will monitor and investigate the social, economic and industrial effects on the environment in the long term [2]. Creating an adequate model for the management and prediction of generated waste could become an important measure to be undertaken.
In environmental policy, waste management is considered one of the key areas. The Waste Management Act (2001) defines municipal waste as waste produced in households, or waste from other sources that contains no harmful substances and resembles household waste in terms of properties or composition [3]. In 2019, the total amount of waste generated in the EU by all sectors of the economy and households amounted to 2538 million tons. As can be presumed, the total amount of generated waste is, to an extent, connected with the number of people and the size of the economy within particular countries. The smallest EU member states generally recorded the lowest levels of generated waste, whereas larger ones recorded the highest. However, relatively high amounts of waste were generated in Bulgaria and Romania, whereas a relatively low amount was generated in Italy [4].
In Poland, the reported annual household waste generation in cities ranges from 238 to 315 kg per resident. Biodegradable waste is the most dominant fraction, followed by paper/cardboard and [...]
1. Is there a possibility of applying artificial neural networks for the successful prediction of MSW (categorized waste) levels based on the social and economic factors?
2. Is there a possibility of applying artificial neural networks for the successful prediction of future trends in waste generation based on historical data?

Datasets
The data fed into the ANN models describe the waste generation characteristics of several municipalities with diverse socioeconomic conditions. The social and economic factors as well as the cities selected for analysis are presented in Table 1. The statistical data pertaining to MSW are divided into five categories of generated waste, as presented in Table 2. The input data were selected in line with the national waste management plan [3]. Moreover, the modeling of waste generation divided into total waste (in Mg) and household waste (in Mg) was performed on historical data from 2003-2019 presented in Table 3.

Statistical Analysis
The explanatory variables affecting the quantity and type of waste were selected based on the correlation test and literature review.
The data were statistically analyzed using Pearson's correlation analysis (r) to establish the strength of the relationship between the input data (the social and economic factors for selected cities shown in Table 1) and the output data (MSW divided into five categories for particular cities presented in Table 2). In the interpretation of the results from the correlation studies, the criteria for the strength of dependence adopted by Bam et al. (2011) were employed. These assume that a correlation coefficient r > 0.7 indicates a strong relationship between the two parameters, while values in the range of 0.5-0.7 indicate a moderate relationship between them [35].
To verify the statistical significance of correlation coefficients, the critical coefficient of correlation $R_{crit}$ was calculated in line with the following formula:
$$R_{crit} = \sqrt{\frac{t_{crit}^{2}}{t_{crit}^{2} + df}}$$
where $df = n - k$ is the number of degrees of freedom, $n$ is the number of datasets and $k$ is the number of explanatory variables. The $t_{crit}$ value is the cut-off between retaining and rejecting the null hypothesis and must be derived from existing tables [21]. If r is lower than the negative critical value or higher than the positive critical value, the correlation coefficient r is statistically significant.
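The significance check described above can be sketched in a few lines. This is a minimal illustration, assuming the common relation R_crit = sqrt(t_crit^2 / (t_crit^2 + df)); the function names are hypothetical, the choice k = 2 (giving df = 23 for the 25 datasets) is an assumption, and the t value is copied from a standard two-tailed Student's t table for a 0.05 significance level.

```python
import math

def critical_r(t_crit, df):
    """Critical correlation coefficient: sqrt(t^2 / (t^2 + df))."""
    if df <= 0:
        raise ValueError("df must be positive")
    return math.sqrt(t_crit ** 2 / (t_crit ** 2 + df))

def is_significant(r, r_crit):
    """A coefficient is significant if it lies outside [-r_crit, +r_crit]."""
    return abs(r) > r_crit

n = 25          # number of datasets used for the city-level network
k = 2           # assumption: chosen so that df = n - k = 23
df = n - k
t_crit = 2.069  # assumption: table value for alpha = 0.05 (two-tailed), df = 23
r_crit = critical_r(t_crit, df)
print(round(r_crit, 3))  # 0.396
```

Under these assumptions the sketch gives 0.396, the critical value quoted in the correlation analysis for the 0.05 level of significance.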

Neural Network
The ANN modeling was performed using the Neural Network library in MATLAB and Simulink software. The waste generation modeling for Polish cities involved grouping the MSW into five categories: paper and cardboard, glass, plastics and metals, biodegradable and other waste. The categories were defined as the output neurons (5). The Neural Network Fitting tool was used to build the network, and the Levenberg-Marquardt algorithm was used for training. The generated networks consist of one hidden layer. Waste modeling was performed with the following variables: population, revenue per capita, the employment-to-population ratio, the number of entities listed in the National Business Registry Number (REGON) per 10,000 population and the number of entities by type of business activity (industry/construction); these constituted the input information (5 neurons). The number of neurons in the hidden layer (2-10) was selected experimentally. A schematic representation of an artificial neural network is shown in Figure 1.
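To make the layer structure concrete, the following sketch runs a single forward pass through a network with 5 inputs, one hidden layer and 5 outputs. It is an illustration only: the tanh hidden activation, the linear output layer, the random weights and the input vector are assumptions, and the Levenberg-Marquardt training step used in the study is not shown.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(x, hidden, output):
    """One tanh hidden layer followed by a linear output layer."""
    w_h, b_h = hidden
    w_o, b_o = output
    h = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
         for row, b in zip(w_h, b_h)]
    return [sum(w * v for w, v in zip(row, h)) + b
            for row, b in zip(w_o, b_o)]

rng = random.Random(0)
N_INPUTS, N_HIDDEN, N_OUTPUTS = 5, 6, 5  # hidden size would be varied over 2-10

hidden = init_layer(N_INPUTS, N_HIDDEN, rng)
output = init_layer(N_HIDDEN, N_OUTPUTS, rng)

# Hypothetical, already-scaled socioeconomic indicators for one city
x = [0.2, -0.1, 0.4, 0.0, 0.3]
y = forward(x, hidden, output)
print(len(y))  # 5 values, one per waste category
```

Selecting the hidden-layer size experimentally would amount to repeating training for each candidate value of `N_HIDDEN` and comparing the resulting quality indices.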
The generated network was modeled using 25 datasets (Tables 1 and 2). At the country level, the waste generation modeling was divided into total and household waste, which were selected as output neurons (2). Given its suitability for modeling time series, a Nonlinear Autoregressive (NAR) network model was utilized, and learning was performed using the Levenberg-Marquardt algorithm. The historical data for 2003-2019 were used in the study (Table 3). The number of neurons in the hidden layer (2-15) was selected experimentally. The number of delays was 2. The network computations were performed on 17 data sets.
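The lagged input structure of a NAR model with two delays can be sketched as follows. The helper name and the placeholder series are hypothetical, and a single series is used for brevity, whereas the study models two outputs (total and household waste).

```python
def make_lagged_samples(series, delays=2):
    """Build (inputs, target) pairs where y[t-delays..t-1] predicts y[t]."""
    if delays < 1 or len(series) <= delays:
        raise ValueError("series must be longer than the number of delays")
    samples = []
    for t in range(delays, len(series)):
        samples.append((series[t - delays:t], series[t]))
    return samples

# Hypothetical placeholder values, one per year for 2003-2019 (17 points)
years = list(range(2003, 2020))
series = [100.0 + i for i in range(len(years))]

samples = make_lagged_samples(series, delays=2)
print(len(samples))  # 15
```

With 17 annual observations and 2 delays, the first two years serve only as inputs, which leaves 15 input-target pairs for training, validation and testing.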
The selection of the most suitable network was based on the quality indices of the network: the mean squared error (MSE) and the regression (R) value. MSE was calculated using the following formula:
$$MSE = \frac{1}{n}\sum_{i=1}^{n}\left(y_{i} - y_{i}^{*}\right)^{2}$$
where $n$ is the number of cases in a given set, $y_{i}$ is the actual value of the modeled output (amount of waste) for the i-th observation and $y_{i}^{*}$ is the predicted value for the i-th observation.
The regression value R measures the correlation between the predicted outputs and the actual (target) values. It shows how well the predicted outputs match the real outputs: the trained network is good if R is close to 1. The regression R values were calculated according to the formula:
$$R(y, y^{*}) = \frac{\operatorname{cov}(y, y^{*})}{\sigma_{y}\,\sigma_{y^{*}}}$$
where $\operatorname{cov}(y, y^{*})$ is the covariance between reference and predicted values, $\sigma_{y}$ is the standard deviation of reference values and $\sigma_{y^{*}}$ is the standard deviation of predicted values. The higher the regression coefficient R and the lower the MSE, the better the quality of the generated network.
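The two quality indices can be computed directly from paired actual and predicted values. The sketch below assumes the usual definitions, MSE as the mean of squared differences and R as the covariance divided by the product of the standard deviations; the function names and sample values are hypothetical.

```python
import math

def mse(actual, predicted):
    """Mean squared error between actual and predicted values."""
    n = len(actual)
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n

def regression_r(actual, predicted):
    """Correlation between actual and predicted values: cov / (sd_a * sd_p)."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted)) / n
    sd_a = math.sqrt(sum((a - mean_a) ** 2 for a in actual) / n)
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted) / n)
    return cov / (sd_a * sd_p)

# Hypothetical values, not taken from the study
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

print(mse(actual, predicted))  # 4.5
print(round(regression_r(actual, predicted), 3))
```

A network that reproduced the targets exactly would give MSE = 0 and R = 1, which is why candidate networks are ranked by low MSE and high R.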

Investigation of the Correlation Coefficients between Explanatory Variables and Dependent Variable
Five variables were taken into account in the presented studies: population, revenue per capita, the employment-to-population ratio, the number of entities in REGON per 10,000 people and the number of entities by type of business activity. The input and output variables were subjected to correlation analysis. The values of correlation coefficients are presented in Table 4. These coefficients are a measure of the correlation between variables. The statistical significance of correlation coefficients was verified by calculating the critical coefficient of correlation, which, at the 0.05 level of significance, is equal to 0.396. Since the calculated coefficients of correlation are higher, there is a statistically significant relationship between variables. From Table 4, it can be seen that the coefficients of correlation between the input data (i.e., population, revenue per capita, the number of entities listed in REGON per 10,000 population and the number of entities by type of business activity (industry/construction)) and the amount of waste from particular categories (paper and cardboard, glass, biodegradable and other waste) are positive. This means that if any of these input parameters increases, the amount of waste will tend to increase as well. The negative coefficient between the input data and the amount of plastic waste indicates an inverse relationship.
According to the coefficients of correlation presented in Table 4, the population and the number of entities by type of business activity (industry/construction) are correlated to the greatest extent with the amount of waste of each type.

Modeling of Municipal Waste in Cities
The best modeling results were obtained for the network with 6 neurons in the hidden layer, which was trained in 10 iterations. Other data, including performance validation and error rate decrease (gradient), are shown in Figure 2. The predictive performance of the ANN was assessed using MSE, which showed that the best validation performance was obtained at iteration 4 (Figure 3). Generally, during modeling, the error decreases over successive training iterations but may start to increase in the validation dataset as the network overfits the training data. Training stops after six consecutive increases in the validation error (or no decrease in error), and the best results are taken from the iteration with the lowest validation error. The network error histogram is presented in Figure 4. Because the shape of the histogram resembles the Gaussian distribution curve and the largest number of errors has the lowest values, it can be concluded that the trained network is of good quality and shows no symptoms of overfitting.
Several observations can be made from the regression statistics detailed in Figure 5. The regression (R) value is equal to 0.912 for training data, 0.92 for validation data and 0.954 for test data. The regression value in each case is R > 0.9, which indicates a good fit of the network to the data. The overall regression was 0.914.
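The stopping rule applied during training (halt after six consecutive iterations without improvement in validation error, then keep the iteration with the lowest validation error) can be sketched as below. The function name, the `max_fail` parameter name and the error values are hypothetical; the sketch counts every iteration that fails to beat the best error so far, which is one reasonable reading of the rule.

```python
def early_stopping(validation_errors, max_fail=6):
    """Return (best_iteration, stop_iteration) for a sequence of validation errors."""
    if not validation_errors:
        raise ValueError("at least one validation error is required")
    best_error = float("inf")
    best_iteration = 0
    fails = 0
    for iteration, error in enumerate(validation_errors):
        if error < best_error:
            # New lowest validation error: remember it and reset the counter
            best_error = error
            best_iteration = iteration
            fails = 0
        else:
            fails += 1
            if fails >= max_fail:
                return best_iteration, iteration
    return best_iteration, len(validation_errors) - 1

# Hypothetical validation MSE per iteration (iteration 0 is the initial network)
val_mse = [9.0, 7.0, 5.0, 4.0, 3.0, 3.2, 3.3, 3.5, 3.6, 3.8, 4.0, 4.5]
best, stopped = early_stopping(val_mse)
print(best, stopped)  # 4 10
```

With these illustrative numbers the lowest validation error occurs at iteration 4 and training halts at iteration 10, the same pattern as reported for the city-level network.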

Prediction of Municipal Waste in Cities
After the networks had been modeled, a Simulink diagram was generated (Figure 6), which served to forecast the future trends with respect to the given waste categories and for other cities described by the relevant statistical and demographic data: population, revenue per capita, the employment-to-population ratio, the number of entities listed in REGON per 10,000 population and the number of entities by type of business activity (industry/construction). The predicted waste generation levels for the established waste categories (paper and cardboard, glass, plastics and metals, biodegradable and other waste) for three selected cities (Radom, Kielce and Bydgoszcz) are displayed in Figure 7. For comparison, the values obtained from modeling are collated against the historical statistical data.
The prediction results for waste generation under the MSW categories (paper and cardboard, glass, plastics and metals, biodegradable) for the cities of Radom, Kielce and Bydgoszcz exhibit low prediction error, below 10%. This is also confirmed by the regression value (R = 0.914). In the aggregate, the positive modeling quality indicators strengthen the assumption that network modeling can be used to predict the generation levels of a given MSW category in cities in Poland.
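A check of the kind "prediction error below 10%" can be expressed as a relative error per waste category. The text does not state how the error was computed, so the absolute relative error used here is an assumption, and the figures are hypothetical rather than values from Figure 7.

```python
def relative_error_percent(actual, predicted):
    """Absolute relative error of a prediction, in percent of the actual value."""
    if actual == 0:
        raise ValueError("actual value must be non-zero")
    return abs(predicted - actual) / abs(actual) * 100.0

# Hypothetical amounts in Mg for one city
observed = {"paper and cardboard": 1000.0, "glass": 800.0, "biodegradable": 5000.0}
forecast = {"paper and cardboard": 1050.0, "glass": 760.0, "biodegradable": 5300.0}

errors = {name: relative_error_percent(observed[name], forecast[name])
          for name in observed}
all_below_10 = all(err < 10.0 for err in errors.values())

for name, err in errors.items():
    print(f"{name}: {err:.1f}%")
print(all_below_10)
```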

Modeling of Waste Generation in Poland
The best-performing network in the experiment contained 10 neurons and was generated in 9 iterations. Other detailed data, including performance validation and the gradient of error decrease, are presented in Figure 8. The ANN performance validation was assessed using MSE and indicated that the best network was obtained at iteration 3 (Figure 9). The network error histogram is shown in Figure 10. The shape of the histogram indicates good quality of the trained network, with no signs of overfitting. The results from the regression analysis are displayed in Figure 11. The regression (R) value is 0.9947 for training data, 0.9825 for validation data and 0.9887 for test data. The correlation results R > 0.95 indicate a very good fit of the network to the data. The overall regression is 0.9895.
Figure 11. ANN regression statistics for individual sets and the total set.

Modeling and Prediction of Waste Generation in Poland
In Figure 12, the waste generation forecasts (total and household waste) are compared with the statistical data for the period between 2015 and 2019. The predictions closely replicate the actual data, which is further confirmed by the results from regression analysis (R = 0.989). After a good-quality network was obtained, a Simulink diagram was generated, which determined the future trends for waste generation (total and household waste) for the years 2020-2024, as shown in Figure 13. The waste generation predictions indicate a 2% increase in total waste over the next five years and an approx. 10% increase in the quantity of generated household waste. The projections of future trends are lower in comparison with the forecasts of the European Environment Agency, which predict an increase in the amount of municipal waste by 5% in five-year periods (1% annually) [4].
Despite the discrepancies, it can be concluded that the presented ANN models display an acceptable level of error and, therefore, emerge as reliable predictors supporting the decision-making processes.
In future studies, more socio-economic factors should be taken into account in order to achieve better results.
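The comparison between a five-year increase and an annual rate can be made explicit with a short calculation. The sketch assumes compound growth, which the text does not specify, and the function names are hypothetical.

```python
def cumulative_growth(annual_rate_percent, years):
    """Total growth in percent after compounding an annual rate over several years."""
    return ((1.0 + annual_rate_percent / 100.0) ** years - 1.0) * 100.0

def equivalent_annual_rate(total_growth_percent, years):
    """Annual rate in percent that compounds to the given total growth."""
    return ((1.0 + total_growth_percent / 100.0) ** (1.0 / years) - 1.0) * 100.0

# 1% annually over five years, as in the cited forecast
print(round(cumulative_growth(1.0, 5), 2))       # 5.1
# 2% over five years, as predicted by the model for total waste
print(round(equivalent_annual_rate(2.0, 5), 2))  # 0.4
```

Under this assumption, 1% per year compounds to roughly 5.1% over five years, while a 2% five-year increase corresponds to about 0.4% per year.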

Discussion
In the study, an attempt was made to create two models: the first predicted the amount of generated waste based on social and economic factors, while the second one predicted the amount of waste in Poland (total waste and household waste) in 2020-2024 based on historical data. The amount of generated waste is affected by the behavior and customs of residents. The personal traits include sex, marital status, education and age; the economic traits comprise education and employment; whereas the political influence involves acts and regulations related to waste management [36]. Data on these traits are collected by the majority of European statistical offices. On the basis of these statistical data, the first ANN model served to predict the amount of generated categorized waste: paper and cardboard, glass, plastics and metals, biodegradable and other. The input data for the model comprised the social and economic factors: population, revenue per capita, the number of entities listed in REGON per 10,000 people and the number of entities by type of business activity (industry/construction). These factors were selected following the prior calculation of correlation coefficients (Table 4). The correlation analysis indicated that the population and the number of entities by type of business activity (industry/construction) had the greatest influence on the amount of generated waste, whereas the number of entities listed in REGON per 10,000 people had the lowest. The ANN model was devised in order to demonstrate the prediction of generated waste levels in cities on the basis of selected social and economic traits.
On the basis of two indices, R and MSE, the networks with the best fit to the actual data were selected. The regression value R is the correlation coefficient between the predicted and actual values; its square, R², is the coefficient of determination, which expresses to what degree the variability of one factor can be explained by its relationship with another factor. MSE, in turn, measures the average squared difference between the predicted and actual values. The networks obtained in the study were characterized by R = 0.914 (for categorized waste) and R = 0.989 (for total waste and household waste in Poland, modeled on the 2003-2019 data and used to forecast 2020-2024).
In a similar study on predicting the amount of generated municipal waste, Younes et al. selected the best model with MSE = 2.46 and R = 0.97, with gross domestic product, population and employment as the input data [30].
From the study of available literature, it can be seen that such modeling research is performed increasingly often. The results reported here and by other researchers show that ANN can be employed on a countrywide scale, with broad possibilities of employing the model. Sun and Chungpaibulpatana (2017) predicted the amount of waste in Bangkok. The employed input data included: total MSW, the total number of residents, native residents, native people aged 15-59, total people aged 15-59, number of households, income per household and number of tourists. The R² value obtained in their model amounted to 0.96 [22]. In turn, Antanasijević et al. (2013) created an artificial neural network model using the following input data: gross domestic product per capita, domestic material consumption and resource productivity. Their ANN model was used for predicting the amount of waste generated in Bulgaria and Serbia. The obtained model exhibited a relative error lower than 10% [37]. Noori et al. (2008) employed ANN for predicting solid waste generation in Tehran for the purpose of conducting short-term, weekly forecasts. The input data for their network included delay (equal to season) and the number of trucks that transported waste during the week. The correlation coefficient (R) and average absolute relative error (AARE) of their ANN model were equal to 0.837 and 4.4%, respectively [27].
The presented methodology can be adjusted to any city in Europe, using the statistical data collected in respective offices. Adjusting this methodology may necessitate matching indices and/or parameters on an international level, depending on the specificity of an investigated country.
The flexibility of neural network tools is a feature that allows accounting for a range of additional factors (demographic, economic, geographic, technological, social, administrative, legislative or consumption-related), which may eventually play an important role in determining the final quantity, type and composition of municipal solid waste.
Employing artificial neural network models in the assessment and forecasting of waste management needs can effectively support decision making aimed at minimizing waste production and provide the foundation for the development of modern waste management methods.

Conclusions
The following conclusions emerge from the reported results of the ANN simulations:
1. The statistical data from 2010-2019 indicate that the levels of generated waste are constantly rising. They are affected to the greatest extent by population and the number of entities by type of business activity (industry/construction), whereas the number of entities listed in REGON per 10,000 people has the least notable influence.
2. The neural network models, generated using the Neural Network library in MATLAB and Simulink, show good predictive strength in terms of determining the waste production trends both in the local context of Polish cities (in individual categories: paper and cardboard, glass, plastics and metals, biodegradable and other) and nationwide (total and household waste). The general regression for the first network was equal to 0.914 and for the second network, it was equal to 0.9895. These results indicate that the networks may be sound predictors with respect to the tested data.
Derived from the socioeconomic data for 2003-2019, the model predicts that waste generation will increase by 2% by 2024. The devised model can be used for further studies on implementing sustainable solutions in terms of municipal waste management and appropriate technologies of waste disposal.