Prediction of Solid Waste Generation Rates in Urban Region of Laos Using Socio-Demographic and Economic Parameters with a Multi Linear Regression Approach

Abstract: This paper aims to develop a predictive model for Laos to generate reliable statistics for urban solid waste from 1995 to 2050. The multi-linear regression (MLR) approach is used with six socio-demographic and economic parameters, i.e., urban population, gross domestic product (GDP) per capita, urban literacy rate, urban poverty incidence, urban household size and urban unemployment rate. Several models are generated under four different scenarios. The value of R² (a relative measure of fit) and the values of performance indicators (absolute measures of fit) such as mean absolute error (MAE), root mean square error (RMSE) and mean absolute percentage error (MAPE) are calculated to assure the validity and accuracy of the results. Model 2 of Scenario 4 is identified as the best model, where population and GDP per capita show statistical significance for estimating the urban solid waste generation rate in Laos. The amount of municipal solid waste is estimated to be 0.98 million tons (MT) in the year 2030, 1.26 MT in the year 2040 and 1.52 MT in the year 2050, assuming that the present waste generation trends continue in the future. Moreover, the study provides an easy and detailed explanation of the work, which should help researchers understand the MLR approach clearly and encourage them to use it for other developing countries where the scarcity of data is a major obstacle in the field of solid waste management. The drawback of the study is the limited availability of official and reliable historical statistics in Laos for the dependent and independent variables.


Introduction
Most developing countries face serious environmental problems arising from the inappropriate management of solid waste (SW). Developing countries have suffered from low collection rates, illegal dumping, and self-disposal [1]. Alongside informal solid waste management (SWM), open dumping sites are the main form of disposal in developing countries and pose serious environmental problems [2]. Likewise, Laos, a small landlocked country, presently faces a colossal problem associated with municipal solid waste (MSW). Recent fast urbanization in Laos has led to a rapid increase in the generation rate of MSW, which exceeds the handling capacity of local government. The daily per capita urban solid waste generation rate in Laos showed an increase of 4% from 2000 to 2011 [3].
The characteristic data for SW, however, are so few that the Laos government seems to have difficulty in developing robust country-specific strategies and measures for SWM. Laos, similar to other developing countries, rarely has reliable waste statistics related

Multi-Linear Regression
The multi-linear regression approach is a methodology in which one dependent variable is related to two or more independent variables to obtain a quantitative and qualitative relation between them. The mathematical presentation of the MLR approach is as follows:
$$Y = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 + \dots + b_n X_n + \varepsilon$$
where $Y$ signifies the dependent variable (the variable to be forecasted or predicted), $b_0$ is the constant term, $X_1, X_2, X_3, \dots, X_n$ are the independent variables, $b_1, b_2, b_3, \dots, b_n$ are the regression coefficients of their respective independent variables and $\varepsilon$ is the residual error, which is assumed to be independent and identically distributed with zero mean and constant variance [20]. The regression coefficients can be positive or negative: a positive value signifies a direct relation and a negative value signifies an inverse relation of the independent variable with the dependent variable. The numerical value of each coefficient denotes the change in the dependent variable for a one-unit change in the respective independent variable, with the other independent variables held constant.
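As an illustration of how the coefficients in the equation above can be estimated, the following is a minimal sketch of ordinary least squares using only the Python standard library. It is not the SPSS procedure used in this study; the function names (`solve_linear_system`, `fit_mlr`, `predict`) and the toy data are hypothetical and chosen only to show the mechanics.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so that row operations update both sides together.
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (perfectly collinear predictors?)")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_mlr(rows, y):
    """Return [b0, b1, ..., bn] minimising the sum of squared residuals."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 carries the constant b0
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)  # normal equations (X'X) b = X'y


def predict(coeffs, row):
    """Evaluate Y = b0 + b1*X1 + ... + bn*Xn for one observation."""
    return coeffs[0] + sum(b * x for b, x in zip(coeffs[1:], row))


# Toy data (made up): generated from Y = 1 + 2*X1 + 3*X2 with no noise.
toy_x = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 3)]
toy_y = [1 + 2 * a + 3 * b for a, b in toy_x]
coeffs = fit_mlr(toy_x, toy_y)
```

Because the toy data contain no noise, the fitted coefficients recover the generating values; with real data the residual term would absorb the unexplained variation.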

Variable Selection
The dependent variable, i.e., the outcome variable of this study, is the solid waste generation rate in the urban region of Laos (SWGR). In total, six independent variables are selected: urban population, per capita GDP, urban literacy rate, urban poverty incidence, urban household size and urban unemployment rate. The study chooses the independent variables based on the availability of reliable and comparable official statistics, because it is sometimes difficult to obtain reliable data for the desired socio-demographic and economic (SDE) factors [17]. Many studies have reported that population and solid waste generation rates are directly proportional to each other [21]. The income of a person plays a key role in the amount of solid waste generated in any region [22]; here GDP per capita is used as a substitute for "income per capita" because official data on the latter are not available. Literacy rate and household size are other commonly used independent variables, which can have direct as well as indirect effects on SWGR depending on the region [23]. Household size signifies the average number of people living in one house; it is assumed that more people generate more waste [13], but the members of a large family can also share the same items, i.e., reuse or recycle them, which decreases solid waste [11]. Literate people are assumed to be knowledgeable about or aware of waste management, which affects the amount generated. In Laos, the overall poverty ratio has decreased from 33.5% to 23.2% in the past ten years [24]. A decreasing poverty rate raises the quality of life, which affects the amount of solid waste [25]. The last independent variable chosen for this study is the unemployment rate, which is an indirect indication of the economic condition of a country or region [26] and has been reported as another indicator affecting solid waste.
Table 1 provides a detailed description of all the variables selected for this study. For example, urban household size is defined as a household/social unit consisting of one or more persons who live together under the same roof in urban areas and make common arrangements for food and other living conditions, irrespective of blood relation or marriage. The urban unemployment rate (UR), expressed in percent (%), is defined as the share of the urban population that is without work, currently available for work and has sought work in the recent past, including people who lost their jobs and people who voluntarily left them.
Source: World Bank and Census of Laos.
Here, the model is developed for predicting the SWGR for the urban region of Laos using SDE factors with the MLR approach. The values of the independent variables are chosen according to their availability over time. First priority in data selection is given to national reports and official published documents. Second priority is given to official reports or data of organizations such as the World Bank, ADB (Asian Development Bank), JICA (Japan International Cooperation Agency), etc. Where these sources are unavailable, the data are obtained from the secondary literature. The World Bank provided the data for the urban population from 1995 to 2050 and for GDP per capita in Laos from 1995 to 2018 [27]. The Census of Laos provided the data for literacy rate, household size and unemployment rate for the years 1995 [28], 2005 [29] and 2015 [30]. The data for poverty incidence are adopted from a secondary reference for the years 1992, 1997, 2002, 2007 and 2012 [31]. Unfortunately, reliable data for all independent variables are available only for a short time range, i.e., 1995 to 2012. Small data sets are very difficult to study with the MLR approach and can also lead to the wrong interpretation of statistical tests [32]. To overcome this difficulty, the time range for the historical values was set from 1995 to 2015.

Data Collection for the Dependent Variable
In the year 1992, JICA reported the SWGR as 0.63 kg (kilogram)/capita/day in Vientiane [33]. Later in the year 2005 Khanal et al. [34] reported 0.70 kg/capita/day of solid waste for Vientiane capital city, Glawe et al. [35] [40]. Unfortunately, very few studies exist for Laos that report reliable SWGR statistics, especially for the urban region. Moreover, the data available in these references are inconsistent. A key concern with the MLR approach is that improper or incomplete data can produce false results. To obtain accurate results for the predictive model, the study has adopted the data for the dependent variable from the recent national report prepared by MONRE (Ministry of Natural Resources and Environment) in the year 2017, which shows that the urban region generated 0.65 kg of waste in the year 2000 and 0.69 kg of waste in the year 2011 on a per capita per day basis [3]. The daily per capita urban SWGR is estimated as 0.40 kg in 1990, using a secondary reference [41]. The missing data for the urban per capita SWGR are interpolated and extrapolated (following the same increasing trends as in the past) using MS Excel.
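The gap filling described above can be sketched as piecewise linear interpolation between the three reported per capita values, with years beyond the last known point following the slope of the nearest segment. This is only an assumption about how such a fill could be done; the exact MS Excel procedure is not specified in the text, and the function name `linear_fill` is hypothetical.

```python
def linear_fill(known, years):
    """Fill each requested year on straight lines through the known points.

    Years outside the known range reuse the slope of the nearest segment
    (an assumed form of trend extrapolation).
    """
    pts = sorted(known.items())
    filled = {}
    for year in years:
        # Find the segment that brackets the year, or the last segment.
        seg = 0
        while seg < len(pts) - 2 and year > pts[seg + 1][0]:
            seg += 1
        (x0, y0), (x1, y1) = pts[seg], pts[seg + 1]
        filled[year] = y0 + (y1 - y0) * (year - x0) / (x1 - x0)
    return filled


# Per capita urban SWGR in kg/capita/day, as reported in the text ([41], [3]).
known_swgr = {1990: 0.40, 2000: 0.65, 2011: 0.69}
series = linear_fill(known_swgr, range(1995, 2016))
```

The resulting `series` holds one value per year from 1995 to 2015; values produced this way are illustrative and need not match the series actually entered into the software.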

Results
The results are obtained for the best predictive model to evaluate the SWGR in the urban region of Laos using the multi-linear regression approach. The study prepares the model by assuming that the present trends of solid waste generation rates will remain the same in the future. To do so, several models are generated under different scenarios in "IBM SPSS Statistics Version 25". The MLR method provides statistically valid results only when its key assumptions are satisfied, so checking them is the first step in deciding whether the data set can be analyzed using the MLR process. The major assumptions of MLR concern linearity, normality, multi-collinearity, homoscedasticity and autocorrelation. Violation of these assumptions often results in an inaccurate and imprecise model because of error terms related to both the dependent variable (DV) and the independent variables (IV). In total, results for seven models, created under four different scenarios, are presented for this study. The ability to provide accurate results is indicated by the values of MAE, RMSE and MAPE.
To perform the MLR, all the available dependent and independent variables were entered in the software in different rows. The missing or unavailable data for the remaining years were interpolated or extrapolated using MS-Excel 2016 (Microsoft Excel). Extrapolation is used to create the data for poverty incidence for the years 2013, 2014 and 2015 based on its previous increasing and decreasing trend. The historical values entered for DV and IV run from the year 1995 to the year 2015. Observed data are available for the per capita solid waste generation rate of the urban region of Laos. The total urban SWGR is calculated in tons per year (TPY) by multiplying the per capita SWGR (kg/capita/day) and the population by the factor 365/1000:
$$\text{SWGR} = \text{per capita SWGR} \times \text{population} \times \frac{365}{1000}$$
Table 2 shows the data for all the variables which were entered into the software for running the MLR.
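The unit conversion in the formula above can be written as a one-line function. The sketch below uses the per capita rate of 0.69 kg/capita/day reported for 2011; the population of 1,000,000 is a hypothetical round figure used only to make the arithmetic easy to follow, and the function name is likewise an assumption.

```python
def total_swgr_tons_per_year(per_capita_kg_per_day, population):
    """Total SWGR in tons per year (TPY).

    kg/capita/day * persons * 365 days gives kg/year; dividing by 1000
    converts kilograms to tons.
    """
    return per_capita_kg_per_day * population * 365 / 1000


# Hypothetical population of one million with the 2011 per capita rate.
example_tpy = total_swgr_tons_per_year(0.69, 1_000_000)  # about 251,850 TPY
```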

Model Assumptions
Linearity is checked by plotting scatterplots. Normality of the dependent and independent variables is checked with a non-parametric one-sample Kolmogorov-Smirnov test (Table 3) and with a z-test based on the skewness and kurtosis values (Table 4). The non-parametric test results in the rejection or retention of the null hypothesis of normality for each variable and is preferred for small data sets [42]. The z-test is evaluated by calculating the z-score of skewness (Zs) and the z-score of kurtosis (Zk); the normality hypothesis is retained when both lie within ±1.96 [43]. According to Table 4, Zs and Zk for GDP exceed 1.96, which rejects the null hypothesis of normality, so GDP is normalized using the inverse transformation (IT). IT (1/x) is one of the transformations commonly used for positively skewed variables, alongside log transformations (ln or log10), the cube root transformation ($x^{1/3}$) and the square root transformation ($x^{1/2}$). The best way of choosing the transformation is to test which one fits best. The study has one transformed variable, i.e., GDP, which is denoted as "InvGDP". The normality descriptives for the dependent, independent and transformed variables are shown in Table 4.
Table 5 shows the beta coefficient, tolerance value, VIF value and p-value for all the models of the different scenarios. Scenario 1 is performed using the "enter" method and resulted in high VIF values and low tolerance values for the independent variables. The VIF value and the tolerance value describe the strength of correlation among the independent variables and are reciprocal to each other (VIF = 1/Tolerance or Tolerance = 1/VIF). A high VIF value indicates multi-collinearity in the model.
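The z-test for normality described above (Zs and Zk compared with ±1.96) can be sketched as follows. The sketch assumes the bias-adjusted sample skewness and excess kurtosis with their usual standard errors, which is the convention commonly reported by statistical packages; the function names and the two small data sets are made up for illustration.

```python
import math


def skew_kurtosis_z(values):
    """Return (Zs, Zk): skewness and excess kurtosis over their standard errors."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean = sum(values) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    z3 = sum(((v - mean) / s) ** 3 for v in values)
    z4 = sum(((v - mean) / s) ** 4 for v in values)
    # Bias-adjusted sample skewness and excess kurtosis.
    skew = n / ((n - 1) * (n - 2)) * z3
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    # Standard errors depend only on the sample size.
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return skew / se_skew, kurt / se_kurt


def passes_z_test(values, limit=1.96):
    """True when both z-scores lie within +/- limit."""
    zs, zk = skew_kurtosis_z(values)
    return abs(zs) <= limit and abs(zk) <= limit


skewed = [1, 1, 1, 1, 1, 1, 1, 1, 1, 50]  # made-up, strongly right-skewed
symmetric = [1, 2, 3, 4, 5, 6, 7]         # made-up, symmetric
```

Candidate transformations such as 1/x or ln(x) could be compared by applying each to the variable and re-running `passes_z_test` on the transformed values.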
Multi-collinearity jeopardizes the strength of the quantitative relationship between the dependent and the independent variables because the beta coefficients are poorly estimated. Removing it leads to smaller standard errors and less uncertainty in the regression coefficients and supports the validity of the model. The issue is addressed by removing independent variables from the model one after another until the VIF value obtained for each remaining IV is less than 10. A VIF value greater than 10 indicates high collinearity among the variables [44].
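For a model with exactly two independent variables, the VIF reduces to 1/(1 - r²), where r is the Pearson correlation between the two predictors, so both variables share the same VIF. The sketch below illustrates this special case only; models with more predictors need the R² of each predictor regressed on all the others. The function names and the sample vectors are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """Return (VIF, tolerance) shared by both predictors of a two-variable model."""
    r = pearson_r(x1, x2)
    tolerance = 1.0 - r * r
    if tolerance <= 0.0:
        raise ValueError("predictors are perfectly collinear")
    return 1.0 / tolerance, tolerance


def implied_abs_r(vif):
    """Absolute predictor correlation implied by a two-predictor VIF."""
    return math.sqrt(1.0 - 1.0 / vif)


# Made-up vectors: the first pair is uncorrelated, the second nearly collinear.
vif_low, tol_low = vif_two_predictors([1, 2, 3, 4], [1, -1, -1, 1])
vif_high, tol_high = vif_two_predictors([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.8, 5.1])
```

Under this relation, a shared VIF just above 10 in a two-variable model corresponds to a predictor correlation of roughly 0.95 in absolute value.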
Scenario 2 provides three models, which result from using the "stepwise" method. SPSS has five methods for running the MLR: enter, stepwise, remove, backward (elimination) and forward (selection). The enter method includes all the independent variables in a single step and forms the regression equation, whereas in the remove method all the entered variables are removed concurrently from the model. The stepwise method builds the model step by step, adding or removing independent variables in progression and checking the statistical significance after each step. All methods are performed in Scenario 2; the stepwise and forward methods produced the same models, but the backward and remove methods showed multi-collinearity with much higher VIF values. Table 5 presents only the model results obtained from the stepwise method in Scenario 2. The advantage of the stepwise approach over the forward, backward and remove methods is that it applies two significance criteria, one for adding variables and one for removing them. Moreover, the stepwise method is a combination of the forward method and the backward method [45], as the forward method performs a step-up selection and the backward method performs a step-down selection of variables. In total, three models are obtained with this approach. Model 1 includes only household size, with a VIF value of 1; Model 2 includes household size and poverty incidence, with a VIF value of 10.980 for both variables; and Model 3 includes household size, poverty incidence and literacy rate, with a VIF value greater than 10 for each variable, which signifies multi-collinearity.
The other way of removing multi-collinearity is to manually remove the highly correlated variables based on the Pearson correlation values. The bivariate matrix is presented in Table 6, which indicates high correlation among population, literacy rate, household size and unemployment rate. Scenario 3 and Scenario 4 are the results of manually selecting the variables population, InvGDP and poverty incidence for developing the final models. Population is selected as a key indicator because many studies have shown its direct effect on solid waste generation rates. Scenario 3 is a result of the enter method, where the model includes InvGDP and poverty incidence with a VIF value of 4.812 for both variables. Scenario 4 is a result of stepwise regression. It consists of two models, where Model 1 includes population and Model 2 includes population and InvGDP with a VIF value of 6.344.
Table 7 provides the model summary and the analysis of variance (ANOVA) matrix for all scenarios. Equal values of R² and adjusted R² indicate that adding variables has not changed the explanatory power of the model; this also occurs when the model contains only one variable. An adjusted R² lower than R² indicates that the model has not improved with the addition of a new variable, and the difference between the two increases as non-significant variables are added to the model. A value closer to 1 signifies a better prediction by the model [46]. The other statistical parameters shown in Table 7 are the standard error of the estimate and the F and p-values of the ANOVA matrix. The significance level is checked at an alpha (α) value of 0.05; a p-value lower than 0.05 indicates the statistical significance of the model. The assumption of no autocorrelation (error terms are independent of each other) is checked by estimating the Durbin-Watson factor (DWF) at a significance level of 0.05.
Statistically, the value of DWF lies between 0 and 4. A value close to 2 indicates no autocorrelation, a value between 0 and 2 indicates positive autocorrelation and a value between 2 and 4 indicates negative autocorrelation among the residuals.
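Two of the diagnostics discussed above can be computed directly from their standard definitions. The sketch below shows the adjusted R² and the Durbin-Watson statistic; the function names are hypothetical, the residual sequences are made up, and the sample size of 21 assumes one observation per year from 1995 to 2015.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R-squared, penalising R-squared for the number of predictors."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


def durbin_watson(residuals):
    """Sum of squared successive residual differences over the sum of squares.

    Lies between 0 and 4: near 2 means no first-order autocorrelation,
    below 2 positive and above 2 negative autocorrelation.
    """
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


# Assumed n = 21 annual observations and two predictors with R-squared 0.999.
adj = adjusted_r2(0.999, 21, 2)

# Made-up residual patterns at the two extremes.
dw_alternating = durbin_watson([1, -1, 1, -1])  # sign flips every step
dw_constant = durbin_watson([1, 1, 1, 1])       # no change between steps
```

With R² = 0.999, 21 observations and two predictors, the adjustment is small enough that the adjusted value still rounds to 0.999, which is why the two figures can appear identical in a summary table.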
The validity of a model prepared with MLR depends on the distribution of the variances in the model. Verifying the assumption of homoscedasticity and testing the normality of the residuals (the differences between the observed and predicted values of the dependent variable) are therefore important for generating a valid model. Failure to satisfy the assumption of normality of residuals challenges the validity of the developed model and indicates an inadequacy in it. Figure 2 shows the histograms, p-p (probability-probability) plots, residual plots and scatter plots for the dependent variable for the four scenarios. Observations lying approximately on the straight line in the p-p plot show that the residuals are normally distributed and support the validity of the model. Homoscedasticity is commonly checked with the Breusch-Pagan test or the White test. For small data sets, however, the low power of these tests can lead to false or unsatisfying results [47], and it is also difficult to judge from the graphs whether the assumption of homoscedasticity is met. Here, the squared residuals of each model were therefore regressed on the respective independent variables using the linear regression option at a significance level of 0.05. Table 8 shows the result for the assumption of homoscedasticity. A significance value greater than 0.05 in the ANOVA (analysis of variance) matrix indicates that the null hypothesis is retained, i.e., the squared residuals do not depend on the independent variables and the error terms are treated as well behaved.
Figure 2. Histograms, p-p plots, residual plots and scatter plots for the dependent variable. S1A, S2A, S3A and S4A show the histogram of residuals for the dependent variable for the four scenarios respectively; S1B, S2B, S3B and S4B show the normal p-p plot of the observed cumulative probability of residuals of the dependent variable for the four scenarios respectively; S1C, S2C, S3C and S4C show the regression standardized predicted value against the regression standardized residuals of the dependent variable for the four scenarios respectively; and S1D, S2D, S3D and S4D show the scatter plot of the regression standardized predicted value against the dependent variable for the four scenarios respectively.

Model Accuracy and Validity
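To check the accuracy and validity of the developed model, popular key performance indicators (KPI) such as mean absolute error (MAE), root mean square error (RMSE) and mean absolute percentage error (MAPE) are evaluated by using the following equations:
$$\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n} \left| A_t - P_t \right|$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n} \left( A_t - P_t \right)^2}$$
$$\mathrm{MAPE} = \frac{100}{n}\sum_{t=1}^{n} \left| \frac{A_t - P_t}{A_t} \right|$$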
where $A_t$ denotes the original or actual data at time t, $P_t$ denotes the predicted data at time t and n denotes the total number of data points. The indicators are calculated using MS-Excel 2016. MAE is the average absolute difference between the actual and the predicted values. It is often used to check the accuracy of predicted values when the units are the same for the original and predicted data. RMSE is estimated by taking the square root of the average squared difference between the predicted data and the original data; because the errors are squared, the result is always non-negative. MAE and RMSE both work on the principle that "the smaller the error, the better the prediction ability". Numerically, RMSE is always greater than or equal to MAE. The estimated values of MAE and RMSE depend on the range of the data and can vary from 0 to ∞. MAPE is another popular and frequently used KPI for measuring the accuracy of predictive models because it is scale-independent and easy to interpret. It takes the average of the absolute percentage errors of prediction and is defined only when the actual values are non-zero. Lower values of MAE, RMSE and MAPE indicate a better and more accurate prediction. Table 9 shows the results obtained for MAE, RMSE and MAPE. The best model is selected based on the results in Tables 7 and 9.
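The three indicators can be computed in a few lines. The following sketch uses their standard definitions, with MAPE expressed in percent; the function names and the three-point example are made up and are unrelated to the values in Table 9.

```python
import math


def mae(actual, predicted):
    """Mean absolute error, in the units of the data."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error, in the units of the data."""
    mean_sq = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mean_sq)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be non-zero."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Made-up example: absolute errors of 10, 10 and 0.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
example_mae = mae(actual, predicted)
example_rmse = rmse(actual, predicted)
example_mape = mape(actual, predicted)
```

In this example the RMSE exceeds the MAE because squaring weights the larger errors more heavily, in line with the inequality stated above.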

SWGR Prediction from 1995 to 2050
Accurate and official data for the independent variables are equally necessary: once their mathematical relationship with the dependent (output) variable has been established, their future values are needed to generate accurate and reliable predictions. Here, Figure 3a shows the available and predicted data for the population adopted from the World Bank. Figure 3b presents the original and predicted data for GDP per capita in Laos, which is forecasted for this study using the forecast sheet function of MS-Excel 2016. Figure 3c presents the original and forecasted results of solid waste generation rates in the urban region of Laos and Figure 3d shows the combined results of Figure 3a-c. Figure 3d shows an increasing trend in the SWGR with the increasing population and increasing GDP. The urban solid waste generation of Laos will reach 0.98 million tons (MT) in the year 2030, 1.26 MT in the year 2040 and 1.52 MT in the year 2050. The per capita SWGR for the urban region of Laos is estimated to be 0.74 kg in the year 2030, 0.75 kg in the year 2040 and 0.76 kg in the year 2050 on a daily basis. The average rate of growth of the per capita solid waste generation rate is estimated to be around 17% from the year 2020 to 2050. The increasing urban population and increasing GDP per capita will tend to increase the total amount of solid waste in the urban region of Laos.

Discussion
The existing solid waste management system in Laos is inefficient and ineffective. Currently, at the national level, five major ministries are working for the management of solid waste in the country, i.e., MPWT (Ministry of Public Works and Transport), MONRE, MOIC (Ministry of Industry and Commerce), MOH (Ministry of Health), MAF (Ministry of Agriculture and Forestry) and MEM (Ministry of Energy and Mines). The MPWT provides technical guidelines and advice on the planning, design and management of SW infrastructure, the MOIC manages industrial and hazardous waste, the MOH manages hospital and health sector waste and the MAF manages the compost made at the household level. The MONRE was established in the year 2011 to provide the regulatory framework for SWM and to develop the strategies, policies and guidelines for SWM in the country. MONRE prepared three reports, in the years 2015 [48], 2016 [49] and 2017 [3], mentioning the need to manage solid waste in Laos. However, no separate policy document for managing SW in the country has been prepared at the central or local level.
The lack of reliable statistics is the biggest disadvantage and acts as a barrier to proper solid waste management in any country [1]. It is very challenging for local government or policymakers to draw useful conclusions for the proper handling or management of solid waste without relevant and informative solid waste statistics [6,11]. This study prepared a predictive model able to estimate the data for past, present and future years. Different models are generated under different scenarios using the multi-linear regression (MLR) approach. The urban solid waste generation rate is related to six independent socio-demographic and economic parameters. Population and GDP per capita were found to be the most effective for developing a predictive model for urban SWGR in Laos. It is found that the growing urban population and growing economy will lead to an increase in the SWGR in the urban region of Laos. The SWGR will reach 0.98 MT in the year 2030 with a population of 3.50 million and a GDP per capita of USD (United States Dollar) 4229. This amount of waste generation for the urban region will increase to 1.5 MT in the year 2050 with a population of 5.2 million and a GDP per capita of USD 6998. The validity and accuracy of the model are assessed by the values of R² and adjusted R² together with the key performance indicators, i.e., MAE, RMSE and MAPE. Among the seven models of the four scenarios, Model 2 of Scenario 4 is identified as the best model, with an R² of 0.999, an adjusted R² of 0.999, an MAE of 1886.40, an RMSE of 3030.74 and a MAPE of 0.67. Moreover, this study also shows the predicted results for the input variables, i.e., population and GDP per capita, up to the year 2050.
The study highlights the implications in the context of developing countries. Sound waste management is needed in this region of the world, as these countries are trying to be on par with the developed nations. The effect of these simple steps could be magnified in planning and developing the waste system in this country. Estimating these factors for future waste generation may indeed be beneficial for the government in making a sustainable waste management policy.

Conclusions
The predicted data for urban population, GDP per capita and urban solid waste generation rates will help policymakers and the government to create a proper sustainable solid waste management policy in the country. They also provide future scope for researchers working on solid waste management in Laos. The easy explanation of the work should encourage researchers to use this technique for developing solid waste management databases in developing countries. The limitation of this study is that not enough data were available for developing the model. In the future, a larger dataset and a wider range of socio-demographic and economic parameters can be considered to further improve the scope of the methodology and the developed models.