LSTM-Based Forecasting for Urban Construction Waste Generation

Accurate forecasts of construction waste generation are important for recycling this waste and formulating relevant governmental policies. Deficiencies in reliable forecasting methods and historical data hinder the prediction of this waste in long- or short-term planning. To forecast construction waste effectively, this study proposes a time-series forecasting method based on a three-layer long short-term memory (LSTM) network and univariate time-series data with limited sample points. The method involves the design of the network structure and the implementation algorithms for network training and forecasting. Numerical experiments were performed with statistical construction waste data for Shanghai and Hong Kong. Compared with other time-series forecasting models, such as ridge regression (RR), support vector regression (SVR), and back-propagation neural networks (BPNN), the proposed LSTM-based forecasting model is shown to be effective and accurate in predicting construction waste generation.


Introduction
As urbanization and the transformation of old urban districts progress, the volume of construction waste generated constantly increases [1]. Statistics show that the total annual disposal volume of construction waste in China has ranged from approximately 1.55 to 2.4 billion tons in recent years, accounting for approximately 40% of the total volume of urban solid waste. Construction waste disposal in China is relatively extensive, with landfilling and dumping as the primary disposal approaches. Research indicates that, with effective planning and technical measures, most construction waste can be reused as a renewable resource. However, careless disposal produces serious environmental issues, including air, land, and water pollution. These issues pose a threat to human health and squander potentially recyclable resources.
Recycling of construction waste is considered an effective means of digesting urban waste, and methods for quantifying construction waste are the basis for managing its recycling. Forecasting trends and variations in construction waste is significant because it enables a government to estimate landfill capacity requirements in advance and formulate relevant policies [2]. In 2014, Wu et al. [3] broadly reviewed the literature on construction waste management and noted that relatively few studies had been conducted to predict construction waste generation, partly owing to the lack of supporting information. Compared with other methods, the advantage of LSTM lies in learning long-term dependencies.
For an LSTM network, information storage and interaction in the forward-propagation process are controlled by three gates in the hidden-layer cell structure. The forget gate (f_t) controls which unit-state information from the previous time point is discarded. The input gate (i_t) controls the information-input process at the current time point. The output gate (o_t) controls the filtered output of the current unit state.
The three gates are calculated using the following equations [14]:

f_t = σ(W_f · [h_{t−1}, x_t] + b_f), i_t = σ(W_i · [h_{t−1}, x_t] + b_i), o_t = σ(W_o · [h_{t−1}, x_t] + b_o)   (1)

The update equation for the candidate cell-state information at the current time point is:

C̃_t = tanh(W_C · [h_{t−1}, x_t] + b_C)   (2)

Based on the combined actions of f_t and i_t, the cell state at the current time point (C_t) can be represented as follows:

C_t = f_t ⊙ C_{t−1} + i_t ⊙ C̃_t   (3)

The information returned to the hidden layer (h_t) is:

h_t = o_t ⊙ tanh(C_t)   (4)

where x is the input vector, h is the output vector, C is the cell state, σ and tanh are the sigmoid and tanh activation functions, respectively, W and b are the corresponding weight and bias matrices, respectively, and ⊙ denotes element-wise multiplication.
The backpropagation-through-time algorithm [15] is employed to train an LSTM model in the following steps:
Step 1: Use Equations (1)-(4) to calculate the output values of the forward propagation;
Step 2: Inversely calculate the error term for each LSTM cell, including the longitudinal propagation between layers and the temporal transverse propagation;
Step 3: Calculate the gradient of each weight based on the corresponding error term;
Step 4: Use a gradient-based optimization algorithm to update the weights.
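To make the forward pass concrete, Equations (1)-(4) for a single cell can be sketched in NumPy; the layer sizes and random weights below are illustrative, not values used in this study:

```python
import numpy as np

def lstm_cell_forward(x_t, h_prev, C_prev, W, b):
    """One forward step of an LSTM cell, following Equations (1)-(4).
    W: dict of weight matrices, b: dict of bias vectors."""
    z = np.concatenate([h_prev, x_t])          # concatenated [h_{t-1}, x_t]
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    f_t = sigmoid(W["f"] @ z + b["f"])         # forget gate, Eq. (1)
    i_t = sigmoid(W["i"] @ z + b["i"])         # input gate, Eq. (1)
    o_t = sigmoid(W["o"] @ z + b["o"])         # output gate, Eq. (1)
    C_tilde = np.tanh(W["C"] @ z + b["C"])     # candidate state, Eq. (2)
    C_t = f_t * C_prev + i_t * C_tilde         # cell-state update, Eq. (3)
    h_t = o_t * np.tanh(C_t)                   # hidden-layer output, Eq. (4)
    return h_t, C_t

# toy sizes: 1 input feature, 4 hidden units
rng = np.random.default_rng(0)
n_in, n_h = 1, 4
W = {k: rng.normal(size=(n_h, n_h + n_in)) for k in "fioC"}
b = {k: np.zeros(n_h) for k in "fioC"}
h, C = lstm_cell_forward(np.array([0.5]), np.zeros(n_h), np.zeros(n_h), W, b)
```

Because each gate passes through the sigmoid, its activations lie in (0, 1), and the tanh on the cell state bounds the hidden output in (−1, 1).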

LSTM Network-Based Construction Waste Generation Forecasting Model
In this study, an LSTM model was constructed based on the characteristics of univariate time-series data with limited sample points and the simple RNN design principle. Figure 2 shows the overall framework, based on a typical three-layer LSTM structure, which involves four main modules: data preprocessing, network training, network prediction, and model evaluation. The relevant steps of LSTM time-series prediction can be briefly described as follows:
Step 1. Preprocess the historical data and split them into a training set and a test set;
Step 2. Construct the model and train the LSTM network on the training set;
Step 3. Make predictions on the test set;
Step 4. Evaluate the model accuracy.



During data preprocessing, to make the inputs conform to the LSTM structure, the investigated time series can be transformed into a supervised learning problem by creating inputs of delayed observations and labels of forecasts. For a given lag step k (which equals the number of neurons in the input layer of the LSTM structure), the input vector (y_{t−k}, y_{t−k+1}, · · · , y_{t−1}) at period t is used to calculate the forecast value ŷ_t.
Network training produces a trained LSTM network and its forward mapping f by minimizing a mean-square-error (MSE) loss function. The final forecasting model can be simply expressed as ŷ_t = f(y_{t−k}, y_{t−k+1}, · · · , y_{t−1}). Next, an iterative manner is adopted in the prediction process: each new forecast on the test set is used to create the next input vector, until all forecasts are collected. Lastly, the forecasts are compared with the original observations on the test set to compute the model accuracy.
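The windowing and recursive forecasting described above can be sketched as follows; `naive` is a hypothetical stand-in for the trained forward mapping f, not the LSTM itself:

```python
def to_supervised(series, k):
    """Turn a univariate series into (input window, label) pairs with lag k."""
    X, y = [], []
    for t in range(k, len(series)):
        X.append(series[t - k:t])   # input window (y_{t-k}, ..., y_{t-1})
        y.append(series[t])         # label y_t
    return X, y

def iterative_forecast(model, history, k, horizon):
    """Recursive multi-step forecasting: each new forecast is fed back
    into the input window to produce the next one."""
    window = list(history[-k:])
    forecasts = []
    for _ in range(horizon):
        y_hat = model(window)
        forecasts.append(y_hat)
        window = window[1:] + [y_hat]   # slide window, append forecast
    return forecasts

series = [1, 2, 3, 4, 5, 6]
X, y = to_supervised(series, k=2)             # X[0] = [1, 2], y[0] = 3
naive = lambda w: w[-1] + (w[-1] - w[-2])     # stand-in linear-trend "model"
print(iterative_forecast(naive, series, k=2, horizon=3))  # [7, 8, 9]
```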

Numerical Experiments
In this study, a forecasting model was constructed based on the historical construction waste generation data for two cities, namely, Shanghai and Hong Kong. Note that a change was made in Shanghai's statistical standards for construction waste generation. As a result, it is unreasonable to forecast using all historical data for Shanghai. However, this study is focused on forecasting urban construction waste generation rather than construction waste classification.

Data Collection and Outlier Elimination
We collected two datasets of construction waste data, from Shanghai and Hong Kong respectively, for the experiments. The construction waste generation data from the Shanghai statistical yearbook (http://tjj.sh.gov.cn/tjnj/nj18.htm?d1=2018tjnj/C0618.htm) for the pre-2012 period (a total of 32 years), denoted dataset A, exhibit complex time-series features with substantial volumes. Dataset B comprises the annual construction waste generation data for the 31-year period from 1986 to 2016, collected from the Waste Reduction Office of the Environmental Protection Department of Hong Kong (https://www.wastereduction.gov.hk/en/assistancewizard/waste_red_sat.htm#top).
Outliers are data points beyond the normal value range and are generally identified by scatter or box plots. Owing to the relatively small sample sizes of the datasets selected in this study, correcting the outliers makes full use of the available data and avoids the insufficient sample size that eliminating outliers would cause. The outliers in datasets A and B were treated by mean-value interpolation, as shown in Figure 3.
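The paper does not specify the exact outlier criterion beyond scatter and box plots; a minimal sketch, assuming the common box-plot (IQR) rule and replacing each flagged point with the mean of its two neighbours (mean-value interpolation), might look like:

```python
def correct_outliers(series, k=1.5):
    """Flag outliers with the box-plot (IQR) rule and replace each with
    the mean of its immediate neighbours (mean-value interpolation)."""
    s = sorted(series)
    n = len(s)
    q1, q3 = s[n // 4], s[(3 * n) // 4]   # rough quartiles, sketch-level
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    out = list(series)
    for i, v in enumerate(series):
        if not (lo <= v <= hi):
            left = series[i - 1] if i > 0 else series[i + 1]
            right = series[i + 1] if i < len(series) - 1 else series[i - 1]
            out[i] = (left + right) / 2
    return out

data = [10, 11, 12, 95, 13, 12, 11]
print(correct_outliers(data))  # 95 replaced by (12 + 13) / 2 = 12.5
```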


Data Segmentation
In this study, each time series was divided into a training set and a test set based on the time sequence by hold-out cross-validation, and subsets of the training set (segmented based on the time sequence) were reserved to examine the model performance, as shown in Figure 4 [16]. Owing to the slight difference in sample size between datasets A and B, to facilitate the comparison of model performance, the sample size of both test sets was set to 5 (i.e., data for the last five years). Thus, the first 27 data points (pre-2007 data) in dataset A and the first 26 data points (pre-2012 data) in dataset B were selected to form the training sets.
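The hold-out split above reduces to slicing the last five observations off each series; a minimal sketch (the year range for dataset A is assumed for illustration):

```python
def holdout_split(series, test_size=5):
    """Time-ordered hold-out split: the last `test_size` points form the test set."""
    return series[:-test_size], series[-test_size:]

# dataset A is assumed to span 1980-2011 (32 yearly points) for this sketch
dataset_a = list(range(1980, 2012))
train, test = holdout_split(dataset_a)   # 27 training points + 5 test points
```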



Data Normalization
Data normalization can eliminate the difference in dimension among input data and increase the computational speed of the model. In this study, the input data were mapped by min-max normalization [17] to the interval [0, 1], which can be represented by the equation y* = (y − y_min)/(y_max − y_min), where y is the input datum, y* is the normalized datum, and y_max and y_min are the maximum and minimum values, respectively, of the input data. Denormalizing the output of the forecasting model is necessary so that it falls within the actual range and is consistent with its actual significance.
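The normalization and the required denormalization step can be sketched as:

```python
def minmax_normalize(series):
    """Map values to [0, 1]: y* = (y - y_min) / (y_max - y_min)."""
    y_min, y_max = min(series), max(series)
    scaled = [(y - y_min) / (y_max - y_min) for y in series]
    return scaled, y_min, y_max

def denormalize(scaled, y_min, y_max):
    """Invert the mapping so forecasts return to the original value range."""
    return [y * (y_max - y_min) + y_min for y in scaled]

raw = [20.0, 35.0, 50.0]
scaled, lo, hi = minmax_normalize(raw)     # [0.0, 0.5, 1.0]
print(denormalize(scaled, lo, hi))         # back to [20.0, 35.0, 50.0]
```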

Evaluation Indices
To reduce the limitation associated with the use of a single index to evaluate the model performance, the following three indices were employed in this study to examine the forecasting models comprehensively.
Mean absolute error (MAE) was used to evaluate the closeness between the actual and predicted values: MAE = (1/n) Σ|y_t − ŷ_t|.
Mean absolute percentage error (MAPE) was used to evaluate the relative error, which allows the forecast performance to be compared across datasets with different value ranges: MAPE = (100%/n) Σ|y_t − ŷ_t|/|y_t|.
Root mean square error (RMSE) is highly sensitive to extremely large or small errors and therefore reflects the forecast accuracy well: RMSE = √((1/n) Σ(y_t − ŷ_t)²).
Here, y_t and ŷ_t are the actual and predicted values, respectively, output by the model at time t, and n is the number of test points.
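The three indices follow directly from their standard definitions; the toy values below are illustrative:

```python
def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(y, y_hat)) / len(y)

def mape(y, y_hat):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(y, y_hat)) / len(y)

def rmse(y, y_hat):
    """Root mean square error."""
    return (sum((a - p) ** 2 for a, p in zip(y, y_hat)) / len(y)) ** 0.5

actual = [100.0, 200.0, 400.0]
pred = [110.0, 190.0, 360.0]
print(mae(actual, pred))    # 20.0
print(mape(actual, pred))   # about 8.33 (percent)
print(rmse(actual, pred))   # sqrt(600), about 24.49
```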

Comparison Models
The following three models were selected for experimental comparative analysis.

RR
RR is an approach that adds L2 regularization to a general linear regression model to prevent overfitting. The RR time-series forecasting model is structurally similar to MLR [18] and can be represented by the equation Y_t = a_0 + a_1 Y_{t−1} + a_2 Y_{t−2} + · · · + a_k Y_{t−k} + e, where Y_t is the predicted value at time t, a_1, a_2, · · · , a_k are the regression coefficients of Y_{t−1}, Y_{t−2}, · · · , Y_{t−k}, respectively, and a_0 and e are the bias and error terms, respectively. In this study, the regularization coefficient α of the RR model was set to 0.5. Additionally, the step size of the input series (i.e., k) was selected by trial and error.

SVR
SVR determines the nonlinear mapping relations between low-dimensional data and output indices by regression after mapping the low-dimensional data to a high-dimensional space via a nonlinear kernel function [19]. In this study, the Gaussian radial basis function was selected as the kernel. The corresponding penalty coefficient C and kernel coefficient γ were set to certain values, and k was also determined.
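This is not the authors' implementation; ridge regression on lagged inputs can be sketched with the standard closed form w = (XᵀX + αI)⁻¹Xᵀy, penalizing the bias term as well for brevity:

```python
import numpy as np

def fit_ridge_ar(series, k=2, alpha=0.5):
    """Closed-form ridge fit of Y_t = a_0 + a_1*Y_{t-1} + ... + a_k*Y_{t-k}.
    For simplicity the bias is penalized too, unlike textbook ridge."""
    X, y = [], []
    for t in range(k, len(series)):
        X.append([1.0] + list(series[t - k:t]))   # bias + lagged inputs
        y.append(series[t])
    X, y = np.array(X), np.array(y)
    # w = (X^T X + alpha I)^{-1} X^T y, solved without an explicit inverse
    w = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)
    return w

def predict_next(w, recent, k=2):
    """One-step-ahead forecast from the last k observations."""
    return float(np.dot(w, [1.0] + list(recent[-k:])))

series = [float(i) for i in range(1, 21)]   # simple linear trend, toy data
w = fit_ridge_ar(series, k=2, alpha=0.5)
print(predict_next(w, series))              # close to 21 for this trend
```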

ANN
Three-layer BPNNs exhibit excellent performance in approximating nonlinear data [20,21]. Thus, a three-layer BPNN was employed in this study. The sigmoid activation function was applied between the input layer and the hidden layer and between the hidden layer and the output layer. The network was trained using the gradient-descent-with-momentum algorithm. The learning rate was set to 0.1, and the momentum parameters were set to momentum = 0.9 and Nesterov = true. The random seed for network initialization, maximum number of iterations, and expected error were set to 0, 2000, and 1 × 10−6, respectively. In this study, point-by-point forecasting was performed for the test-set data, so the number of neurons in the output layer was set to 1. Additionally, the number of neurons in the input layer (i) and the number of neurons in the hidden layer (n) were determined by trial and error (i = k).
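A minimal sketch of such a BPNN training loop, using the stated sigmoid activations, learning rate 0.1, and momentum 0.9 (plain momentum; the Nesterov variant is omitted, and the toy data and layer sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

# Toy task: fit y = x on [0, 1] with a 1-4-1 sigmoid network.
X = rng.uniform(size=(32, 1))
Y = X.copy()

params = {"W1": rng.normal(scale=0.5, size=(1, 4)), "b1": np.zeros(4),
          "W2": rng.normal(scale=0.5, size=(4, 1)), "b2": np.zeros(1)}
velocity = {k: np.zeros_like(v) for k, v in params.items()}
lr, mu = 0.1, 0.9   # learning rate and momentum, as stated above

def forward(X, p):
    h = sigmoid(X @ p["W1"] + p["b1"])          # input -> hidden (sigmoid)
    return h, sigmoid(h @ p["W2"] + p["b2"])    # hidden -> output (sigmoid)

loss0 = np.mean((forward(X, params)[1] - Y) ** 2)
for _ in range(2000):
    h, out = forward(X, params)
    d_out = (out - Y) * out * (1 - out)           # backprop through output sigmoid
    d_h = (d_out @ params["W2"].T) * h * (1 - h)  # ... then hidden sigmoid
    grads = {"W2": h.T @ d_out / len(X), "b2": d_out.mean(axis=0),
             "W1": X.T @ d_h / len(X),  "b1": d_h.mean(axis=0)}
    for k in params:   # gradient descent with momentum
        velocity[k] = mu * velocity[k] - lr * grads[k]
        params[k] += velocity[k]
loss1 = np.mean((forward(X, params)[1] - Y) ** 2)
```

After 2000 iterations the training MSE should have decreased from its initial value on this toy task.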

Analysis of Model Parameters
First, dataset A is used as an example. The training set of the time-series data for construction waste generation was normalized using the proposed method. Additionally, an LSTM-based forecasting model was constructed. The numbers of nodes in the input and hidden layers of a three-layer LSTM model exert an extremely significant impact on the network scale. The selected optimizer also affects the convergence rate of the network during the training process. Thus, n and k were determined by trial and error to be 64 and 2, respectively. An adaptive-gradient optimizer was selected [22,23] (default parameters were preserved). For the non-key model parameters, the random seeds for network initialization, maximum number of iterations, and expected error were set to 0, 2000, and 1 × 10 −6 , respectively.
Owing to the relatively large n value, the model is relatively complex, which may cause overfitting and relatively poor generalization. In this study, the dropout method [24] was employed: a dropout layer was added after the LSTM hidden layer to randomly deactivate some neurons and prevent them from being updated in forward propagation and backpropagation, which prevents overfitting to a certain extent. An optimal dropout rate was determined by comparing the MSE values on the test set for various dropout rates, as shown in Figure 5. The MSE on the test set was largest before a dropout layer was added to the model. At a dropout rate of 0.2, the MSE for the test-set data was smallest, and the best training performance was achieved on the test set. The experimental process for dataset B was similar to that for dataset A. The parameters of the LSTM-based model were adjusted based on experience: k and n were set to 4 and 14, respectively, and a root-mean-square propagation (RMSProp) optimizer was selected (default parameters were preserved). The non-key parameters were set to the same values as for dataset A. For dataset B, because the n value was moderate, no dropout layer was added to the network structure.
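Dropout implementations vary; a minimal sketch of the common inverted-dropout scheme, which rescales surviving activations so their expectation is unchanged, is:

```python
import numpy as np

def dropout(h, rate, rng, training=True):
    """Inverted dropout: zero a fraction `rate` of activations during
    training and rescale the rest so the expected value is unchanged."""
    if not training or rate == 0.0:
        return h
    mask = rng.uniform(size=h.shape) >= rate   # keep with probability 1 - rate
    return h * mask / (1.0 - rate)

rng = np.random.default_rng(0)
h = np.ones(10000)
dropped = dropout(h, rate=0.2, rng=rng)
print(round(dropped.mean(), 1))   # about 1.0: expectation preserved
```

At inference time (`training=False`) the activations pass through unchanged, so no rescaling is needed when forecasting.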

Forecasts and Comparative Analysis
To examine its forecast effectiveness and accuracy, the LSTM-based model was compared with various time-series forecasting models. Table 1 summarizes the parameter settings for each model. Additionally, to determine the advantages of the LSTM-based model among RNNs, the hidden-layer cells in the LSTM-based model were replaced by the simple RNN structure, and experiments were conducted using the same parameters.

First, dataset A is used as an example. The network was trained using the previously determined LSTM network structure and parameters, and fitted values for the training set were obtained. Point-by-point recursive forecasting was then performed on the test set, yielding predicted values for the period 2007-2011. The fitting and forecast performance of the various time-series forecasting models were comparatively analyzed using the evaluation indices described above, as shown in Table 2.
Figures 6 and 7 display the performance of the LSTM-based model and each comparison model in terms of predicted values and errors. As demonstrated in Figure 6, the forecast curves of the RR, SVR, and BPNN models tend to be flat and fail to accurately predict the fifth test point, which produces relatively large prediction errors. In comparison, the LSTM-based model exhibits relatively good forecast performance. Although the prediction error of each model for the fourth data point is relatively large, the forecast curve of the LSTM-based model is closer to the variation trend of the actual data. Figure 7 further compares the models on the test set: the evaluation-index values for the LSTM-based model are lower than those for the other forecasting models, which suggests that the proposed model satisfactorily tracks the actual test-set data and achieves relatively good forecast performance.
Dataset A has low timeliness. The data for the following six years (2012-2017) predicted by the LSTM-based model were therefore converted using a conversion factor, i.e., the ratio of the mean construction waste generation (excluding construction waste soil) in the period 1980-2011 to the mean construction waste generation (including construction waste soil) in the period 2012-2017. Table 3 shows the forecasting results for 2012-2017, which were compared with the trend of actual construction waste generation (including construction waste soil) in the same period. The correlation coefficient (R) between the two trendlines is 0.87, which indirectly demonstrates that the LSTM-based model is effective in forecasting.
The applicability of the proposed method to forecasting construction waste generation in other cities was further examined based on dataset B. Table 4 and Figure 8 show the experimental results, which demonstrate that the proposed LSTM-based model outperforms the comparison models in terms of forecast accuracy.
According to the case study of the two datasets, the proposed LSTM model obtains more accurate forecasts than the four benchmark models. The proposed model aims to solve univariate time-series forecasting problems with nonlinearity and non-stationarity, owing to the lack of multivariate data on influencing factors. In practice, however, the model can be applied to multivariate time-series prediction by slightly changing the inputs to the LSTM structure; this effect can be discussed in further research.


Conclusions
The forecasting of construction waste generation enables local governments to manage construction waste landfills and formulate construction waste management policies. However, a review of the available literature indicates that the methods for this type of forecasting and their accuracy require improvement.
Available data for urban construction waste generation have the characteristics of small volumes and high nonlinearities. A three-layer LSTM network is proposed in this study for forecasting based on univariate time series with limited sample points. The proposed model is compared with regression and neural-network models. The results show that the LSTM-based model is highly effective in solving univariate nonlinear forecasting problems. Additionally, a dropout layer is added to effectively address the overfitting problem of the LSTM-based model and improve its generalization performance.
The applicability of the LSTM-based model to forecasting construction waste generation is demonstrated, expanding the application scope of deep-learning techniques. This study therefore has theoretical and practical significance for quantifying and managing construction waste. Because obtaining construction waste data for Chinese cities is difficult, the proposed method was validated only on data for Shanghai and Hong Kong. In the future, the method can be used to predict construction waste generation in other cities and to solve forecasting problems in other fields. Additionally, the performance of various network-training optimization algorithms, such as Adam and RMSProp, can be compared.