Comparison of Random Forest and Gradient Boosting Machine Models for Predicting Demolition Waste Based on Small Datasets and Categorical Variables

Construction and demolition waste (DW) generation information has been recognized as a tool for providing useful information for waste management. Recently, numerous researchers have actively utilized artificial intelligence technology to establish accurate waste generation information. This study investigated the development of machine learning predictive models that can achieve predictive performance on small datasets composed of categorical variables. To this end, the random forest (RF) and gradient boosting machine (GBM) algorithms were adopted. To develop the models, 690 building datasets were established using data preprocessing and standardization. Hyperparameter tuning was performed to develop the RF and GBM models. The model performances were evaluated using the leave-one-out cross-validation technique. The study demonstrated that, for small datasets comprising mainly categorical variables, the bagging technique (RF) predictions were more stable and accurate than those of the boosting technique (GBM). However, GBM models demonstrated excellent predictive performance in some DW predictive models. Furthermore, the RF and GBM predictive models demonstrated significantly differing performance across different types of DW. Certain RF and GBM models demonstrated relatively low predictive performance. However, the remaining predictive models all demonstrated excellent predictive performance at R2 values > 0.6, and R values > 0.8. Such differences are mainly because of the characteristics of features applied to model development; we expect the application of additional features to improve the performance of the predictive models. The 11 DW predictive models developed in this study will be useful for establishing detailed DW management strategies.


Introduction
Waste management has become a critical issue due to rapid urban growth [1,2]. According to recent statistics, 2.01 billion tons of municipal solid waste (MSW) was generated in 2016; reports also predict that 3.40 billion tons of waste will be generated annually by 2050 [3]. In particular, construction and demolition (C&D) waste generation is steadily increasing [4][5][6], and 70-90% of C&D waste generation can be attributed to demolition activities [7,8]. For example, construction activities are responsible for 90% of the C&D waste generation in the United States; moreover, such activities account for 74% of the C&D waste generation in China [6][7][8]. C&D waste is mainly generated from such activities as construction, demolition, and refurbishment, which are detrimental to the environment because they generate considerable quantities of waste and greenhouse gases [9]. Therefore, systems or solutions for effectively and appropriately managing C&D waste are necessary for sustainable growth and development in the construction industry.

Description of Artificial Intelligence in Predicting DW Generation in This Study
In this section, we explore the principles and properties of random forest (RF) and gradient boosting machine (GBM) algorithms along with the properties of ensembles. Moreover, we explore LOOCV techniques applied for the performance evaluation of models based on small datasets mainly comprising categorical variables used in this study.

Ensemble Model
Ensemble learning (EL) methods involve combining and building various learning algorithms. This allows ensemble learning algorithms (ELA) to obtain better predictive performance and improved generalization compared to a single learning algorithm [27,28]. Ensemble methods are especially useful when the amount of training data is small. This is because ensemble algorithms (EAs) can reduce the risk of selecting a poor classifier through votes by individual classifiers [29]. Various ensemble techniques have been developed so far, among which bagging and boosting are representative of decision tree (DT)-based ELAs [30,31]. As presented in Figure 1, bagging (also known as bootstrap aggregation [32]) generates numerous bootstraps from the given training data, and an independent predictive model is generated for each bootstrap. Thus, bagging can improve the stability and accuracy of machine learning algorithms (MLA) [33]. Recently, RF has been widely utilized as a bagging method for MLA; Cha et al. and Nguyen et al. [23,34] demonstrated high predictive performance in applying RF to predict waste generation. Boosting, on the other hand, is a technique in which numerous classifiers are generated from early samples, and weak classifiers are collected to generate strong classifiers. As seen in Figure 1, bagging is an independence-based learner training system, while boosting is an iterative and dependence-based system. Boosting is a continuous process of generating classifiers reinforced by weights from weak classifiers from previous stages, which helps reduce the bias and variance of datasets [32]. GBM is representative of boosting DT-based algorithms. Johnson et al. [35] and Kontokosta et al. [36] applied GBM to predict the amount of MSW generation.

Description of Artificial Intelligence in Predicting DW Generation in This Study
In this section, we explore the principles and properties of random forest (RF) and gradient boosting machine (GBM) algorithms along with the properties of ensembles. Moreover, we explore LOOCV techniques applied for the performance evaluation of models based on small datasets mainly comprising categorical variables used in this study.

Ensemble Model
Ensemble learning (EL) methods involve combining and building various learning algorithms. This allows ensemble learning algorithms (ELA) to obtain better predictive performance and improved generalization compared to a single learning algorithm [27,28]. Ensemble methods are especially useful when the amount of training data is small. This is because ensemble algorithms (EAs) can reduce the risk of selecting a poor classifier through votes by individual classifiers [29]. Various ensemble techniques have been developed so far, among which bagging and boosting are representative of decision tree (DT)-based ELAs [30,31]. As presented in Figure 1, bagging (also known as bootstrap aggregation [32]) generates numerous bootstraps from the given training data, and an independent predictive model is generated for each bootstrap. Thus, bagging can improve the stability and accuracy of machine learning algorithms (MLA) [33]. Recently, RF has been widely utilized as a bagging method for MLA; Cha et al. and Nguyen et al. [23,34] demonstrated high predictive performance in applying RF to predict waste generation. Boosting, on the other hand, is a technique in which numerous classifiers are generated from early samples, and weak classifiers are collected to generate strong classifiers. As seen in Figure  1, bagging is an independence-based learner training system, while boosting is an iterative and dependence-based system. Boosting is a continuous process of generating classifiers reinforced by weights from weak classifiers from previous stages, which helps reduce the bias and variance of datasets [32]. GBM is representative of boosting DT-based algorithms. Johnson et al. [35] and Kontokosta et al. [36] applied GBM to predict the amount of MSW generation.

Gradient Boosting Machine and Random Forest
GBM, as one of the most robust machine learning (ML) algorithms, is widely applied in engineering fields [37] and is a boosting technique. GBM may be deemed a numerical optimization algorithm aimed at finding an additive model that minimizes the loss function. To this end, GBM iteratively adds a new DT (weak classifier in Figure 2), which can reduce the loss function as much as possible at each stage. In other words, at each step, a new DT is fitted to the current residual and added to the previous model to update the residual. As presented in Figure 2, GBM is an iterative and dependence-based algorithm;

Gradient Boosting Machine and Random Forest
GBM, as one of the most robust machine learning (ML) algorithms, is widely applied in engineering fields [37] and is a boosting technique. GBM may be deemed a numerical optimization algorithm aimed at finding an additive model that minimizes the loss function. To this end, GBM iteratively adds a new DT (weak classifier in Figure 2), which can reduce the loss function as much as possible at each stage. In other words, at each step, a new DT is fitted to the current residual and added to the previous model to update the residual. As presented in Figure 2, GBM is an iterative and dependence-based algorithm; as it iteratively performs these processes, it increasingly reinforces the classifier by the number of iterations specified by the user. GBM algorithms based on this boosting principle are effective in reducing bias and variance in predictive models [38]. Such characteristics of GBM are deemed useful in solving issues of bias and variance in predictive model results that occur when ML algorithms are applied to small datasets. Therefore, this study utilized GBM to predict demolition waste (DW) generation in a small-dataset environment consisting mainly of categorical variables. as it iteratively performs these processes, it increasingly reinforces the classifier by the number of iterations specified by the user. GBM algorithms based on this boosting principle are effective in reducing bias and variance in predictive models [38]. Such characteristics of GBM are deemed useful in solving issues of bias and variance in predictive model results that occur when ML algorithms are applied to small datasets. Therefore, this study utilized GBM to predict demolition waste (DW) generation in a small-dataset environment consisting mainly of categorical variables. RF is considered among the 10 best classifiers [39]. RF, a representative DT-based algorithm, is a bagging-based ensemble technique that generates bootstrap sampling (Figure 2). RF builds numerous subsets (bootstrap sampling) from the training data and trains the same algorithm several times. The resulting prediction is determined as the mean of all predictions of the submodels. As the number of trees increases, RF can avoid overfitting and be less affected by outliers. Moreover, even when the class is imbalanced, it has superior predictive power over other ML algorithms [33]. Considering these strengths, the application of RF in the data environment utilized in this study (small datasets composed mainly of categorical variables) is also expected to be useful for predicting DW generation.

Leave-One-Out Cross-Validation (LOOCV)
K-fold and leave-one-out cross-validation (LOOCV) are widely used to evaluate the performance of classification algorithms. For large amounts of data, k-fold cross-validation (CV) should be applied to assess the accuracy of the classification model [40]. LOOCV is a special case of k-fold CV, in which the number of folds is equal to the number of instances (as shown in Figure 3). That is, LOOCV uses all samples in the dataset as the test and training data. The benefit of using so many fitted and evaluated models is a more robust estimate of model performance, as each row of data is given an opportunity to represent the entirety of the test dataset [23]. Therefore, when the number of instances in either a dataset or a class value is small, LOOCV must be applied to verify the accuracy of the classification algorithm [41]. These characteristics of LOOCV were considered in this study when selecting it as the CV method for the performance evaluation of the predictive models. RF is considered among the 10 best classifiers [39]. RF, a representative DT-based algorithm, is a bagging-based ensemble technique that generates bootstrap sampling ( Figure 2). RF builds numerous subsets (bootstrap sampling) from the training data and trains the same algorithm several times. The resulting prediction is determined as the mean of all predictions of the submodels. As the number of trees increases, RF can avoid overfitting and be less affected by outliers. Moreover, even when the class is imbalanced, it has superior predictive power over other ML algorithms [33]. Considering these strengths, the application of RF in the data environment utilized in this study (small datasets composed mainly of categorical variables) is also expected to be useful for predicting DW generation.

Leave-One-Out Cross-Validation (LOOCV)
K-fold and leave-one-out cross-validation (LOOCV) are widely used to evaluate the performance of classification algorithms. For large amounts of data, k-fold cross-validation (CV) should be applied to assess the accuracy of the classification model [40]. LOOCV is a special case of k-fold CV, in which the number of folds is equal to the number of instances (as shown in Figure 3). That is, LOOCV uses all samples in the dataset as the test and training data. The benefit of using so many fitted and evaluated models is a more robust estimate of model performance, as each row of data is given an opportunity to represent the entirety of the test dataset [23]. Therefore, when the number of instances in either a dataset or a class value is small, LOOCV must be applied to verify the accuracy of the classification algorithm [41]. These characteristics of LOOCV were considered in this study when selecting it as the CV method for the performance evaluation of the predictive models.

Data Source of Demolition Waste Generation (DWG) Data
This study utilized existing raw data [23,26,42] to develop predictive models for demolition waste (DW) generation. Raw data pertaining to the urban regeneration project districts in Daegu (35.88° N latitude, 128.61° E longitude) and Busan (35.87° N latitude, 128.63° E longitude) in the southern part of Korea were obtained. Therefore, the target buildings surveyed mainly included low-rise detached houses. Table 1 lists the statuses of the buildings whose raw data were used in this study. Raw data were collected through investigations of the main members, areas, number of floors, structures, material types (i.e., mortar, concrete, block, brick, timber, slate, roofing tile, ceramic and glass, metal, or soil), characteristics, and sizes of the buildings prior to dismantling. These investigations were conducted using teams of two individuals, where one focused on measurements and the other focused on recording the data. The raw data included information on the amount of generation (kg/m 2 ) of 10 types of DW (mortar, concrete, block, brick, timber, slate, roofing tile, ceramic and glass, metal, and soil) from 784 buildings. For each building, information is included on six types of construction characteristics (gross floor area, region, building structure, building use, wall material, and roofing material), which correspond to input variables that affect DW generation. The unit of DW generation data used in this study is kg/m 2 , in accordance with the following equation: where DWGR is the demolition waste generation rate (kg/m 2 ), is the amount of waste material j with properties i (kg), and GFA is the gross floor area (m 2 ) of building k.

Data Preprocessing and Preparing Datasets for Prediction Models
Preprocessing data are necessary to reduce the impact of data distortion or outliers and include cutting, adding, and transforming training datasets [43,44]. A stable dataset construction is required to improve the performance of the predictive model. Therefore,

Data Source of Demolition Waste Generation (DWG) Data
This study utilized existing raw data [23,26,42] to develop predictive models for demolition waste (DW) generation. Raw data pertaining to the urban regeneration project districts in Daegu (35.88 • N latitude, 128.61 • E longitude) and Busan (35.87 • N latitude, 128.63 • E longitude) in the southern part of Korea were obtained. Therefore, the target buildings surveyed mainly included low-rise detached houses. Table 1 lists the statuses of the buildings whose raw data were used in this study. Raw data were collected through investigations of the main members, areas, number of floors, structures, material types (i.e., mortar, concrete, block, brick, timber, slate, roofing tile, ceramic and glass, metal, or soil), characteristics, and sizes of the buildings prior to dismantling. These investigations were conducted using teams of two individuals, where one focused on measurements and the other focused on recording the data. The raw data included information on the amount of generation (kg/m 2 ) of 10 types of DW (mortar, concrete, block, brick, timber, slate, roofing tile, ceramic and glass, metal, and soil) from 784 buildings. For each building, information is included on six types of construction characteristics (gross floor area, region, building structure, building use, wall material, and roofing material), which correspond to input variables that affect DW generation. The unit of DW generation data used in this study is kg/m 2 , in accordance with the following equation: where DWGR is the demolition waste generation rate (kg/m 2 ), A ij is the amount of waste material j with properties i (kg), and GFA is the gross floor area (m 2 ) of building k.

Data Preprocessing and Preparing Datasets for Prediction Models
Preprocessing data are necessary to reduce the impact of data distortion or outliers and include cutting, adding, and transforming training datasets [43,44]. A stable dataset construction is required to improve the performance of the predictive model. Therefore, this study performed data outlier elimination and standardization of data to reduce the impact of data distortion and outliers resulting from the large variation in the collected raw data. In this study, the interquartile range method was used to eliminate outliers in the screening of data for use in the models, in accordance with Equation (2). After outlier removal, 690 out of the 784 building samples were used to develop the RF and GBM predictive models in this study.
where IQR is the interquartile range and the value of IQR is Q3 minus Q1, Q1 being the 25th percentile, and Q3 the 75th percentile (Q represents quartile). Standardization was performed to reduce the impact of data outliers on model performance and to build datasets of the same scales and units. Data standardization was conducted using Equation (3), in which the average of the data was x average and the standard deviation was σ standard deviation . Table 2 presents explanations of the data used in this study. Six independent variables and one dependent variable (DW generation) were used to develop the RF and GBM predictive models. The independent variables consist of nominal variables (region, use, structure, wall material, and roofing material) and one numeric variable (GFA). As shown in Table 2, the values converted into scales were utilized as input variables for the nominal variables.

Application of Machine Learning Techniques
The RF and GBM algorithms were selected to develop predictive models for DW generation for datasets with small samples and were composed mainly of categorical variables. In this study, RF and GBM algorithms were applied to develop 10 models for 10 types of DW, and one model for total DW generation including all 10 types of DW. The RandomForestClassifier [45] and GradientBoostingClassifier [46] packages were used to develop predictive models for DW generation. For optimal performance and evaluation upon application of the algorithm, parameters were tuned for the RF and GBM algorithms, including n_estimators (the number of trees to be grown). The parameters were tuned to maximize each model's prediction accuracy on the dataset. The optimal settings were determined using the LOOCV process. Explanations of the hyperparameters applied to develop the RF and GBM models in this study are presented in Table 3.

Model Validation
The LOOCV validation technique was used to validate the models. Because LOOCV puts all samples through tests, it has the advantage of achieving stable results when targeting small datasets compared to the validation set approach, which is an existing cross-validation method (10-fold or k-fold) [23,47]. Several techniques that can be used for model performance evaluation are available. In this study, four statistical metrics (MAE, RMSE, R 2 , and R) were utilized to verify the performance of the models. Definitions of the performance evaluation metrics are presented in Equations (4)- (7).
where x i is the observed value of the generated DW amount, y i is the predicted value of the generated DW amount, x i is the mean observed value of the generated DW amounts, y i is the mean predicted value of the generated DW amount, and n is the number of samples. Figure 4 presents the comparison results of the performance metrics in the 11 predictive models for DW. According to the results of MAE and RMSE values in Figure 4, which indicate the stability of predictive performance, the MAE values of the RF models in all 11 predictive models are lower than those of the GBM models. MAE values are slightly lower in the 10 predictive models for each DW, and also in the total DW production predictive model: the MAE value of the RF model (MAE: 193.7) is lower than that of the GBM model (MAE: 206.2). According to the RMSE results, the GBM models (RMSE value of slate is 0.20, and RMSE value of roofing tile is 34.02) for the slate and roofing tile DW predictive models indicate slightly more stable predictive performance than achieved by the RF model (RMSE value of slate is 2.04, and RMSE value of roofing tile is 34.31). However, the RMSE values of the remaining DW predictive models are lower in the RF model than in the GBM model. The MAE and RMSE results demonstrate that, in the predictive models for small datasets of categorical variables, RF algorithms generally have slightly more reliable predictive performance than achieved by GBM algorithms. In other words, the bagging technique appears to have a slightly better predictive performance in terms of stability over the boosting technique.

Model Performance
tive models for DW. According to the results of MAE and RMSE values in Figure 4, which indicate the stability of predictive performance, the MAE values of the RF models in all 11 predictive models are lower than those of the GBM models. MAE values are slightly lower in the 10 predictive models for each DW, and also in the total DW production predictive model: the MAE value of the RF model (MAE: 193.7) is lower than that of the GBM model (MAE: 206.2). According to the RMSE results, the GBM models (RMSE value of slate is 0.20, and RMSE value of roofing tile is 34.02) for the slate and roofing tile DW predictive models indicate slightly more stable predictive performance than achieved by the RF model (RMSE value of slate is 2.04, and RMSE value of roofing tile is 34.31). However, the RMSE values of the remaining DW predictive models are lower in the RF model than in the GBM model. The MAE and RMSE results demonstrate that, in the predictive models for small datasets of categorical variables, RF algorithms generally have slightly more reliable predictive performance than achieved by GBM algorithms. In other words, the bagging technique appears to have a slightly better predictive performance in terms of stability over the boosting technique.     Figure 5, it can be seen that the scatterplots of predicted and observed values of the total DW predictive model conform more closely to ideal prediction lines in the RF model than in the GBM model. However, GBM models demonstrate slightly better predictive performance in terms of accuracy: the R 2 and R values for slate are 0.48 and 0.78, respectively, in the GBM model, and 0.46 and 0.77 in the RF model; R 2 and R values for roofing tiles are 0.37 and 0.75, respectively, in the GBM model, and 0.37 and 0.74 in the RF model. Considering the models overall, the RF models are slightly superior in terms of the accuracy of DW predictive models, and it is judged that the bagging technique will be more useful than the boosting technique for developing accurate models for small datasets composed of categorical variables. tion lines in the RF model than in the GBM model. However, GBM models demonstrate slightly better predictive performance in terms of accuracy: the R 2 and R values for slate are 0.48 and 0.78, respectively, in the GBM model, and 0.46 and 0.77 in the RF model; R 2 and R values for roofing tiles are 0.37 and 0.75, respectively, in the GBM model, and 0.37 and 0.74 in the RF model. Considering the models overall, the RF models are slightly superior in terms of the accuracy of DW predictive models, and it is judged that the bagging technique will be more useful than the boosting technique for developing accurate models for small datasets composed of categorical variables. Meanwhile, the predictive performance based on the type of DW showed different results. This difference can be attributed to the characteristics of the features applied to the model development. In other words, it is expected that more stable performance results can be obtained if the variables affecting waste generation with respect to waste type are included, apart from the six variables used in this study. For instance, the features used in this study (i.e., region, use, structure, GFA, WM, and RM) are considered to be appropriate key features as input variables affecting the amount of soil generation. On the other hand, the ceramic and glass predictive model with the lowest prediction performance requires additional input variables (such as the external window area ratio), apart from the six variables used in this study. Similar findings were also reported for MSW by Johnson et al. [34], who applied the GBM algorithm to apply feature compositions differently and developed predictive models for refuse, paper, and MGP (metal, glass, and plastic). Each predictive model developed in that study showed different R 2 and RMSE results depending on the external or internal characteristics of the applied features.

Comparison of Predictive Models
Although all RF and GBM models developed in this study were derived from a small dataset, the correlation between the actual and predicted values (R value) is 0.7 or higher, demonstrating excellent predictive performance ( Figure 4). However, while some DWs (mortar, roofing tile, ceramic, and glass) have slightly lower R 2 values (0.4 or lower), their R values are all 0.7 or higher, and thus no issues are expected for the accuracy of the predictive performance. However, these results suggest that it is necessary to include key features considering the characteristics of buildings that affect the DWG for DW types with low predictive performance. In addition, it is expected that this will help improve the R 2 value, because this value gradually increases as new variables are introduced into the model [48,49]. Meanwhile, the predictive performance based on the type of DW showed different results. This difference can be attributed to the characteristics of the features applied to the model development. In other words, it is expected that more stable performance results can be obtained if the variables affecting waste generation with respect to waste type are included, apart from the six variables used in this study. For instance, the features used in this study (i.e., region, use, structure, GFA, WM, and RM) are considered to be appropriate key features as input variables affecting the amount of soil generation. On the other hand, the ceramic and glass predictive model with the lowest prediction performance requires additional input variables (such as the external window area ratio), apart from the six variables used in this study. Similar findings were also reported for MSW by Johnson et al. [34], who applied the GBM algorithm to apply feature compositions differently and developed predictive models for refuse, paper, and MGP (metal, glass, and plastic). Each predictive model developed in that study showed different R 2 and RMSE results depending on the external or internal characteristics of the applied features.

Comparison of Predictive Models
Although all RF and GBM models developed in this study were derived from a small dataset, the correlation between the actual and predicted values (R value) is 0.7 or higher, demonstrating excellent predictive performance ( Figure 4). However, while some DWs (mortar, roofing tile, ceramic, and glass) have slightly lower R 2 values (0.4 or lower), their R values are all 0.7 or higher, and thus no issues are expected for the accuracy of the predictive performance. However, these results suggest that it is necessary to include key features considering the characteristics of buildings that affect the DWG for DW types with low predictive performance. In addition, it is expected that this will help improve the R 2 value, because this value gradually increases as new variables are introduced into the model [48,49]. Figures 6-8 present the comparison results of the observed and predicted DW models using the RF and GMB algorithms; the results for all 11 RF and GBM models indicate that the actual DW generation patterns are well simulated. The mean observed value of the soil predictive model, which had the best predictive performance, is 21.8 kg/m 2 , while the mean predicted value of the RF model (R value: 0.947) is 21.6 kg/m 2 , and that of the GBM model (R value: 0.824) is 21.4 kg/m 2 . The mean observed value of the ceramic and glass predictive model, which had the lowest predictive performance among DW predictive models, is 5.3 kg/ m 2 , where the mean predicted value of the RF model (R value: 0.729) is 5.4 kg/m 2 , and that of the GBM model (R value: 0.711) is 5.5 kg/m 2 . The mean observed value of the total waste predictive model is 1171.2 kg/m 2 , while the mean predicted value of the RF model (R value: 0.786) is 1169.9 kg/m 2 , and that of the

Discussion, Limitations, and Future Work
AI models in previous studies [19,35,50] presented results for some DW types. Johnson et al. [35] applied GBM algorithms to conduct research on refuse, paper, and MGP (metal, glass, and plastic) predictive models. In this study, R 2 results for refuse, paper, and MGP spatial models, to which features of external groups were applied, demonstrated predictive performances of 0.604, 0.628, and 0.428, respectively. The predictive performance of MSW and paper predictive models, to which decision tree (DT) algorithms were applied in the study by Kannangara et al. [50], was 0.54 and 0.31, respectively. In the study by Kumar et al. [19], the R 2 of the plastic generation predictive model, to which the RF algorithm was applied, demonstrated a predictive performance of 0.66. This study proposes predictive models for the generation of 10 types of DW, and one for total waste generation, for small datasets. In this study, the 11 RF and GBM models developed, despite targeting small datasets, demonstrated excellent predictive performance with the correlation (R) of the observed and predicted values at 0.73-0.95 (RF models) and 0.71-0.92 (GBM models), respectively. R 2 values also demonstrated excellent predictive performances of 0.6 or higher, except for mortar, slate, roofing tile, ceramics, and glass. However, as AI models depend on datasets of large sizes [15], a dataset size that is not sufficiently large is a fundamental limitation of this study; obtaining datasets that are sufficiently large is a significant challenge. This study also found significant differences in the performance (MAE, RMSE, R 2 , R) of predictive models depending on the type of DW, and the R 2 values for demolition waste materials such as slate, mortar, roofing tile, ceramic, and glass were relatively low (≤0.5), demonstrating relatively low performance. Johnson et al. [35] suggested that the development of optimal feature groups is also important for improving the predictive performance of AI models by demonstrating that differences in the composition and characteristics of features applied to AI models affect the results of predictive performance. Based on this, it is deemed that further features must be developed to improve the performance of predictive models for the types of DW that had relatively low predictive performance in the present study.

Conclusions
This study investigated the development of AI models for DW generation suitable for small datasets composed mainly of categorical variables. The DT-based ensemble model was applied as an algorithm suitable for the datasets, with one selected algorithm being the RF algorithm, which is representative of the bagging technique, and the other being the GBM algorithm representative of the boosting technique. Data preprocessing was performed to improve the stability and accuracy of the models, and the parameters were tuned to fit the RF and GBM algorithms. The LOOCV technique was applied to verify the developed RF and GBM models, and performance evaluation was conducted using statistical metrics such as MAE, RMSE, R 2 , and R. Therefore, the consideration of AI models developed in this study for small datasets composed mainly of categorical variables, along with the RF and GBM predictive models developed for the generation of 10 types of DW and one for total DW generation, is summarized as follows.
First, for small datasets composed mainly of categorical variables, the bagging technique (RF model) was found to be superior to the boosting technique (GBM model) in terms of the stability and accuracy of predictive performance. However, the GBM model demonstrated excellent predictive performance for certain types of DW (slate and roofing tile). Therefore, the selection of appropriate RF and GBM algorithms, depending on the type of DW, is necessary in developing DW prediction models for small datasets composed mainly of categorical variables.
Second, 11 RF (R 2 values: 0.22-0.84; R values: 0.71-0.92) and GBM (R 2 values: 0.34-0.89; R values: 0.73-0.95) predictive models under identical conditions demonstrated differences in performance according to the type of DW. This is considered to be due to the characteristics of the features applied in developing the models. Therefore, it is expected that the performance of predictive models will be improved in the future if features matching the characteristics of DW with low predictive performance (mortar, slate, roofing tile, ceramic, and glass, having R 2 values ≤0.5) are added.
Finally, it is recommended that RF methods be applied to develop DW predictive models for concrete, block, brick, timber, ceramic and glass, metals, soil, and total waste, while applying GBM models for slate and roofing tiles, based on 11 predictive models developed in this study. This study proposes predictive models for more types of DW than included in the AI models of previous studies. The results of this study are anticipated to be useful tools for building demolition businesses to establish thorough and detailed DW management strategies. For example, based on the predicted amount of waste to be generated from the type of DW, a demolition company can place orders for the required demolition equipment and transportation, which will be of use in minimizing excess costs and personnel allocations.