Estimating Time Spent at the Waste Collection Point by A Garbage Truck with A Multiple Regression Model

The planning of garbage truck routes is an essential process in waste collection companies. The main issues in garbage truck routing are determining the optimal routes, minimizing time, decreasing costs, and reducing pollutant emissions. In the literature, the time spent at a waste collection point (WCP) is either taken as an average or not included at all. Time spent at a WCP is determined by the processes of picking up, emptying, and putting down the waste containers and by factors specific to individual WCPs. Those factors affect the time spent at a WCP significantly. Excluding the time spent at a WCP, or replacing it with an average in the planning approach, may lead to an inaccurate estimate of total collection time. The aim of this article is to present a multiple regression model for estimating the time spent at a WCP. We analyzed the impact of WCP factors (e.g., building type and number of containers) on the time that a garbage truck spends at a WCP. We initially considered seven factors, five categorical and two numerical. On this basis, we developed a multiple linear regression model. The proposed model was then validated on data obtained from the municipal company operating in the city of Wroclaw, Poland. The study confirmed that the defined factors significantly affect the time a garbage truck spends at a WCP and should be taken into account in waste collection planning.


Introduction
The waste collection process is in direct or indirect relation with sustainability aims. In Reference [1], the challenges with achieving sustainable development goals were presented. It was stated that 10 out of 17 sustainable development goals are dependent on the waste collection process. This process impacts areas related to environmental protection, public health protection, poverty reduction, and resource value. It proves the need for proper asset management in the waste collection process, which generates up to 70% of total waste management costs [2].
The municipal waste collection problems are currently a challenge in terms of economic development, population growth, and consumerism. The approaches presented in the literature focus on route planning, collection time, location of containers, costs, emission of pollutants, or energy consumption, as summarized in Reference [1].
Garbage truck routing comes down to the widely analyzed vehicle-routing problem. However, in the context of waste collection, this problem is more challenging to solve. This is mainly due to a larger number of constraints, such as the fill level of the containers, which has to be predicted, or the need to take into account time windows for garbage collection (there are places where waste can be collected only during specified time windows). This leads to a higher level of complexity of such models [3].
Time is one of the most common parameters presented in the literature for waste collection optimization. In some papers, the collection time is divided only into the driving time and the time spent at a waste collection point (WCP) [4-6]. Time spent at a WCP can also be treated in more detail by considering the time of picking up/putting down the container [7,8] and the time of inactivity (employees' rest) [9], as well as the time for a lunch break [2]. As a part of driving time, some authors consider turns on the road [10,11], while others consider the time of stopping at lights [12,13].
It has been noticed that, regardless of the parameters considered, the models and waste-collection-planning methods in the literature share a common feature. Time spent at a WCP is generally treated as a set of processes or analyzed in detail as the time of picking up, emptying, and putting down the containers. However, this time is usually presented as an average value, as shown in Table 1, which contains selected publications dealing with waste collection modeling.
Table 1. Selected publications dealing with waste collection modeling (table body not reproduced; the surviving row lists Reference [18] with five marked parameters and the entry "not included"). 1 e.g., number of containers, amount of waste, type of waste, and area type. WCP, waste collection point.
The authors of Reference [10] presented a waste-collection-route time estimation for the hauled container system. They included parameters such as the average drop-off time, average time at disposal, haul time to/from the disposal site, time spent at intersections and turns, and off-route time. Additionally, they took into account the distances covered. The time spent at a WCP was given as the average pickup time per trip. Other authors [14] proposed a model for predicting energy consumption, time requirements, and the number of garbage trucks for curbside waste collection. They specified parameters of time (average time per lift, the average time to unload, driving time, stop-go time, and time per stop based on time per lift) related to energy and distance traveled. They also considered area type, frequency of bin collection, quantity of waste, and dwelling profile. Another work [15] presented a model for predicting time and fuel consumption during waste collection. They estimated the time of collection based on the distances between the WCPs, the truck's speed, and the time per stop. The time spent at WCP presented as time per stop was given as weighted average time per stop. In the next paper [17], which focused on mapping, the collection process's time, average time per container, and stop time per collection point were proposed as parameters.
Regarding the estimation of the waste collection process's costs, some authors rely on average serving time, average travel times, and time at turns [11]. The others [16] include only average service time at containers as the time parameter.
All the works mentioned above take the time spent at WCP as an average. However, the authors of Reference [13] proposed a different approach. They estimated the collection time based on population density per 100 m of road distance. They considered the average waiting time per stop sign and per traffic light, as well as the pickup time, which they based on the number of containers and the number of WCPs. Additionally, Reference [19] also attempts to estimate the time spent at WCP. The authors presented two models for estimating time per stop. These models, for two- and one-person crews, include the number of containers at a stop, the total number of throw-away items serviced at a stop, and the number of services collected at each stop.
The literature has already emphasized that determining the time spent at WCP by garbage trucks is problematic. For this reason, the authors of Reference [9] proposed a methodology for measuring all the time types related to the waste collection process. In terms of time spent at a WCP, they emphasized that this time depends on many features, such as the container's size, the type of vehicle, or the occurrence of overfilling of the containers. They experimentally determined the time spent at WCP for different collection systems (characterized by different types of waste, trucks, and bins, as well as bin volume and number of workers per team) but finally presented it as an average.
To sum up, the development of effective waste-collection-planning processes has created the need to include the time spent at WCP by a garbage truck in waste collection models and methods. However, the literature review allowed us to identify a research gap in estimating the time spent at WCP by a garbage truck. Currently, waste collection models include the time spent at WCP as an average value, which is not a good enough representation for waste collection planning because the resulting predictions are too imprecise. Our work addresses this gap by focusing on only one of the sub-processes of waste collection, the WCP service. This sub-process is not analyzed sufficiently in the literature (see, e.g., References [1,3,20]). The specific factors of WCPs are not discussed enough, despite having a significant impact on the time spent at WCP, either reducing or lengthening it. Papers on waste collection focus mainly on driving time, while the time spent at WCP is mostly ignored or taken as an average. The article therefore draws attention to a vital parameter of the waste collection process that has not been a primary focus of research to date. Thus, the article aims to present a multiple regression model for estimating the time spent at WCP. The proposed model is based on linear regression, in which we link the time spent at WCP to the factors that characterize each WCP visited by a garbage truck.
Following this, the main contributions of this study are the following:
• We have defined the main factors that influence the time spent at WCP.
• We have examined how these factors may affect the time spent at WCP.
• We have introduced a multiple regression model for predicting the time spent at WCP by a garbage truck.
• We have compared the estimate of the time spent at WCP obtained from the multiple regression model with the commonly used average value.
• We have proposed a procedure for multiple regression modeling, which could be used in any waste collection system for garbage truck route planning.
• Finally, we have validated the developed model with the use of internal and external validation methods.
The article is structured as follows. Section 2 contains a short description of the analyzed waste collection system, a description of data collection, and a presentation of the proposed multiple regression model for predicting the time spent at a WCP. The implementation of the multiple regression analysis follows four main steps: study design, data preparation, data analysis, and results reporting. Next, Section 3 presents the results of the internal and external model validation. A detailed discussion of the obtained results is provided in Section 4. Finally, Section 5 summarizes the whole work in the form of conclusions and identifies further directions of our research.

Waste Collection System Description
The investigated waste management system can be divided into four subsystems: waste generation system, waste collection system, waste treatment system, and waste storage and disposal system. The waste is collected at WCPs and transported to treatment systems or directly to the storage system. The analyzed waste collection process for a single route is presented in Figure 1. A vehicle is weighed at the start point and then visits the WCPs according to a specified schedule. After visiting all the scheduled points, the vehicle returns to the start point, where it is weighed again and emptied.

Data Collection
The research was conducted in the city of Wroclaw (in Poland), where the collection process is carried out according to a fixed scheme presented in Figure 1.
The datasets used in this paper were derived from a combination of information from two different sources (Figure 2). Dataset 1 consisted of the garbage truck driving data, and Dataset 2 consisted of the factors affecting the garbage truck's stopping time at each WCP. Dataset 1 contains data on garbage truck driving and mixed waste collection. The data included the start and end times of each process: driving to the WCP, stopping at the WCP, and emptying the container. This database consists of 14 routes conducted by different vehicles, with different loaders, and for different WCPs located in different areas of the city. No route is repeated. In total, this measurement provided 661 individual records of the time spent at WCP.
Dataset 2 was based on field research consisting of collecting information on the various factors affecting the WCP collection process's time. These factors were identified based on a literature review and information collected from garbage truck employees. The following factors were examined: WCP cover type, building type, WCP surface type, and the number of containers. These data were collected via tablets equipped with proprietary measurement applications (the survey was conducted in June-August 2020). The survey delivered the current information about the individual characteristics of 5983 WCPs. The data were collected as part of the basic research of a project supported by the National Science Centre, Poland (grant number 2019/03/X/ST8/00287), which aimed to determine the influence of the studied factors on the WCP service time.
The creation of Dataset 3 required us to link each record from Dataset 1 with the corresponding WCP from Dataset 2. Because there was no key to link these databases directly, we relied on GPS coordinates. We were able to link 258 of the 661 records with their corresponding factors. Finally, we were able to consider seven factors (five categorical and two numerical) influencing the time spent at a WCP by a garbage truck. To sum up, Dataset 3 contains eight variables, seven of which are factors influencing the eighth variable, the time spent at WCP. The considered variables are summarized in Table 2 and described below.
Time spent at WCP: This indicator represents the time required to perform all the necessary actions within the WCP. This time is counted from the moment of stopping until the vehicle leaves the WCP. Data were collected in Dataset 1.
WCP type: This is mainly divided according to the type of small architecture object, i.e., the object within which the containers were placed. Three types of WCP were distinguished: freestanding containers, covered and open, and covered and closed. It was verified whether the need to avoid obstacles and open the cover has a significant impact on the time spent at WCP by a garbage truck. Data were collected as part of Dataset 2.
Building type: The type of building often determines a different pickup technique for loaders. The collection process differs between single-family housing, multi-family housing, and other buildings (e.g., stores and mixed building types). Data were collected within Dataset 2.
WCP surface: The type of ground over which the containers are hauled was considered as one of the factors. There are two types of surfaces: paved and unpaved. This factor seems to be much more critical when analyzed together with weather factors. However, including weather factors would complicate the final model and make it impractical. Data were collected as part of Dataset 2.
Number of loaders: This is one of the factors considered in the literature [9,19]. Data were collected as part of Dataset 1.
Planned cleaning of WCPs: Cleaning containers may result from a random event (that type was not included in the model), but most of the work to keep WCPs clean is a planned activity. There are cleaning schedules for all the WCPs and schedules for specific WCPs based on the residents' demands. In this case, the WCP service is much longer, and this factor should be taken into account during route planning. Data were collected within Dataset 1.
Number of containers: Where a container was empty or no containers were reported for collection at a given WCP (mainly single-family housing), we assumed 0. For quantities 8, 9, 11, and 12, there was insufficient representation, so they were not included in the model. Data were collected as part of Dataset 1.
Truck distance from WCP: This factor was used to assess the fixed distance between the vehicle's stop and the actual WCP. An example of such a situation is a gated community. The vehicles often do not enter these communities but stop in front of the gate, and the containers are hauled from the cover to the vehicle. Data were collected as part of Dataset 1.
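The GPS-based linking of Dataset 1 records to Dataset 2 WCPs can be illustrated with a minimal sketch. The article does not state how the coordinates were matched, so the nearest-neighbor rule, the 30 m acceptance threshold, the field names, and the sample records below are all assumptions made for illustration only.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def link_stops_to_wcps(stops, wcps, max_dist_m=30.0):
    """Attach the nearest WCP to each stop; stops with no WCP within max_dist_m are dropped.

    max_dist_m is a hypothetical threshold, not a value taken from the study.
    """
    linked = []
    for stop in stops:
        best = min(wcps, key=lambda w: haversine_m(stop["lat"], stop["lon"], w["lat"], w["lon"]))
        d = haversine_m(stop["lat"], stop["lon"], best["lat"], best["lon"])
        if d <= max_dist_m:
            factors = {k: v for k, v in best.items() if k not in ("lat", "lon")}
            linked.append({**stop, **factors, "match_dist_m": d})
    return linked

# Hypothetical records (invented values, coordinates roughly in Wroclaw)
stops = [
    {"stop_id": 1, "lat": 51.10790, "lon": 17.03850, "time_s": 95},
    {"stop_id": 2, "lat": 51.12000, "lon": 17.05000, "time_s": 40},
]
wcps = [
    {"wcp_id": "A", "lat": 51.10795, "lon": 17.03855, "building_type": "multi-family", "containers": 4},
    {"wcp_id": "B", "lat": 51.11500, "lon": 17.06000, "building_type": "single-family", "containers": 1},
]
linked = link_stops_to_wcps(stops, wcps)
print(len(linked), linked[0]["wcp_id"])
```

In this toy input only the first stop lies within the threshold of a WCP, which mirrors the situation in the study where only part of the records could be linked.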

Multiple Regression Model
It has been verified that a regression model based on only one factor is insufficient for estimating the time spent at WCP by a garbage truck (Table 3). Among the simple regression models built on each factor separately, the highest R² = 0.584 was achieved when including only the number of containers. Therefore, multiple regression was used to predict the time spent at WCP by a garbage truck.
In linear regression with p independent variables (predictors) X_1, X_2, ..., X_p and a dependent variable (predicted value) Y, Equation (1) can be obtained [21]:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \varepsilon \quad (1)$$
where \beta_0 is the intercept, \beta_1, ..., \beta_p are the regression coefficients, and \varepsilon is the error term.
In our case, we consider the variables listed in Table 2: the time spent at WCP by a garbage truck as a dependent variable and factors from Dataset 2 as independent variables. According to Reference [22], the main stages and procedures of multiple regression analysis, presented in Figure 3, can be developed. Based on this scheme, the following steps were performed:

• Stage I. Study design
Due to the difficulties outlined in the description of Dataset 3 development, the sample size is 258. This sample size is considered sufficient according to the rule based on the number of predictors, p, proposed in Reference [22], where N > 50 + 8p. Additionally, it should be noted that the data come from different regions of the city and from different routes.
For predicting time spent at WCP by garbage truck (dependent variable), seven factors connected with WCP (independent variables) were initially chosen: WCP cover type, building type, WCP surface, number of loaders, planned cleaning, number of containers, and truck distance from WCP.
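The sample size rule N > 50 + 8p can be checked with a few lines of arithmetic. The sketch below evaluates it for both the seven initial factors and the ten predictors obtained after dummy coding; which value of p the authors applied is not stated, so both are shown as an assumption.

```python
def min_sample_size(p):
    """Smallest integer N that satisfies N > 50 + 8 * p for p predictors."""
    return 50 + 8 * p + 1

n = 258  # records available in Dataset 3
for p in (7, 10):  # 7 initial factors; 10 predictors after dummy coding
    print(p, min_sample_size(p), n >= min_sample_size(p))
```

For either value of p the available 258 records exceed the required minimum.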

• Stage II. Data preparation
As the first step of data preparation, the data were divided into two subsets:
- Subset 1: 200 measurements of the collected data for model building and internal validation;
- Subset 2: 58 measurements of the collected data for external validation, taken from two independent routes and from two city regions not included in Subset 1.
Among the seven chosen independent variables, five are categorical. To use them in the model, dummy coding [23,24] was necessary. WCP surface and planned cleaning are categorical variables with only two categories. A variable with two categories yields one dummy variable with the value 0 (absence of the chosen category) or 1 (presence of the chosen category). WCP type, building type, and truck distance from WCP each have three categories. A categorical variable with three categories yields two dummy variables with the value 0 or 1; one category must be omitted to eliminate collinearity. Dummy coding therefore results in a larger number of independent variables. Five categorical variables were transformed into eight independent variables (Table 4). Consequently, the initial number of seven independent variables was expanded to ten. In the data preparation stage, there is also a need to check the basic assumptions of multiple regression, which include, among others, normality, linearity, and the absence of multicollinearity.
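The dummy coding described above can be sketched in plain Python. The column names, category labels, and the choice of reference (omitted) categories below are illustrative assumptions; the reference categories actually used are given in Table 4 of the article, which is not reproduced here.

```python
def dummy_code(rows, column, reference):
    """Replace a categorical column with 0/1 indicator columns, omitting the reference level."""
    levels = sorted({r[column] for r in rows} - {reference})
    coded = []
    for r in rows:
        out = {k: v for k, v in r.items() if k != column}
        for level in levels:
            out[f"{column}={level}"] = 1 if r[column] == level else 0
        coded.append(out)
    return coded

# Hypothetical WCP records
rows = [
    {"building_type": "single-family", "surface": "paved", "containers": 1},
    {"building_type": "multi-family", "surface": "unpaved", "containers": 5},
    {"building_type": "other", "surface": "paved", "containers": 3},
]
coded = dummy_code(rows, "building_type", reference="other")   # 3 categories -> 2 dummies
coded = dummy_code(coded, "surface", reference="unpaved")      # 2 categories -> 1 dummy
for row in coded:
    print(row)
```

A record belonging to the reference category has zeros in all of that variable's dummy columns, which is why one category can be dropped without losing information.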
According to the assumption of normality, residuals (the differences between observed and predicted values) should be normally distributed [25]. To check this assumption, a chi-square (χ²) test was performed. It showed that there are no grounds for rejecting the hypothesis that the residuals are normally distributed. The assumption of linearity, i.e., a linear relationship between the independent and dependent variables [26], is also fulfilled (nonlinearity test, Lagrange multiplier = 0.735, p-value = 0.391 > α = 0.05). Multicollinearity occurs when one of the independent variables is in a linear relationship (is strongly correlated) with one of the others [21]. It can be detected, for example, by examining the correlation matrix (Table 5) or by using the variance inflation factor (VIF), whose minimum value is 1 and for which values above 10 indicate multicollinearity [27]. The correlation coefficients listed in Table 5 (from −0.43 to 0.23) do not indicate any significant correlation between the independent variables. This is also confirmed by the fact that every independent variable has a VIF slightly above 1 (from 1.125 to 1.651). Both the correlation matrix and the VIF indicate that there is no multicollinearity among the independent variables.
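The VIF check mentioned above can be sketched as follows, using the standard definition VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing predictor j on the remaining predictors. The data are synthetic and the helper name `vif` is hypothetical; this is not the software used in the study.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X (n_samples x n_features)."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(len(y)), others])   # intercept + other predictors
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))   # three independent synthetic predictors
# Add a fourth column that nearly duplicates the first one
X_collinear = np.column_stack([X, X[:, 0] + 0.05 * rng.normal(size=200)])

print(vif(X))            # values close to 1
print(vif(X_collinear))  # first and last values far above 10
```

Independent predictors give values near the minimum of 1, while the near-duplicate column pushes the affected VIFs well past the threshold of 10 quoted in the text.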

• Stage III. Data analysis
Of the three main types of multiple regression (standard, sequential, and stepwise) described in Reference [28], stepwise multiple regression was used, which can only be implemented for prediction purposes [28]. In stepwise regression, there are three techniques for choosing independent variables: forward selection (adding variables one by one based on a statistical criterion), backward elimination (removing variables one by one based on a statistical criterion), and the stepwise procedure (a combination of forward and backward). In backward elimination, which we chose, model building starts with all the independent variables, which are then eliminated one by one based on a chosen criterion (for example, the p-value or Mallows's Cp [29]). The elimination process (removing variables whose p-value in the F test is greater than 0.05) is presented in Table 6. After eliminating all independent variables with a p-value greater than 0.05, the final model (Model 5) is obtained. Of the ten independent variables entered at the beginning, four were removed due to insignificance (p-value greater than 0.05). For every model, R² and adjusted R² (R²adj) were calculated according to Equations (2) and (3):
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (x_i - y_i)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \quad (2)$$
$$R^2_{adj} = 1 - \frac{(1 - R^2)(n - 1)}{n - p - 1} \quad (3)$$
where x_i are the actual values, \bar{x} is their mean, y_i are the predicted values, p is the number of predictors, and n is the number of observations. R² shows the percentage of the variance in the dependent variable predicted by the independent variables [28]. R² does not decrease when more independent variables are added. Therefore, R²adj, which accounts for the number of predictors, should be used.
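The backward elimination procedure described in this stage can be sketched as follows. This is a simplified illustration and not the authors' implementation: the data are synthetic, the predictor names are hypothetical, and the two-sided p-values are approximated with the standard normal distribution instead of the F test used in the article, an assumption that is reasonable only when the number of observations is large compared with the number of predictors.

```python
import math
import numpy as np

def ols_pvalues(X, y):
    """OLS with intercept; returns coefficients and approximate two-sided p-values."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    dof = len(y) - A.shape[1]
    sigma2 = (resid @ resid) / dof
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    t = coef / se
    # Normal approximation to the t distribution (assumption; adequate for large dof)
    p = np.array([math.erfc(abs(ti) / math.sqrt(2)) for ti in t])
    return coef, p

def backward_elimination(X, y, names, alpha=0.05):
    """Remove the least significant predictor until every p-value is <= alpha."""
    keep = list(range(X.shape[1]))
    while keep:
        _, p = ols_pvalues(X[:, keep], y)
        worst = int(np.argmax(p[1:]))   # index 0 is the intercept, so skip it
        if p[1:][worst] <= alpha:
            break
        del keep[worst]
    return [names[i] for i in keep]

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
# Only x0 and x1 influence y; x2 and x3 are pure noise
y = 30 + 25 * X[:, 0] - 15 * X[:, 1] + rng.normal(scale=5, size=200)
selected = backward_elimination(X, y, ["x0", "x1", "x2", "x3"])
print(selected)
```

Each pass refits the model on the remaining predictors, so the p-values are always those of the current model, which corresponds to the sequence of models reported in Table 6.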
The highest R²adj, which represents predictive power, was 0.806 for the final model (Model 5). This model also has the lowest standard error of the estimate. Table 7 shows the analysis results for the coefficients of the final regression model. Based on the presented coefficients, the time spent at WCP increases as the number of containers increases. A single-family or multi-family building type, a truck distance from WCP of 0-15 m, no planned cleaning, and a larger number of loaders all decrease the time spent at WCP. Moreover, the single-family building type reduces the time spent at WCP more than twice as much as the multi-family building type.
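As a worked check of the reported figures, the standard definitions of R² and adjusted R² can be written out directly. The sketch assumes the final model has p = 6 predictors and was fitted on the n = 200 records of Subset 1, as stated elsewhere in the article; the function names are hypothetical.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((x - y) ** 2 for x, y in zip(actual, predicted))
    ss_tot = sum((x - mean) ** 2 for x in actual)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Reported R^2 of the final model: 0.812
print(round(adjusted_r_squared(0.812, n=200, p=6), 3))  # 0.806
```

The result agrees with the adjusted value of 0.806 reported for Model 5.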
In accordance with the coefficients presented in Table 7, the regression model equation can be formulated as Formula (4):
Internal and external validation of the developed model is described in the next section (Section 3).

• Stage IV. Results reporting (Section 3)
Results are presented in Section 3, where metrics for internal and external validation are shown. Besides R², the Root Mean Square Error (RMSE) and the Mean Absolute Error (MAE) were calculated using Formulas (5) and (6):
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - y_i)^2} \quad (5)$$
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |x_i - y_i| \quad (6)$$
where x_i are the actual values, y_i are the predicted values, and n is the number of observations.
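The two error metrics can be illustrated with a short sketch based on their standard definitions. The times below are invented for illustration and are not taken from the study.

```python
import math

def rmse(actual, predicted):
    """Root mean square error."""
    n = len(actual)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(actual, predicted)) / n)

def mae(actual, predicted):
    """Mean absolute error."""
    n = len(actual)
    return sum(abs(x - y) for x, y in zip(actual, predicted)) / n

# Hypothetical times spent at four WCPs, in seconds
actual = [60, 120, 200, 90]
predicted = [70, 100, 230, 90]
print(mae(actual, predicted))             # 15.0
print(round(rmse(actual, predicted), 2))  # 18.71
```

RMSE is at least as large as MAE because squaring weights the larger errors more heavily, which is why both are worth reporting.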

Results
When developing a prediction model, it is necessary to consider internal and external validation. Internal validation can estimate potential overfitting [30] and internally investigate replicability [31]. The aim of internal validation is model testing. It approximates the performance of the model that will be built from the whole dataset. Internal validation techniques include data-splitting, repeated data-splitting, jack-knife, and bootstrapping [32]. We used repeated data-splitting. In this approach, we randomly selected a portion of Subset 1 (80%) for model development and tested the model on the remaining 20%. This procedure was repeated 100 times, with a different sample at each repetition, to examine different scenarios. The average results of the internal validation are presented in Table 8. Over the one hundred repetitions, the developed models reached an average R² of 0.805 (R²adj 0.798). In the testing process, the average R² decreased to 0.764, and MAE and RMSE were slightly higher. The MAE (mean absolute difference between the actual and the predicted data), which equals 40.756 for model development and 42.657 for testing, can be considered small, taking into account the data range. RMSE indicates that the model will miss the actual values by about 43 s. Based on the obtained results, it can be stated that the model is internally validated and has a good fit. No potential overfitting has been observed.
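The repeated data-splitting procedure can be sketched as follows. The data are a synthetic stand-in for Subset 1 with invented coefficients and only two predictors, so the printed numbers are not comparable with Table 8; the sketch only shows the mechanics of 100 random 80/20 splits.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with intercept; returns [b0, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    return coef[0] + X @ coef[1:]

def r2(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def repeated_split(X, y, repeats=100, train_frac=0.8, seed=0):
    """Average train/test R^2 and MAE over repeated random splits."""
    rng = np.random.default_rng(seed)
    n_train = int(round(train_frac * len(y)))
    scores = []
    for _ in range(repeats):
        idx = rng.permutation(len(y))
        tr, te = idx[:n_train], idx[n_train:]
        coef = fit_ols(X[tr], y[tr])
        scores.append((
            r2(y[tr], predict(coef, X[tr])),
            r2(y[te], predict(coef, X[te])),
            np.mean(np.abs(y[tr] - predict(coef, X[tr]))),
            np.mean(np.abs(y[te] - predict(coef, X[te]))),
        ))
    return np.mean(scores, axis=0)

# Synthetic stand-in for Subset 1 (200 records); coefficients are invented
rng = np.random.default_rng(42)
containers = rng.integers(0, 8, size=200).astype(float)
loaders = rng.integers(1, 3, size=200).astype(float)
X = np.column_stack([containers, loaders])
y = 40 + 35 * containers - 10 * loaders + rng.normal(scale=30, size=200)

train_r2, test_r2, train_mae, test_mae = repeated_split(X, y)
print(train_r2, test_r2, train_mae, test_mae)
```

A large gap between the averaged development and testing scores would point to overfitting; a small gap, as reported in the article, suggests the model generalizes within the sampled population.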
The developed prediction model should also be externally validated before being accepted into practice [30]. For external validation purposes, a new sample of data from the same or a similar population should be obtained [32]. Subset 2 (58 records), introduced in the data preparation stage, contains measurements from two routes and two city regions not included in Subset 1 (200 records), which was used for model building and internal validation. For this reason, it can be treated as a new sample from the same population. The results of the prediction model for Subset 2 are shown in Table 9. The developed prediction model, with R² 0.812 (R²adj 0.806), performs slightly worse when applied to the new sample, with lower R² and higher MAE and RMSE, which is expected in external validation. These results are acceptable and correspond with the results of the internal validation. They give a far more accurate representation of the time spent at WCP by a garbage truck than the average value does. This is confirmed by the MAE and RMSE calculated for the commonly used average-based estimation: the MAE for the average-based estimation is almost three times higher than the one obtained in the external validation of our model. Figure 4, which presents the measured and predicted times spent at WCP by a garbage truck, also shows that the developed model gives a better estimate than the average value.
Time spent at WCP can take from 21% to 56% of the total waste collection time (results from the examination of Dataset 1, which contains data from different garbage trucks and different routes). This shows that the parameter is essential in waste collection planning and should be considered in as much detail as the driving time between WCPs, for which different authors include, among others, distances, truck speed, number of turns, stop signs, and intersections.
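The comparison between the model and the average-based estimate reported in this section can be illustrated with a toy calculation. All numbers below are hypothetical: the training mean, the measured times, and the model predictions are invented to show how the two MAE values are obtained, not to reproduce Table 9.

```python
def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(x - y) for x, y in zip(actual, predicted)) / len(actual)

# Hypothetical values in seconds
train_mean = 110.0                       # average time per WCP from the training data
actual = [35, 60, 150, 240, 95]          # measured times on a validation route
model_pred = [45, 70, 130, 215, 100]     # predictions of a regression model
baseline_pred = [train_mean] * len(actual)

mae_model = mae(actual, model_pred)
mae_baseline = mae(actual, baseline_pred)
print(mae_model, mae_baseline)  # 14.0 62.0
```

The average-based estimate assigns the same time to every WCP, so its error grows with the spread of the actual service times, whereas a factor-based model can follow that spread.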

Discussion
The developed multiple regression model given by Formula (4) and presented in Section 2.3 was based on six independent variables. Initially, seven factors influencing time spent at WCP were considered. After dummy coding that was necessary due to the existence of categorical variables, ten independent variables were included to build the model. Based on the backward elimination approach, we eliminated four of them during the model preparation, leaving only those that were significant due to the chosen criterion (p-value). With the proposed model and the inclusion of the described variables, it is possible to achieve a better prediction of the time spent at WCP by a garbage truck than is currently done. External validation results showed that our model can predict with the error of around 47 s. However, it should be noted that it is almost three times less than with the commonly used average-based estimation.
Time spent at a WCP by a garbage truck should not be completely ignored or considered as an average value. From the sustainability point of view, it is essential to make the best possible use of the owned vehicle fleet. Significant under-or overestimation of time spent at WCP may lead to inefficient waste collection planning. WCPs can be incorrectly grouped into collection regions and improperly assigned to the garbage trucks. Following this, planned routes may turn out to be longer or shorter than expected. If the garbage truck comes back and still has resources (time, capacity, etc.) to continue the collection process, the schedule includes too many trips. More trips result in higher fuel consumption, which leads to higher emissions.
In the case of waste collection planning with time windows (some WCPs can be visited only during specified time windows), accurate prediction of time spent at a WCP by garbage truck is also crucial. If a garbage truck visits a WCP too early, it must wait, which results in ineffective use of the vehicle and workers' time. If a garbage truck visits a WCP too late, the company will have to pay the financial penalties.
All of the abovementioned consequences of inaccurate time spent at WCP prediction can be reduced with the use of the proposed model. Better reality representation through more accurate prediction of time spent at a WCP by a garbage truck can result in more sustainable garbage-truck-fleet management.
The model can be directly applied to systems similar to the studied one, in which the following hold:
• Building types are diversified;
• Loaders work in one- or two-person teams;
• Only containers are collected (we did not consider bags);
• Waste is collected with back-loaded garbage trucks;
• Mixed waste is collected.
If these assumptions are not met, it is necessary to perform model building as described in Section 2.
The presented method has greater predictive accuracy than the commonly used approach based on the average. Predictions based on the average treat all WCPs alike, without considering their specific factors. For this reason, using the average value can give unsatisfactory predictions for routes dominated by a single factor (e.g., for routes with only single-family houses, the time spent at WCPs can be overestimated).
Our model can be valuable for waste collection planners in waste management companies. These companies currently collect data about garbage truck localization and conduct periodic WCPs inventories for their own purposes. With the proposed model's use, waste collection can be planned more accurately, and vehicle fleet can be utilized better. Additionally, we predict that the model will find application in vehicle routing problems with time windows, where the predicted latest arrival time (affected by time spent at WCP) is of great importance. Therefore, it can be an area of interest for a wide range of experts dealing with this issue.
The most significant difficulty during model preparation was connecting information from Dataset 1 and Dataset 2. In our case, these data were collected independently. GPS localization was the only possibility for connecting. We recommend that data from garbage trucks during waste collection (Dataset 1) and data from WCPs inventory (Dataset 2) should have a common key for an easier data connection. Finally, our model has some limitations. Data were collected during one season in the summer. Factors specific for different systems were not included. However, the procedure described in Section 2.3 is easy to follow and can be used in any collection system, including its specific features.

Conclusions
In the most general approach, the total time of waste collection is distinguished by the driving time (from the start point to WCP, between WCPs, and from WCP to the endpoint) and the time spent at WCP (taking the container, emptying the container, and putting the container away). The work on waste collection modeling focuses on the best possible representation of the driving time. Solutions can be found that take into account intersections, turns, traffic lights, and possible jams. However, the time spent at WCP is insufficiently analyzed. This time is often ignored or taken as the average time for WCP. Due to this, the research gap in the field of estimating time spent at WCP can be observed.
In response to this research gap, we have presented a multiple regression model for estimating the time spent at WCP for mixed waste. The developed model includes building type, number of loaders, number of containers, truck distance from WCP, and planned cleaning as independent variables. We obtained an adjusted coefficient of determination, R²adj, of 0.806. Comparing the predicted and measured times spent at WCP with the commonly used average value showed that our approach represents reality far more accurately.
The presented model can be easily applied in waste management companies as part of garbage truck route planning. It was validated in the city of Wroclaw (Poland) during the summer season. For this reason, it could have some limitations, but the presented detailed procedure for multiple regression modeling ensures its reconstruction in any waste collection system. By following the steps presented in Section 2.3 and using gathered data, waste collection planners can estimate time spent at WCP more accurately.
In the future, we plan to develop a method concerning the vehicle routing problem with time windows, in which the presented model for the time spent at WCP by garbage truck estimation will be included.