Dwell Time Estimation of Import Containers as an Ordinal Regression Problem

Featured Application: Knowing the departure date of each container is paramount to scheduling an optimal stacking in container terminals, and thus, reducing the fuel consumption of the yard cranes. Supervised classification algorithms are typical for estimating such a dwell time. However, we show that an ordinal regression algorithm outperforms the supervised classification algorithms regarding the mean absolute error and the reshuffles generated. This research has been applied in an inbound yard from Cuba as part of a project for the optimization of the import container flow. Our results can serve as a baseline for further dwell time estimation systems.

Abstract: The optimal stacking of import containers in a terminal reduces the reshuffles during the unloading operations. Knowing the departure date of each container is critical for optimal stacking. However, such a date is rarely known because it depends on various attributes. Therefore, some authors have proposed estimation algorithms using supervised classification. Although supervised classifiers can estimate this dwell time, the variable "dwell time" takes ordered values for this problem, suggesting the use of ordinal regression algorithms. Thus, we have compared an ordinal regression algorithm (selected from 15) against two supervised classifiers (selected from 30). We have set up two datasets with data collected in a container terminal. We have extracted and evaluated 35 attributes related to the dwell time. Additionally, we have run 21 experiments to evaluate both approaches regarding the mean absolute error modified and the reshuffles. As a result, we have found that the ordinal regression algorithm outperforms the supervised classifiers, reaching the lowest mean absolute error modified in 15 (71%) and the lowest reshuffles in 14 (67%) experiments.


Introduction
Optimization problems arise during the operations of stacking containers in the yard of a terminal [1]. One of those is the container Storage Space Allocation Problem (SSAP), a particular case of the storage location assignment problem [2][3][4][5]. SSAP consists of finding the best allocation for each container in a yard, minimizing a criterion such as the number of container reshuffles or the crane traveling distance [6]. Operators use yard cranes to store arriving containers in multi-level stacks to save storage space. Only containers at the top are accessible by yard cranes, so reaching intermediate containers of the stacks provokes reshuffles. A reshuffle is therefore an unproductive movement of the crane due to an inadequate allocation of some containers [5,6].
To reduce the reshuffles, operators need to know a priori the dwell time of each arriving container. The dwell time of a container measures the days (hours or weeks) that it shall stay in the yard [7]. We shall refer to dwell time as the dwell time of import containers in a yard of a container terminal. Predominantly, the actual dwell time is unknown because the departure date to its client depends on several attributes [8]. Thus, operators employ their expertise or some algorithms to estimate the dwell time. By reducing the reshuffles, container terminals lessen their operation time and fuel consumption.
Two approaches are commonly used for estimating the dwell time. The first is an empirical approach based on the operators' knowledge [9][10][11][12][13]. The second is an approach based on statistical and machine-learning algorithms [7,8][14][15][16][17][18]. The proposals of machine-learning algorithms to estimate the dwell time have employed supervised classification algorithms [14,16,17] and regression algorithms [7,18]. Nevertheless, the estimation of the dwell time is still considered an open problem because none of these approaches has satisfied the demands of container yards [6,7,17,19,20].
Recent studies have solved similar problems with ordinal regressors based on deep neural networks and other machine-learning algorithms, e.g., image ordinal estimation [21], knee osteoarthritis severity [22], degree of building damage [23], and Twitter sentiment analysis [24]. These problems present a class attribute with an ordinal domain, as does the dwell time of import containers in a yard. The algorithms proposed in such studies have improved their accuracies by employing ordinal regression methods. In contrast, research works proposing machine-learning algorithms for estimating the dwell time [7,8][14][15][16][17][18] have not explored ordinal regression methods. Thus, we have considered that there is room for improvement in estimating the dwell time with ordinal regressors.
In this research, we have stated the estimation of the dwell time as an ordinal regression problem. We have considered that, by modeling and solving this problem with ordinal regression algorithms instead of as a supervised classification or regression problem, we can reduce the reshuffles compared to those obtained by means of supervised classification algorithms. Additionally, we have performed attribute engineering by extracting, selecting, and evaluating several attributes. Consequently, this work makes the following contributions:

1. We have stated and solved, for the first time, the estimation of the dwell time of import containers in a terminal as an ordinal regression problem, showing that the ordinal regression approach outperforms the supervised classification approach.

2. We have constructed and evaluated a set of 35 attributes obtained from the operators' knowledge and the storage data. We have reported, for the first time in the literature, the evaluation of:
(a) Twenty-four attributes related to the weather forecast at the yard and the destination.
(b) The distance between the yard and the destination of the container.
(c) Clusters of containers with similar characteristics.
(d) An estimated dwell time using the formula to compute the CPU burst time due to its similarity with our problem.

The remainder of this paper is organized as follows. In Section 2, we present our analysis of some previous proposals for estimating dwell time. Next, we describe materials and methods for modeling and solving the estimation of the dwell time in Section 3. In Section 4, we discuss our results and findings using two datasets built with data collected between 2014 and 2017. Finally, we state our conclusions, future applications, and future work in Section 5.

Related Work
Some authors have proposed solutions for estimating the dwell time using statistical and supervised learning algorithms [7][14][15][16][17][18]. For example, Moini et al. [14], Gaete et al. [16], and Kourounoti [17] stated the estimation of the dwell time as supervised classification problems. Later, Maldonado et al. [7] also modeled this problem as a regression problem and evaluated the performance of regression algorithms in addition to supervised classification algorithms.
Moini et al. [14] compared the performance of the algorithms naive Bayes, decision tree, and a hybrid called NB-decision tree to estimate the dwell time. The authors proposed as the class attribute the number of days that a container stays in the yard. They used three metrics to assess the performance of the algorithms: the percentage of instances correctly classified, the Kappa statistic, and the root mean square error.
Gaete et al. [16] evaluated the performance of the algorithms k-nearest neighbor, naive Bayes, One Rule, Repeated Incremental Pruning to Produce Error Reduction, K*, Decision Table, and Zero Rule for estimating the dwell time. Their class attribute includes approximately 175 classes obtained from a discretization of the days those containers stay in the yard. To measure the performance of these algorithms, the authors computed the percentage of correctly classified instances, the Kappa statistic, and the mean squared error.
Kourounoti [9] implemented an artificial neural network to estimate the dwell time for a container terminal. The class attribute proposed was the number of days that a container stays in the yard. The author considered three evaluation measures: correctly classified instances, the Kappa statistic, and the root mean squared error.
Maldonado et al. [7] explored the performance of three algorithms, multiple linear regression, decision trees, and random forest, estimating the dwell time of containers. They compared the mean absolute percentage error for regression algorithms and balanced accuracies for supervised classifiers. The authors assumed the dwell time as a continuous attribute for regression algorithms. However, they discretized the dwell time in three classes (less than a week, between one and two weeks, and more than two weeks) for classification algorithms.
These authors have mainly focused on the selection of the algorithm and the selection of the performance metrics. Nevertheless, a third concern about the dwell time estimation consists of determining what type of machine-learning problem yields the best estimation. By addressing this issue, we can narrow the algorithms and the performance metrics to consider.
Often, the operators measure the dwell time in the number of days that the container stays in the yard [7,9,20]. The number of days is an attribute with an order for which the difference between each value is relevant. Winship and Mare [25] provided a formal definition of ordinal variables (called attributes in our problem). They illustrated their definition with variables such as school grades, ages, or the number of children. These variables should be considered ordinal realizations of underlying continuous variables [25].
Other problems similar to the dwell time estimation for import containers have been solved by adopting artificial neural networks as ordinal regressors. Among them are age estimation [26,27], monocular depth estimation [28], and historical image dating [27]. For these problems, the values of the class attribute describe an order. The authors [26][27][28] reported lower errors when stating and solving them as ordinal regression problems. Similarly, the dwell time of import containers in a terminal describes an order. Thus, we can assume that estimating the dwell time is an ordinal regression problem [25]. However, we have not found a proposal modeling the estimation of the dwell time as an ordinal regression problem.

Materials and Methods
There is a lack of public databases for evaluating the performance of algorithms for estimating the dwell time [29,30]. Therefore, we have set up two datasets with data collected from a yard between 2014 and 2017. The first dataset has 1816 records captured during the years 2014 and 2015. The second dataset has 2974 records captured during the years 2016 and 2017. Besides the dates, another difference between both datasets is that only the first one includes an attribute called "Product", which describes the content of the containers. This difference gives us insight into the relevance of the "Product" stored in a container for predicting its dwell time, which is debated in the literature [14,17,20]. We have split each dataset into two parts, one for training and validating and the other for testing the algorithms (see Table 1).

The diagram in Figure 1 describes the method followed in our experiments. We present six steps and three data sources. In step 1, we propose, evaluate, and select a set of 35 attributes using two training and validation datasets. Such a set of attributes is one of our contributions. In step 2, we train, validate, and select the algorithm for ordinal regression with the lowest MAEµ and the classifier with the highest F1 measure [31], performing cross-validation tests with ten folds. In step 3, we train the selected algorithms with the entire training data. In step 4, we test such algorithms, computing their MAEµ using two testing datasets. In step 5, we estimate the dwell time for containers in the testing datasets. Next, we determine the storage space for each container using the estimated dwell time. Then, we compute the reshuffles by replacing the estimated dwell times with the actual dwell times. Finally, in step 6, we compare the ordinal regression approach against the classification approach regarding their MAEµ and reshuffles.

Table 2 describes the attributes that we have extracted and constructed for estimating the dwell time.

Table 2. Thirty-five attributes studied for estimating the dwell time of import containers in a yard. The second column lists the information stored by the attributes. Attributes one to four store information captured in the records of the container yard as is. The other attributes combine information captured in the records and external information.

Rows from one to four list four nominal attributes extracted directly from historical data collected by the operators of the yard: "Client", "Destination", "Product" (first dataset only), and "Type", which represents the dimension of a container, i.e., 20 or 40 feet.

Attribute Extraction, Construction, Evaluation, and Selection
The other rows list attributes built from other attributes and historical data. We have constructed the attribute "Days_in_country" as the difference between the arrival date at the yard and the arrival date in the country. Additionally, we have split the arrival date of the container at the yard into three attributes: "Arrival_week_day", "Arrival_month_day", and "Arrival_month". We have used the attribute "Destination" to compute the attribute "Distance_yard_destination". Attributes "Yard_weather" and "Destination_weather" were built from the arrival date at the yard and the destination of a container. Ninety-five percent of the containers in the datasets stood in the yard for less than 12 days. Therefore, we have analyzed the weather forecast in the yard and the destination from the arrival day (0) to day number 11 of the container in the yard. We employed the weather forecast provided by Raspisaniye Pogodi Ltd., St. Petersburg, Russia (https://rp5.ru (accessed on 25 November 2018)). This online weather-forecast service provides an overview of the weather forecast or report (for previous dates). The values considered for the weather overview were normal, light rain, heavy rain, rain, rain showers, thunderstorm, mist, precipitation within sight, fog, haze, and showers in the vicinity.
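The date-based attributes described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `arrival_attributes`, the integer encoding of the weekday (1 = Monday to 7 = Sunday), and the example dates are hypothetical and do not come from the datasets.

```python
from datetime import date


def arrival_attributes(arrival_country: date, arrival_yard: date) -> dict:
    """Derive the date-based attributes of one container (hypothetical helper)."""
    return {
        # Days between the arrival in the country and the arrival at the yard.
        "Days_in_country": (arrival_yard - arrival_country).days,
        # Assumed encoding: 1 = Monday ... 7 = Sunday.
        "Arrival_week_day": arrival_yard.isoweekday(),
        "Arrival_month_day": arrival_yard.day,
        "Arrival_month": arrival_yard.month,
    }


# Hypothetical container: in the country since Monday 2 March 2015,
# stored in the yard on Friday 6 March 2015.
example = arrival_attributes(date(2015, 3, 2), date(2015, 3, 6))
```

In this sketch the weekday is kept as an integer for brevity; a nominal encoding would work equally well.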
A rough estimation of the dwell time for import containers can be computed with the formula for computing the CPU burst time (see Equation (1)) [32]. This formula estimates the execution time of a task (τ_{i+1}) using the actual (t_i) and predicted (τ_i) execution times of the previous task, weighted by 0 ≤ α ≤ 1. The formula for computing the CPU burst time assumes that consecutive tasks have similar execution times. Sometimes, containers arrive at the yard as clusters, sharing the same client, destination, and content. This estimation alone is insufficient for stacking the containers because it disregards several attributes that affect the departure date. Nevertheless, we have explored the relevance of having an attribute called "Predicted_dwell_time" (τ) computed with the CPU burst time formula [32]. To compute each τ_i, we must group similar containers because the CPU burst time formula assumes that CPU tasks are similar. Thus, we have included another attribute called "Cluster" computed with the algorithm KMeans and the validation index VIC [33]. We have sorted the containers within each cluster according to their arrival date in the country. Then, we have computed each τ_i for similar containers.
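As an illustration of how "Predicted_dwell_time" could be computed inside one cluster, the sketch below assumes the standard exponential-averaging form of the CPU burst formula, τ_{i+1} = α·t_i + (1 − α)·τ_i. The function name, the seed value `initial`, and the example dwell times are hypothetical; the source does not state how the first prediction of a cluster is initialized.

```python
def predict_dwell_times(actual, alpha, initial):
    """Return the predicted dwell time (tau_i) for each container of one cluster.

    `actual` holds the actual dwell times t_i, sorted by arrival date.
    `initial` is an assumed seed for the first prediction (hypothetical).
    """
    predictions = []
    tau = initial
    for t in actual:
        predictions.append(tau)                # prediction made before t is known
        tau = alpha * t + (1 - alpha) * tau    # tau_{i+1} = alpha*t_i + (1-alpha)*tau_i
    return predictions


# Hypothetical cluster with actual dwell times of 2, 6, and 4 days.
predicted = predict_dwell_times([2, 6, 4], alpha=0.5, initial=4.0)
```

With α close to 1 the prediction follows the most recent container; with α close to 0 it changes slowly.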
The class attribute in our datasets is "Dwell_time". This class attribute represents the number of days that a container stays in the yard, i.e., the number of days between the arrival and departure dates. Thus, the inferred dwell time of a container in the yard allows us to approximate its departure date and reduce the reshuffles during the unloading operations. The histograms in Figures 2 and 3 plot the distribution of the actual dwell time of the containers in our datasets. These histograms depict a class imbalance in our datasets. According to a standard measure of the class imbalance [34], the imbalance ratios of our datasets are 1:350 and 1:449, respectively. Consequently, we have employed algorithms and evaluation metrics suitable for datasets with class imbalance.

Figures 2 and 3. Histograms of the actual dwell time in the datasets (x-axis: containers dwell time, the class attribute; y-axis: examples).

Attribute Evaluation and Selection
We have evaluated the proposed attributes based on two criteria: (i) the correlation between each attribute and the class attribute "Dwell_time" together with the correlation among all the attributes [35], and (ii) the information gain (G) of each attribute with respect to "Dwell_time" (see Equation (2)). Such an information gain is based on entropy (H) (see Equation (3)). We have employed the implementations in the platform WEKA version 3.8.3.
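To make the second criterion concrete, the following sketch computes the information gain of a nominal attribute in its textbook form, G = H(class) − H(class | attribute), with H the Shannon entropy. It is not WEKA's implementation, and the toy values are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(attribute_values, labels):
    """Entropy of the class minus its entropy conditioned on the attribute."""
    n = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data: the weekday fully determines the dwell time,
# whereas the second attribute carries no information about it.
dwell = [1, 1, 4, 4]
gain_weekday = information_gain(["Mon", "Mon", "Fri", "Fri"], dwell)
gain_noise = information_gain(["a", "b", "a", "b"], dwell)
```

An attribute that separates the dwell times perfectly reaches the entropy of the class (here 1 bit), while an uninformative attribute yields a gain of zero.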
A brief analysis of the selected attributes sheds light on their worth for estimating the dwell time, justifying the results of the evaluation algorithms. For example, containers with food (attribute "Product") remain three days on average in the yard, while containers with clothes and electronic devices remain five and eight days on average, respectively. Usually, containers have a leasing contract. Therefore, operators tend to deliver soon those containers that have spent several days in the country (attribute "Days_in_country"), which reduces their dwell time. According to our data, most containers arriving on Monday (attribute "Arrival_week_day") remain one day in the yard, whereas most of those arriving on Friday stay four days. Containers arriving at the beginning of the month (attribute "Arrival_month_day") have lower dwell times than those arriving at the end of the month. Moreover, the data show that containers with a destination (attribute "Destination") near the yard are delivered faster than those with far destinations. Although we built an attribute with the distance from the yard to the destination of a container, the attribute "Destination" reached a higher evaluation than this distance. The attribute "Predicted_dwell_time" is relevant because some arriving containers are similar to each other, much like consecutive tasks in a CPU scheduler.

Dwell Time Estimation as an Ordinal Regression Problem
To determine the best-performing ordinal regression algorithm, we have evaluated 15 algorithms proposed in the literature. These algorithms follow three approaches: a naive approach, an ordinal learning approach, and an approach based on decomposing ordinal problems into binary problems. We employed the implementations in the platform WEKA version 3.8.3.
An ordinal regression algorithm with a naive approach converts the values of the ordinal class attribute ("Dwell_time" in our case) into numerical values. Next, it applies a regression algorithm (δ) and uses a rounding strategy to map the continuous value calculated by the regressor (δ) to an ordinal value of the original class (see Equation (4)).
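The naive approach can be sketched as a regressor followed by rounding. The example below is a minimal illustration, assuming a one-attribute least-squares line as the regressor δ and clipping to the observed class range. The helper names, the clipping step, the class range 0 to 33, and the toy data are assumptions rather than the WEKA configurations evaluated here.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one attribute; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


def to_ordinal(value, min_class, max_class):
    """Round a continuous prediction and clip it to the ordinal class range.

    Python's round() rounds halves to the nearest even integer.
    """
    return max(min_class, min(max_class, round(value)))


# Hypothetical training data: days in the country versus dwell time in days.
slope, intercept = fit_line([0, 2, 4, 6], [8, 6, 5, 1])
typical = to_ordinal(slope * 3 + intercept, 0, 33)
clipped = to_ordinal(slope * 10 + intercept, 0, 33)  # negative output clipped to 0
```

The clipping keeps extrapolated predictions inside the set of valid dwell-time values.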
We have explored five algorithms with this approach:
• Regression using DecisionTable, and optimizing root mean square error (Reg + DecisionTable + RMSE + Ibk + Round)
• Regression using LibSVM with linear kernel and rounding (Reg + LibSVM + LinearKernel + Round)
• Linear regression with rounding (Reg + LinearRegression + Round)
• Classification via regression using a Linear regression (CvR + LinearRegression)
• Classification via regression using the algorithm M5P (CvR + M5P)

The second approach is called the Ordinal Learning Method (OLM) [36]. OLM generates symbolic rules using comparison operations from examples in ordinal problems with multiple attributes. The algorithm to generate the symbolic classification rules aims to simulate human behavior when solving ordinal regression problems. The author of OLM tested the performance of the algorithm OLM on four datasets, achieving results similar to the decision tree C4 for ordinal regression problems.
The third approach is based on decomposing ordinal problems into binary problems with the OrdinalClassClassifier (OCC) [37] method. The algorithm OCC uses a supervised classifier for solving problems with ordinal classes. For such a purpose, the algorithm OCC creates a new dataset for each value of the ordinal class attribute ("Dwell_time"). The new datasets have a binary class attribute instead of the ordinal class. Such a binary class takes the value 1 for the examples for which the ordinal class had a value greater than the index of the current dataset. Then, a supervised classifier is trained with each modified dataset, obtaining one model per dataset. To classify a new example (att), each model classifies the example and computes the likelihood of belonging to the binary class (p_i(att)). With these likelihoods, the algorithm OCC computes new values according to the index of the trained model. In the last step, the algorithm OCC selects the value of the ordinal class attribute corresponding to the index of the maximum new value computed (see Equation (5)).
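The combination step can be illustrated with the scheme of Frank and Hall for ordinal decomposition, which we assume here corresponds to the OCC method; Equation (5) is not reproduced, so the exact form is an assumption. The function names and the probabilities in the example are hypothetical, and the binary models are taken as already trained.

```python
def ordinal_class_probabilities(p_greater):
    """Turn P(class > V_i) for the first k-1 class values into P(class = V_i).

    `p_greater` needs at least one entry (two or more ordinal classes).
    """
    probabilities = [1.0 - p_greater[0]]                    # lowest class
    for i in range(1, len(p_greater)):
        probabilities.append(p_greater[i - 1] - p_greater[i])
    probabilities.append(p_greater[-1])                     # highest class
    return probabilities


def predict_ordinal(p_greater, class_values):
    """Select the class value with the largest combined probability."""
    probabilities = ordinal_class_probabilities(p_greater)
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return class_values[best]


# Hypothetical outputs of three binary models for dwell times of 1 to 4 days:
# P(dwell > 1) = 0.9, P(dwell > 2) = 0.6, P(dwell > 3) = 0.1.
predicted_days = predict_ordinal([0.9, 0.6, 0.1], [1, 2, 3, 4])
```

Because the binary models are trained independently, a difference of two likelihoods can be negative in practice; this sketch does not correct for that.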
We have explored the combination of the method OCC with several supervised classifiers. We have included four kernels (linear, polynomial, RBF, and sigmoidal) for the classifier Support Vector Machine (SVM).

A common performance measure for ordinal regression is the Mean Absolute Error (MAE) [38][39][40][41][42][43][44]. This metric measures the magnitude of the error of each ordinal algorithm, e.g., a 4-day error in the prediction of a container's dwell time is worse than a 1-day error. However, MAE biases the results in favor of the majority class when the dataset shows a class imbalance. Our datasets present class imbalance (see Figures 2 and 3). Therefore, we have adopted the Mean Absolute Error modified (MAEµ) [45] as the performance measure. MAEµ is a modification of MAE for imbalanced datasets that consists of computing the MAE for each value of the class attribute in the dataset and then averaging the results. Table 3 summarizes the MAEµ output by the algorithms with both training datasets. The OCC algorithm with the Kernel Logistic Regression method as classifier achieved the lowest MAEµ for both datasets. Hence, we select OCC + Kernel Logistic Regression to estimate the dwell time.
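The description of MAEµ translates directly into code: compute the MAE separately for each actual class value and average the per-class results. The sketch below follows that description; the function names and the toy predictions are hypothetical.

```python
from collections import defaultdict


def mae(actual, predicted):
    """Plain mean absolute error over all examples."""
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / len(actual)


def mae_mu(actual, predicted):
    """MAE computed per actual class value, then averaged over the classes."""
    errors = defaultdict(list)
    for y, p in zip(actual, predicted):
        errors[y].append(abs(y - p))
    per_class = [sum(e) / len(e) for e in errors.values()]
    return sum(per_class) / len(per_class)


# Hypothetical imbalanced case: four containers stay 1 day, one stays 5 days,
# and the model always predicts the majority class.
actual = [1, 1, 1, 1, 5]
predicted = [1, 1, 1, 1, 1]
plain = mae(actual, predicted)
balanced = mae_mu(actual, predicted)
```

In this toy case the plain MAE is 0.8, whereas MAEµ is 2.0, which exposes the error on the minority class that the plain average hides.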

Dwell Time Estimation as a Supervised Classification Problem
To determine the supervised classification algorithms, we have used the Auto-WEKA tool [46] included in WEKA version 3.8.3. Auto-WEKA compares the performance of 30 supervised learning algorithms regarding a metric selected by the user, using a cross-validation test with ten folds. Since our datasets present a class imbalance with ratios of 1:350 and 1:449, respectively, we configured Auto-WEKA to select the classification algorithm with the highest F1 measure.
Auto-WEKA outputs a different supervised classifier for each dataset. The algorithm with the highest F1 measure (0.91) estimating the dwell time for the instances in the

Previously, we found that the algorithm OCC with Kernel Logistic Regression reached the lowest MAEµ for both datasets by estimating the dwell time as an ordinal regression problem. To achieve this result, we performed several manually configured experiments. In contrast, Auto-WEKA found a different supervised classifier for each dataset. However, since we aim to show that estimating the dwell time for import containers in a yard must be solved as an ordinal regression problem, we shall compare OCC + Kernel Logistic Regression against Lazy-IBK on Dataset1_Test and against Lazy-KStar on Dataset2_Test.

Results and Discussion
We have assumed that by estimating the dwell time with an ordinal regression algorithm, we can reduce the reshuffles in the container stacking compared to those produced with supervised classification algorithms. The ordinal regression algorithm OCC + Kernel Logistic Regression reached the lowest MAEµ in our previous experiments with 15 ordinal regression algorithms using the two training and validation datasets (Dataset1_T&V and Dataset2_T&V). In contrast, we have selected two supervised classifiers, Lazy-IBK for the first dataset and Lazy-KStar for the second, according to the results of the Auto-WEKA tool.
There is no consensus about the measures to evaluate the worthiness of the dwell time estimated with machine-learning algorithms for the optimal stacking of containers. Since we have estimated the dwell time as an ordinal regression problem, we can employ the MAEµ as a practical measure. We can also compute the MAEµ when the dwell time is estimated as a supervised classification problem, provided that we consider the size of each error. Thus, we can compare the performance of both approaches. Moreover, we have included the number of reshuffles as another performance measure, starting from the estimated departure date of each container, computed with the date of arrival at the yard and the estimated dwell time. Using the estimated departure date, we have allocated the containers with the optimization model proposed by De Armas et al. [2] and implemented using the solver GNU Linear Programming Kit (GLPK (http://www.gnu.org/software/glpk (accessed on 14 July 2019))) version 4.64. The optimization algorithm outputs a matrix with the positions assigned to the containers in the yard. Finally, we have substituted the estimated departure date of each container with its actual departure date (obtained from the recorded data) and computed the number of reshuffles.
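The last step, evaluating a fixed allocation against the actual departure dates, can be illustrated with a simplified proxy. The sketch below counts blocking containers, i.e., containers stacked above at least one container that actually leaves earlier. This is a common lower-bound proxy and an assumption on our side: it is neither the optimization model of [2] nor necessarily the exact counting rule used in the experiments. The yard layout and departure days are hypothetical.

```python
def count_blocking(stack_departures):
    """Count containers lying above a container that departs earlier.

    `stack_departures` lists the actual departure days from bottom to top.
    Containers leaving on the same day are not counted as blocking.
    """
    blocking = 0
    earliest_below = None
    for day in stack_departures:
        if earliest_below is not None and day > earliest_below:
            blocking += 1
        earliest_below = day if earliest_below is None else min(earliest_below, day)
    return blocking


def yard_blocking(yard):
    """Sum the blocking containers over all stacks of the yard."""
    return sum(count_blocking(stack) for stack in yard)


# Hypothetical yard with three stacks (departure days, bottom to top).
yard = [[5, 3, 1], [2, 4], [1, 2, 3]]
total = yard_blocking(yard)
```

The first stack needs no extra movement because the containers leave from top to bottom, while the other two stacks contribute one and two blocking containers, respectively.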
We have considered three configurations of the container yard (45 stacks with three tiers, 90 stacks with three tiers, and 45 stacks with five tiers). We have started with an empty yard for each configuration, with the maximum number of allocation spaces available. Moreover, we have set up 21 optimization instances (see Table 4) using the examples in the datasets Dataset1_Test and Dataset2_Test. An optimization algorithm can reach the optimal solution with different allocations of the containers. Therefore, such an optimization algorithm can yield dissimilar numbers of reshuffles for the same optimization instance. Reshuffles arise because the estimated departure date is used for optimizing the allocation of the containers, whereas the actual departure date is used for computing the reshuffles. Figure 4 illustrates an example with three allocations of the containers with the same optimization value (0 reshuffles) but different numbers of actual reshuffles. Therefore, reducing the MAEµ does not always reduce the number of reshuffles, but in general, it does, as we shall show with our experimental results.

Figure 4. Allocations over Stack 1, Stack 2, and Stack 3 yielding zero, one, and three reshuffles using the actual departure date.

The ordinal regression algorithm reached a lower MAEµ for 15 (71%) instances and a lower number of reshuffles for 14 (67%) instances. Figure 7 depicts the rank sums comparing the ordinal regression against the classification regarding MAEµ and reshuffles. The Wilcoxon signed-rank test [47] indicated that there were significant differences between both methods regarding the MAEµ (p-value = 0.009128), but the differences were not significant regarding the reshuffles (p-value = 0.1393). Nevertheless, the dwell time estimated with the ordinal regression algorithm generated 87 fewer reshuffles than that estimated with the supervised classifiers in these 21 experiments. Additionally, in 16 of these experiments, a lower MAEµ induced a lower number of reshuffles. The other five experiments showed a different behavior due to the influence of the optimization algorithm previously explained.

From these results, we conclude that the dwell time of the containers is estimated more accurately as an ordinal regression problem than as a supervised classification problem, at least using these attributes of the containers. Moreover, all the algorithms reached a higher MAEµ and number of reshuffles using the examples in Dataset2_Test than using the examples in Dataset1_Test (instances 1, 2, 3, 4, 11, 12, 16, and 17). Since only Dataset1 includes the attribute "Product", we recommend including this attribute to improve the algorithms for the dwell time estimation.
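For readers unfamiliar with the rank sums behind the Wilcoxon signed-rank test, the sketch below computes them for paired results. It is a minimal illustration that drops zero differences and averages tied ranks; it does not compute the p-value, and the paired reshuffle counts are hypothetical, not the values of our 21 experiments.

```python
def signed_rank_sums(a, b):
    """Return (W+, W-): rank sums of the positive and negative differences a - b."""
    diffs = [x - y for x, y in zip(a, b) if x != y]  # zero differences are dropped
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1               # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Hypothetical reshuffle counts of both approaches on five instances.
ordinal = [3, 5, 2, 8, 4]
classifier = [6, 5, 4, 7, 9]
w_plus, w_minus = signed_rank_sums(ordinal, classifier)
```

A small W+ relative to W− indicates that the first method tends to produce lower values than the second; whether the difference is significant depends on the p-value, which this sketch does not compute.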

Conclusions
In this work, we aimed to show that estimating the dwell time of import containers in a yard is an ordinal regression problem. Thus, we modeled and solved the problem as both an ordinal regression problem and a supervised classification problem, the latter being the prevailing approach in the literature. To corroborate our hypothesis, we computed the MAEµ and the reshuffles provoked during the container stacking using the dwell time estimated by each approach.
As a result of our research, we noticed that the ordinal regression algorithm achieved the lowest MAEµ in 15 (71%) of our 21 experiments. Similarly, the ordinal regression algorithm yielded the lowest reshuffles in 14 (67%) experiments. The statistical test corroborated significant differences between both methods regarding the MAEµ, while differences were not significant regarding the reshuffles. Since different optimal allocations of the containers can lead to different numbers of reshuffles, we examined the relationship between MAEµ and reshuffles. We noticed that a lower MAEµ leads to a lower number of reshuffles in 16 (76%) experiments. Therefore, we can state that by modeling the dwell time estimation as an ordinal regression problem and decreasing the MAEµ, we can reduce the number of reshuffles generated during the stacking of import containers in a yard.
We found a subset of six attributes relevant for estimating the dwell time of import containers in a terminal from a set of 35 evaluated. Such attributes are "Days_in_country", "Arrival_week_day", "Arrival_month_day", "Destination", "Product", and "Predicted_dwell_time". The attribute "Predicted_dwell_time" approximates the dwell time using the formula to compute the CPU burst time in groups of similar containers. Attributes "Days_in_country", "Arrival_week_day", and "Arrival_month_day" are derived from the arrival date of the containers to the country. "Destination" and "Product" were obtained from the containers' information. Moreover, we evaluated 25 attributes related to the weather forecast and the distance between the yard and the destination of the container. Such an evaluation showed that they were irrelevant for the dwell time estimation of the containers in our datasets.
We observed that all the algorithms produced a higher MAEµ and reshuffles for the examples in the dataset that excluded the attribute "Product". Hence, we recommend including the attribute "Product" for estimating the dwell time of import containers in a yard. Nevertheless, further experiments are needed to support this recommendation.
Our results can be applied to dwell time estimation problems where the dwell time is an ordinal variable. Two additional examples of these problems are the dwell time estimation of ships in docks and the dwell time of buses in a repair workshop. For these examples, an accurate dwell time estimation may lead to resource savings, as it does in our research work.
We propose evaluating other ordinal regression algorithms, such as deep neural networks for ordinal regression, as future work. Moreover, we shall be working on a new dataset about import containers available for the research community to reduce the lack of available datasets.

Funding: This research was funded by Universidad Central "Marta Abreu" de Las Villas grant number 10332. Tecnologico de Monterrey sponsored the article processing charge. Additionally, "Centro de Carga y Descarga de Contenedores de Almacenes Universales SA. in Villa Clara, Cuba" collaborated during the research.