4.1. Parking Profile
The parking profile characterizes parking behavior. It is determined by a variety of elements, such as the location, the capacity, the surrounding amenities (the number of amenities of each type around the parking lot and the most relevant amenity), the weather, the day, the time of day, and the type of day (weekday or weekend).
In our work, we consider as many spatial and temporal properties as possible to build a complete profile, as described below:
Spatial parking lot characteristics, defined by the spatial component, describe where the parking lot is located, its maximum capacity, and the amenities surrounding it, especially their type (e.g., restaurants, railway stations, commercial centres, etc.) and number. The spatial profile thus tends to be less dynamic and does not change frequently. In addition, the maximum parking lot capacity is regarded as crucial information for spatial clustering.
Temporal parking lot characteristics are defined by dynamic information that changes over time and describes the parking lot occupancy trend. The parking dynamic also depends on exogenous factors, such as weather, time of the day, and day type.
Both spatial and temporal parking characteristics are important properties when defining an occupancy forecasting model, and they may improve the forecast quality [3]. We thus integrate parking spatial and temporal features to define our parking profiles. In our work, the parking profile is defined as a tuple:
The spatial part in the profile is defined as follows:
The temporal part in the profile is defined as follows:
is the number of occupied places in observed sequence i at time t.
i is the frequency used for collecting the occupancy updates.
n is the length of the sequence, i.e., the number of observations between the start and the end of the observation interval. In our case, observations are collected every 5 min.
The temporal profile stores the number of occupied places, which changes dynamically over time. This attribute is a time series, i.e., a set of observations collected at a constant frequency. In the temporal profile, exogenous factors represent the external aspects that potentially impact the parking lot's occupancy over time. These exogenous factors include, for example, weather information (encoded as a value between 0 and 1) and the type of day, with a distinction between weekdays and weekends. The temporal profile is generated from a one-week window of parking occupancy.
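As an illustration, the two parts of a parking profile can be sketched as plain Python data classes. All class and field names below are hypothetical and chosen for readability; they are not the notation used in the profile definition, and the exact attribute set is an assumption based on the elements listed in this section.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SpatialProfile:
    """Static description of a parking lot (changes rarely)."""
    latitude: float
    longitude: float
    max_capacity: int
    amenity_counts: Dict[str, int]  # e.g. {"restaurant": 12, "railway_station": 1}
    relevant_amenity: str           # categorical: most relevant amenity nearby


@dataclass
class TemporalProfile:
    """Occupancy time series plus exogenous factors for one window."""
    occupancy: List[int]     # occupied places, one value per observation
    weather: List[float]     # weather indicator scaled to [0, 1]
    is_weekend: List[bool]   # day type for each observation
    sampling_minutes: int = 5

    def expected_length(self, days: int = 7) -> int:
        # Number of observations in `days` days at the sampling frequency.
        return days * 24 * 60 // self.sampling_minutes


@dataclass
class ParkingProfile:
    """A parking profile as a pair (spatial part, temporal part)."""
    parking_id: str
    spatial: SpatialProfile
    temporal: TemporalProfile
```

With one observation every 5 min, a one-week window holds 7 × 24 × 12 = 2016 observations, which is what `expected_length()` returns by default.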
In the following, we explain how we exploit our parking profiles to identify similarities between parking lots of the same city and then share occupancy forecasting models.
4.2. Spatiotemporal Clustering of the Set of Parking Lots
Once defined, the parking profile can be used to find similarities between parking lots. We therefore provide a system to measure the similarity of parking profiles and group parking lots accordingly. Clustering is an essential data mining technique for grouping data points into homogeneous groups (or clusters). Clustering techniques are well-known to provide a simple solution for spatial and temporal grouping [36]. In our work, we try to group similar spatial and temporal parking lot profiles. The parking lots are provided without prior knowledge regarding their membership in a group. Therefore, our spatiotemporal clustering solution exploits an unsupervised approach to discover groups of parking lots sharing the same occupancy trends.
In order to group the parking lots according to their profiles, we apply a two-step unsupervised clustering process. We first build the spatial clusters and the temporal clusters, and then obtain the spatiotemporal clusters by combining the two.
More precisely, we exploit a straightforward yet successful method based on K-means clustering, with Euclidean distance (ED) [37] for the spatial dimension and dynamic time warping (DTW) [38] for the temporal one. K-means provides a fair trade-off between the quality of the solution found and the computational cost [39]. K-means has several benefits compared to other clustering algorithms, since it is suitable for large unlabeled datasets and its time complexity grows linearly with the number of data points.
4.2.1. Spatial Cluster
In this section, we focus on spatial clustering. In order to group parking lots according to their spatial profile, we consider two distinct types of input in our clustering process:
Numerical inputs, such as maximum capacity, longitude, latitude, number of nearby parking lots, and the number of amenities per type around the parking lot (amenity distribution), can be clustered directly with standard techniques for numerical data.
Categorical inputs, such as relevant amenities, require mixed input type clustering techniques, such as K-prototype [40].
To compute our spatial clusters, we use the parking spatial profile as an input vector. Initially, we place the cluster centroids randomly. Each spatial profile is then assigned to its nearest centroid using the Euclidean distance, and each centroid is relocated to the mean of the spatial profiles assigned to it. This calculation is repeated until the process converges, i.e., until the cluster assignments no longer change.
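The iterative procedure described above can be sketched in plain Python. This is a minimal illustration of standard K-means (random initial centroids, nearest-centroid assignment, mean update), not the implementation used in our experiments; the function name `kmeans`, the `seed` parameter, and the toy two-dimensional profiles are assumptions made for the example.

```python
import math
import random
from typing import List, Sequence, Tuple


def kmeans(points: List[Sequence[float]], k: int, seed: int = 0,
           max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Cluster points into k groups; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initial centroids: k distinct profiles picked at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy normalized spatial profiles (hypothetical values).
profiles = [(0.0, 0.0), (0.05, 0.0), (0.1, 0.0),
            (0.9, 1.0), (0.95, 1.0), (1.0, 1.0)]
labels, centroids = kmeans(profiles, k=2)
print(labels)
```

Because the initial centroids are random, the numeric cluster identifiers may be swapped between runs with different seeds; only the grouping itself is meaningful.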
Before applying clustering algorithms to the spatial profile, numerical data are normalized to avoid bias and to use a consistent input scale. In the case of longitude and latitude, normalization is typically not necessary, because these values are already expressed in a standard range. For the other numerical attributes, we target values in the 0 to 1 range, computed with min-max normalization:
$$ X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}} $$
where:
$X_{min}$ is the smallest value in the dataset before normalization.
$X_{max}$ is the largest value in the dataset before normalization.
$X_{norm}$ is the value of the data point after normalization.
$X$ is the original value of the data point before normalization.
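The rescaling to the 0 to 1 range can be sketched as follows. This is a minimal min-max normalization in plain Python; the function name and the sample capacities are hypothetical, and the handling of a constant column (all values mapped to 0) is an assumption made to avoid a division by zero.

```python
from typing import List


def min_max_normalize(values: List[float]) -> List[float]:
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: map every value to 0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


capacities = [120, 300, 480]          # hypothetical maximum capacities
print(min_max_normalize(capacities))  # [0.0, 0.5, 1.0]
```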
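Based on the spatial profile, we generate a spatial dissimilarity matrix using the Euclidean distance, computed over six main elements taken as input:
$$ d_{ij} = \sqrt{\sum_{k=1}^{n} \left( x_{ik} - x_{jk} \right)^{2}} $$
where:
$d_{ij}$ is the Euclidean distance between spatial profiles i and j
$k$ is the attribute index, with n the number of attributes
$x_{ik}$ is the value of attribute k in spatial profile i (the start point)
$x_{jk}$ is the value of attribute k in spatial profile j (the end point)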
The dissimilarity matrix stores the pairwise distances between spatial profiles: the element in row i and column j is the dissimilarity between the ith and the jth spatial profiles. The matrix is square and symmetric, and its diagonal elements are equal to zero. Then, we group the spatial profiles using the K-prototype approach [40], which combines K-means for numerical attributes with K-modes for categorical attributes.
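The construction of the spatial dissimilarity matrix can be sketched as follows. This is a minimal plain-Python illustration, assuming each spatial profile has already been reduced to a vector of normalized numerical attributes; the function name and the sample vectors are hypothetical.

```python
import math
from typing import List, Sequence


def dissimilarity_matrix(profiles: List[Sequence[float]]) -> List[List[float]]:
    """Pairwise Euclidean distances between profiles (square, symmetric)."""
    n = len(profiles)
    matrix = [[0.0] * n for _ in range(n)]  # diagonal stays at zero
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(profiles[i], profiles[j])
            matrix[i][j] = d
            matrix[j][i] = d  # symmetry: d(i, j) == d(j, i)
    return matrix


vectors = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
for row in dissimilarity_matrix(vectors):
    print(row)
```

Only the upper triangle is computed explicitly, since the lower triangle follows from symmetry.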
We need to preprocess the data by converting categorical variables (relevant amenities in our case) into numerical variables. One common method is to use one-hot encoding, where we create binary columns for each category and assign a value of 1 to the corresponding column for each data point. Then, we apply K-means clustering to the numerical data to group the similar numerical data points together. We choose the optimal number of clusters using silhouette score. Afterward, we apply K-modes clustering to the one-hot-encoded categorical data to group the similar categorical data points together. We again choose the optimal number of clusters using silhouette score. Once we have clustered both numerical and categorical data, we can combine the clusters by assigning each data point to the nearest numerical cluster and nearest categorical cluster. We use the Euclidean distance matrix to measure the similarity between data points. Finally, we evaluate the results of the combined K-means and K-modes clustering by calculating the silhouette score, which measures the similarity of a data point to its own cluster compared to other clusters. A higher silhouette score indicates better clustering performance.
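For comparison, the K-prototype approach is usually formulated with a single mixed dissimilarity, which adds the squared Euclidean distance over numerical attributes to a weighted count of mismatches over categorical attributes. The sketch below illustrates that measure only; it is not necessarily the exact procedure followed in our pipeline, and the function name, the weight `gamma`, and the sample values are assumptions.

```python
from typing import Sequence


def mixed_dissimilarity(num_a: Sequence[float], num_b: Sequence[float],
                        cat_a: Sequence[str], cat_b: Sequence[str],
                        gamma: float = 1.0) -> float:
    """Squared Euclidean distance on numeric parts plus weighted mismatches."""
    numeric_part = sum((a - b) ** 2 for a, b in zip(num_a, num_b))
    # Simple matching: each differing categorical attribute counts as 1.
    categorical_part = sum(1 for a, b in zip(cat_a, cat_b) if a != b)
    return numeric_part + gamma * categorical_part


# Two hypothetical parking lots: normalized (capacity, amenity count)
# plus the relevant amenity as a categorical attribute.
d = mixed_dissimilarity([0.0, 1.0], [0.0, 0.0],
                        ["restaurant"], ["railway_station"], gamma=0.5)
print(d)  # 1.5
```

The weight `gamma` balances the influence of the categorical attributes against the numerical ones and has to be chosen for the dataset at hand.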
4.2.2. Temporal Cluster
The temporal clustering groups the parking occupancy patterns according to trend, seasonality, and cycle, which change dynamically over time. For the clustering task, we use K-means to temporally group the parking areas based on their profiles. There are two well-known metrics to measure the distance or similarity between two series: Euclidean distance (ED) [37] and dynamic time warping (DTW) [38]. The limitation of Euclidean distance for time series clustering is that it requires series of the same length and compares them point by point. When there are temporal shifts, the similarity between the two series is therefore not correctly determined. Hence, we apply DTW for temporal distance measurement and grouping in our approach to obtain better temporal clusters.
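The DTW distance between two time series $X = (x_1, \ldots, x_n)$ and $Y = (y_1, \ldots, y_m)$ is obtained from the cumulative cost $D$, computed recursively as follows:
$$ D(i, j) = d(x_i, y_j) + \min\{ D(i-1, j),\ D(i, j-1),\ D(i-1, j-1) \}, \qquad DTW(X, Y) = D(n, m) $$
with $D(0, 0) = 0$ and $D(i, 0) = D(0, j) = \infty$ for $i, j > 0$, where $d(x_i, y_j)$ is the base distance between two elements, for example the absolute difference:
$$ d(x_i, y_j) = |x_i - y_j| $$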
Dynamic time warping (DTW) is a technique used to compare two time series sequences, even if they have different lengths, by finding the optimal alignment between them.
In DTW, a cost matrix is computed between the two sequences to represent the pairwise distance between each pair of elements. The cost matrix is then used to find the optimal warping path, which is the path through the matrix with the lowest total cost. Many implementations represent a warping path as a binary matrix over pairs of corresponding elements, i.e., elements of one sequence that are matched or aligned with a specific element of the other sequence by the DTW algorithm.
In this binary matrix representation, each element is either 1 or 0, depending on whether the two corresponding elements of the time series are aligned or not. The total cost of a path is then the sum of the base distances over the entries equal to 1, and the optimal path is found efficiently by dynamic programming over the cost matrix.
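The dynamic programming computation of the DTW distance can be sketched as follows. This is a minimal, unconstrained DTW in plain Python; the absolute difference is used as the base distance, which is an assumption for the example, and the function name is hypothetical.

```python
import math
from typing import Sequence


def dtw_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """DTW distance between two series that may have different lengths."""
    n, m = len(x), len(y)
    # acc[i][j]: minimal cumulative cost of aligning x[:i] with y[:j].
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])  # base distance (assumed)
            acc[i][j] = cost + min(acc[i - 1][j],      # x advances alone
                                   acc[i][j - 1],      # y advances alone
                                   acc[i - 1][j - 1])  # both advance
    return acc[n][m]


# A repeated value in the second series is absorbed by the warping.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
print(dtw_distance([1, 2, 3], [2, 3, 4]))     # 2.0
```

The first call shows why DTW suits occupancy curves with temporal shifts: the two series differ in length, yet their distance is zero because the shapes match after alignment.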
4.2.3. Spatiotemporal Cluster
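Considering the spatial and temporal clusters, we define a spatiotemporal cluster by combining both of them using the Cartesian product operator:
$$ C_{ST} = C_S \times C_T = \{ (s, t) \mid s \in C_S,\ t \in C_T \} $$
where $C_S$ and $C_T$ denote the sets of spatial and temporal clusters, so that each parking lot belongs to the spatiotemporal cluster given by the pair formed by its spatial cluster and its temporal cluster.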
We design spatial and temporal clusters separately because the characteristics of the parking spatial profiles are less dynamic than those of the temporal profiles where parking occupancy evolves over time.
We combine the two clustering approaches to obtain a multi-clustering result. Indeed, as spatial and temporal features are not suitable for being handled together, our multi-clustering approach helps to define a cascade of clusters. In this way, we could organize the parking lots into meaningful groups from different perspectives.
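The combination of the two clusterings can be sketched as follows: each parking lot is assigned to the pair made of its spatial cluster and its temporal cluster. The function names, parking identifiers, and cluster labels are hypothetical.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Tuple


def candidate_pairs(spatial: Dict[str, int],
                    temporal: Dict[str, int]) -> List[Tuple[int, int]]:
    """All (spatial, temporal) cluster pairs given by the Cartesian product."""
    return list(product(sorted(set(spatial.values())),
                        sorted(set(temporal.values()))))


def spatiotemporal_clusters(spatial: Dict[str, int],
                            temporal: Dict[str, int]
                            ) -> Dict[Tuple[int, int], List[str]]:
    """Group parking lots by their (spatial, temporal) cluster pair."""
    groups = defaultdict(list)
    for parking_id, s_label in spatial.items():
        groups[(s_label, temporal[parking_id])].append(parking_id)
    return dict(groups)


spatial_labels = {"P1": 0, "P2": 0, "P3": 1}
temporal_labels = {"P1": 1, "P2": 1, "P3": 0}
print(candidate_pairs(spatial_labels, temporal_labels))
print(spatiotemporal_clusters(spatial_labels, temporal_labels))
```

Not every pair of the Cartesian product is necessarily populated; only pairs that contain at least one parking lot form an actual spatiotemporal cluster.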
4.2.4. Cluster Evaluation
A common limitation of K-means implementations resides in identifying the best k, i.e., the number of clusters. The elbow approach [41] and silhouette analysis [42] are two popular visual methods for determining the ideal number of clusters. Both methods are used in our study to guarantee the clustering quality and to identify the target number of spatial and temporal clusters for the set of parking lots.
The quality of our clustering is computed with the silhouette score, using Equation (4). This index evaluates how similar each individual observation is to its own cluster in comparison with all other clusters.
$$ s = \frac{b - a}{\max(a, b)} \tag{4} $$
where:
a is the average distance between an item and the other items of its own cluster
b is the average distance between the item and the items of the nearest other cluster
The silhouette score is a metric used to evaluate the quality of a clustering algorithm’s output. It measures how similar an object is to its own cluster compared to other clusters. A silhouette score ranges between −1 and 1, where a score closer to 1 indicates a well-clustered data point, while a score closer to −1 indicates that the data point may belong to the wrong cluster.
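The silhouette computation can be sketched in plain Python as follows. This is a minimal illustration using the Euclidean distance and at least two clusters; the convention of scoring singleton clusters as 0, the function name, and the sample points are assumptions.

```python
import math
from typing import List, Sequence


def silhouette_score(points: List[Sequence[float]], labels: List[int]) -> float:
    """Mean silhouette over all points (requires at least two clusters)."""
    clusters = sorted(set(labels))
    scores = []
    for i, p in enumerate(points):
        own = [q for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # singleton cluster (assumed convention)
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(p, q) for q in own) / len(own)
        # b: smallest mean distance to the members of another cluster.
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(silhouette_score(pts, [0, 0, 1, 1]))  # close to 1: compact, well separated
print(silhouette_score(pts, [0, 1, 0, 1]))  # negative: points grouped wrongly
```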
4.3. Sharing Parking Occupancy Forecasting Model
Once the spatiotemporal clusters are defined, we exploit them to facilitate the deployment of a parking occupancy forecasting model at the city scale. Our goal is to share parking occupancy forecasting models among several parking areas (located in the same cluster) to avoid paying the high tuning cost incurred when deploying one model per parking lot independently. When sharing models between several parking areas, we have to preserve a good forecast accuracy. In the following, we explain how we create a reference forecast model developed for a single parking lot and adapt this model for the other parking lots belonging to the same cluster.
Initially, we define a reference parking lot for each cluster. It is the parking lot whose profile is closest to the cluster centroid, i.e., the one with the smallest distance to the centroid among the parking lots of the cluster. For this reference profile, based on our previous work [8], the reference parking occupancy forecasting model (i.e., RNN-LSTM) is tuned and trained. Its quality is evaluated using
, calculated using Equation (
A reference model is a model chosen from the cluster and developed to represent the parking profiles of that cluster. It can be used to create forecasts without any further adjustments because reference models are typically simple and quick to implement.
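The selection of the reference parking lot can be sketched as follows: compute the centroid of the cluster and keep the parking lot whose profile vector is closest to it. The function name, the identifiers, and the two-dimensional vectors are hypothetical; the Euclidean distance is assumed here, whereas a temporal profile would call for a time series distance instead.

```python
import math
from typing import Dict, Sequence


def reference_parking(cluster: Dict[str, Sequence[float]]) -> str:
    """Return the id of the parking lot closest to the cluster centroid."""
    vectors = list(cluster.values())
    # Centroid: attribute-wise mean of the profiles in the cluster.
    centroid = [sum(col) / len(vectors) for col in zip(*vectors)]
    return min(cluster, key=lambda pid: math.dist(cluster[pid], centroid))


cluster = {"P1": (0.0, 0.0), "P2": (0.4, 0.5), "P3": (1.0, 1.0)}
print(reference_parking(cluster))  # P2
```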
We then build two lists:
the dissimilarity between the reference parking lot and the other parking lots, both in the same cluster and in different clusters.
the performance of the forecasting model when it is trained and tested on parking lots of the same cluster.
Afterward, we compute the correlation between the list of dissimilarities and the list of model performances for parking lots in the same cluster using Equations (6) and (7). We repeat the same steps for parking lots that belong to different clusters. To share the forecasting model, we examine the correlation between the dissimilarity amongst parking lots and the forecasting model performance.
To measure the linear relationship between continuous features (model performance and distance) amongst parking lots, we use the Pearson correlation. The Pearson correlation coefficient assesses the statistical association between two continuous variables and provides information on the strength and direction of their relationship. Equation (6) is used to compute the Pearson correlation coefficient (r) between two random variables, x and y, with means $\bar{x}$ and $\bar{y}$:
$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^{2}} \ \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^{2}}} \tag{6} $$
The coefficient of determination, $R^2$, in Equation (7), expresses the fraction of the variance in the dependent variable that is explained by the independent variables:
$$ R^{2} = 1 - \frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^{2}}{\sum_{i=1}^{n} (Y_i - \bar{Y})^{2}} \tag{7} $$
where
Y represents the dependent variable's actual value,
$\bar{Y}$ is the dependent variable's mean value, and
$\hat{Y}$
represents prediction value [
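Both measures can be sketched in plain Python as follows. The function names are hypothetical, and the dissimilarity and error values are invented purely for illustration; they are not results from our experiments.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - residual sum / total sum of squares."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Illustrative values only: distance of each parking lot to the reference
# parking lot, and the error of the shared model on that parking lot.
dissimilarity = [0.1, 0.3, 0.5, 0.9]
model_error = [0.05, 0.08, 0.12, 0.20]
print(pearson_r(dissimilarity, model_error))
```

A coefficient close to 1 on such lists would indicate that the shared model degrades as parking lots become less similar to the reference parking lot.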
To recapitulate, one of our contributions in this article consists of designing an original framework that facilitates sharing of parking occupancy forecasting models among multiple parking lots exhibiting similar spatiotemporal characteristics. Specifically, we propose a novel approach that not only maintains the quality of the forecasts but also achieves time savings by significantly reducing the number of forecasting models that need to be individually tuned and trained for each parking lot. This obviously has a positive impact on the deployment cost of the models by reducing several items, such as the time spent on offline training, validation, and testing for each individual model, hyperparameters tuning execution time for each model with different hyperparameter combinations, model updating time, and the deployment time for clustering.
It is also important to mention that the cost needed to deploy a city-level parking occupancy forecasting model will vary significantly depending on the number of parking lots to consider, the number of hyperparameters to tune and their possible combinations (the size of the hyperparameter search space), and the number of samples per hyperparameter considered for training, validation, and testing. A thorough assessment of these factors is necessary to estimate the precise cost associated with the model deployment. An experimental cost analysis is detailed in Section 5.4.
Concerning hyperparameter tuning, the search space size depends on the combinations of several elements, such as the learning rate, the number of layers, the number of neurons per layer, the optimizer, and the activation function.
Table 1 details the hyperparameter tuning search space. The elements presented in this table are used to generate the combinations that we used to run our experiments in order to find the optimal forecasting model.
Hence, there exist approximately 270 million potential combinations within the provided hyperparameter tuning search space. It is challenging to estimate the overall execution time for training all of them, because several factors, including hardware performance, software implementation, dataset size, and computational complexity, contribute to it. We can, however, make a preliminary estimate based on certain assumptions. Assuming an average training time of one hour for each model configuration, the overall execution time is:
$$ T_{total} \approx 2.7 \times 10^{8} \ \text{configurations} \times 1 \ \text{h} = 2.7 \times 10^{8} \ \text{h} $$
i.e., roughly 30,800 years of sequential computation.
Based on this number, exhaustively testing every possible combination with grid search is infeasible. This is why we opted for random search, testing 100 randomly sampled combinations from the search space, which takes approximately 4 days to train a model. However, if we applied the same mechanism to create a model for each parking lot at the city level, the cost would increase significantly. By using our clustering and shared model approach, we can significantly reduce the number of models that need to be trained. This reduction leads to a substantial decrease in the time needed for deploying forecasting models.
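The difference between the two search strategies can be sketched as follows. The search space below is a small hypothetical example and does not reproduce Table 1; the function names and the one-hour training time per configuration are likewise assumptions taken from the preliminary estimate above.

```python
import random
from math import prod
from typing import Dict, List

# Hypothetical search space (NOT the content of Table 1).
search_space: Dict[str, list] = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "num_layers": [1, 2, 3],
    "units_per_layer": [32, 64, 128, 256],
    "optimizer": ["adam", "rmsprop", "sgd"],
    "activation": ["relu", "tanh", "sigmoid"],
}


def grid_size(space: Dict[str, list]) -> int:
    """Number of configurations an exhaustive grid search would test."""
    return prod(len(values) for values in space.values())


def sample_configurations(space: Dict[str, list], n_trials: int,
                          seed: int = 0) -> List[dict]:
    """Random search: draw n_trials configurations (duplicates possible)."""
    rng = random.Random(seed)
    return [{name: rng.choice(values) for name, values in space.items()}
            for _ in range(n_trials)]


def estimated_days(n_configs: int, hours_per_config: float = 1.0) -> float:
    """Sequential training time in days under a fixed cost per configuration."""
    return n_configs * hours_per_config / 24


trials = sample_configurations(search_space, n_trials=100)
print(grid_size(search_space))          # 324 for this toy space
print(round(estimated_days(100), 1))    # 4.2 days for 100 random trials
print(round(estimated_days(270_000_000) / 365))  # years for the full grid
```

Under the one-hour assumption, 100 trials correspond to about 4.2 days, in line with the approximately 4 days reported for random search, while the cost of the full grid scales linearly with the number of combinations.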
It is crucial to note that this estimation does not account for any overhead time, data loading, preprocessing, or computing environment constraints, such as session duration limits.
Furthermore, the actual execution time may vary based on a variety of parameters, including the complexity of the RNN-LSTM model, dataset size, and implementation efficiency. Using these estimations, the total execution time for computing the whole search space is anticipated to be in the hundreds of millions of hours (about 270 million hours under the one-hour assumption). Such a lengthy execution time is not feasible within the chosen computing environment. Thus, it is recommended to explore alternative computing resources, such as cloud-based platforms equipped with high-performance GPUs or distributed computing systems, to effectively handle the extensive processing workload. These options offer the required computational power and scalability for efficient exploration of the vast hyperparameter tuning search space. However, it is important to consider the associated financial costs of providing a suitable machine and environment for hyperparameter tuning.