A Unified Spatio-Temporal Inference Network for Car-Sharing Serial Prediction

Car-sharing systems require accurate demand prediction to ensure efficient resource allocation and scheduling decisions. However, developing precise predictive models for vehicle demand remains a challenging problem due to the complex spatio-temporal relationships. This paper introduces USTIN, the Unified Spatio-Temporal Inference Prediction Network, a novel neural network architecture for demand prediction. The model consists of three key components: a temporal feature unit, a spatial feature unit, and a spatio-temporal feature unit. The temporal unit utilizes historical demand data and comprises four layers, each corresponding to a different time scale (hourly, daily, weekly, and monthly). Meanwhile, the spatial unit incorporates contextual points of interest data to capture geographic demand factors around parking stations. Additionally, the spatio-temporal unit incorporates weather data to model the meteorological impacts across locations and time. We conducted extensive experiments on real-world car-sharing data. The proposed USTIN model demonstrated its ability to effectively learn intricate temporal, spatial, and spatiotemporal relationships, and outperformed existing state-of-the-art approaches. Moreover, we employed negative binomial regression with uncertainty to identify the most influential factors affecting car usage.


Introduction
Car-sharing companies have gained significant popularity in modern society due to their cost-effectiveness and convenience, providing a flexible alternative to traditional car ownership. These services alleviate various issues related to lease payments, maintenance, and parking, making them an appealing option for users seeking a hassle-free mobility solution. Beyond individual benefits, these systems contribute to reduced traffic congestion, lower carbon emissions, and minimized air pollution, positioning them as a sustainable and environmentally friendly transportation option.
However, the spatial and temporal distribution of cars across company parking stations presents a critical challenge for car-sharing firms. Accurate demand prediction is essential for optimizing resource allocation, enhancing rental rates, and improving customer satisfaction. To address these challenges, these companies leverage GPS tracking data to predict demand patterns and allocate resources effectively. These data contain a wide range of factors, such as temporal features (e.g., the average demand value in the last four time intervals), spatial features (e.g., longitude and latitude of the parking station), meteorological features (e.g., weather conditions), event features (e.g., holidays), and categories of points of interest near every station [1]. Various techniques, including predictive analytics and machine learning algorithms, aid in identifying demand trends and patterns, enabling companies to adjust their operations accordingly.
To ensure a balanced distribution of cars across various parking lots throughout the day, we propose a comprehensive Unified Spatio-Temporal Inference Prediction Network (USTIN) model. The USTIN model is a unified architecture that incorporates a temporal feature unit, a spatial feature unit, and a spatio-temporal feature unit. Leveraging a combination of Temporal Convolutional Networks (TCN), Long Short-Term Memory (LSTM), and Graph Convolutional Network (GCN), the model effectively processes and analyzes the data. Furthermore, the use of negative binomial regression with uncertainty has allowed for the analysis of the most influential factors affecting car usage in parking stations.
The highlights of our work include the following:
- Proposed USTIN, a unified neural architecture for car-sharing demand prediction, integrating temporal, spatial, and spatio-temporal features across multiple units.
- Achieved state-of-the-art prediction accuracy by effectively capturing complex spatial, temporal, and spatio-temporal influences on car demand.
- Identified the most influential demand factors through negative binomial regression with uncertainty to further enhance predictions.
The rest of the paper is organized as follows: Section 2 provides a literature review of the current studies on serial prediction models. In Section 3, we introduce an overview of the methods used. Section 4 details the experimental framework employed to evaluate our approach's performance. Section 5 analyzes our prediction results. Finally, Section 6 concludes the paper and outlines potential directions for future research.

Literature Review
The objective of the traffic prediction problem is to predict future traffic flow using historical data. The key work in this area includes the DMVST-NET proposed by [1], employing local CNN and LSTM to model spatial and temporal relationships in flow. Additionally, graph deep learning techniques have gained prominence for relationship modeling within traffic networks. The authors of ref. [2] proposed a multi-graph convolutional network and an Attention-based Spatial-Temporal Graph Neural Network (ASTGNN) to model the relationships within flow networks. Similarly, the authors of ref. [3] developed a Hybrid Spatio-Temporal Graph Convolutional Network (H-STGCN) to deduce future travel time from upcoming traffic volume.
Furthermore, the challenge of predicting traffic flow is closely related to the growing need for accurate car-sharing system demand prediction [4]. Car-sharing services have exploded in popularity in recent years as an alternative mode of urban transportation. However, effectively managing these systems requires the reliable prediction of where and when vehicles will be needed. As such, many studies have begun exploring predictive models of car-sharing demand and investigating different influencing factors. The authors of ref. [5] looked into the effects of time horizons, environmental conditions, and learning algorithm types on the prediction of vehicle availability in car-share systems. The authors of ref. [6] estimated the distance to the closest available vehicle in a fleet, whereas other researchers examined multidimensional optimization problems like station-based vehicle relocation [7,8].
Recent studies have introduced innovative models to enhance efficiency from multiple perspectives. The authors of ref. [9] compared spatially implicit Random Forest models with spatially aware methods for the analysis of car-sharing demand. The authors of ref. [10] evaluated the use of Long Short-Term Memory (LSTM) and Prophet techniques for predicting the demand for car-sharing services. Furthermore, the authors of ref. [11] proposed a maximum entropy approach for modeling car-sharing parking dynamics.
Advancements in deep learning have shown promise in extracting spatial and temporal features for demand prediction [12]. However, effectively modeling spatial factors remains a challenge. Several studies have considered the influence of Points of Interest (POIs) near parking stations [13,14]. Notably, spatial imbalances between vehicle supply and demand have been addressed with relocation strategies [15]. Nevertheless, these models often exhibit limitations in capturing detailed spatial factors. Despite recent strides, prior works lack multi-time scale designs to capture periodical seasonalities. Addressing this gap, the authors of refs. [16,17] introduced different timeframe durations, yet model performance diminishes in regions with varying demand densities.
This study bridges existing gaps by integrating POIs and meteorological features, taking into consideration varied time scales and addressing travel demand density. The proposed model aims to enhance the accuracy and generalization capacity, offering a holistic approach to travel demand prediction.

Unified Spatio-Temporal Inference Prediction Network
The overall architecture of the proposed Unified Spatio-Temporal Inference Prediction Network (USTIN) model is described in Figure 1. The model predicts the number of vehicles that are going to be used at a given prediction horizon. Our approach incorporates three distinct units: a temporal feature unit, a spatial feature unit, and a spatio-temporal feature unit [18]. The different units extract key frames, enabling an accurate prediction of travel demand. The temporal unit is designed to capture temporal dependencies and comprises four layers, each corresponding to a different time scale. The spatial unit focuses on capturing spatial dependencies using Points of Interest (POIs), while the spatio-temporal unit integrates weather data to effectively capture spatio-temporal correlations. Finally, the outputs obtained from each unit are combined in the feature module fusion and training unit to generate accurate predictions of passenger demand.

The Temporal Feature Unit
The temporal feature module contains four time scale-related layers, namely the monthly ($F_{Monthly}$), weekly ($F_{Weekly}$), daily ($F_{Daily}$), and hourly ($F_{Hourly}$) layers.
The demand data for each layer are defined as a tensor G. Each layer corresponds to a Temporal Fusion Network (TFN) structure that effectively captures the temporal correlation, as shown in Figure 2.

Temporal convolutional network layer (TCN)
The tensor G is fed into a TCN layer to capture the temporal dependencies in the input data. The output $G_t$ is then denoted as follows, where $W_t$ is the weight matrix of the convolutional filter, $b_t$ is the bias term, and ⊙ is the convolution operation.
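As an illustration of the TCN layer described above, the following minimal sketch implements a causal, dilated one-dimensional convolution, the usual building block of a TCN. The function name, kernel size, dilation, and sample values are assumptions made for illustration; the paper does not state these settings.

```python
def causal_dilated_conv1d(x, weights, bias=0.0, dilation=1):
    """Causal 1-D convolution: y[t] = bias + sum_k weights[k] * x[t - k * dilation].

    Positions before the start of the series are treated as zero padding,
    so the output at time t never depends on future values.
    """
    out = []
    for t in range(len(x)):
        acc = bias
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:  # skip taps that fall before the series start
                acc += w * x[idx]
        out.append(acc)
    return out


# Hypothetical hourly demand values for one station.
demand = [3.0, 5.0, 4.0, 6.0, 8.0]
smoothed = causal_dilated_conv1d(demand, weights=[0.5, 0.5], dilation=2)
```

Stacking such layers with growing dilation widens the receptive field, which is one plausible way for the layer to cover the longer time scales.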

Self-attention mechanism layer
A self-attention mechanism is used to learn the attention weights that determine the importance of the features, where $W_q$ and $W_k$ are the weight matrices for the query and key projections and $d_k$ is the dimension of the key vectors.
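The symbols listed for this layer match the standard scaled dot-product attention. As a sketch under that assumption, and further assuming that queries and keys are both projected from the TCN output $G_t$ (the names $Q$, $K$, and $A$ are introduced here for illustration), the attention weights could be written as:

```latex
Q = G_t W_q, \qquad K = G_t W_k, \qquad
A = \operatorname{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right)
```

Dividing by $\sqrt{d_k}$ keeps the dot products in a range where the softmax does not saturate.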

Long short-term memory layer (LSTM)
The output of the self-attention mechanism enhances the LSTM's capacity to capture temporal dependencies. This process is represented as follows, where ⊙ denotes element-wise multiplication.
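For reference, the standard LSTM cell can be written with the gate, weight, and bias symbols the paper defines for this layer. This is a sketch of the textbook formulation, not necessarily the exact variant used by the authors; $x_t$ (the attention-weighted input at step $t$) and $H_{t-1}$ (the previous hidden state) are notational assumptions.

```latex
\begin{aligned}
i_t &= \sigma\left(W_{ii} x_t + W_{hi} H_{t-1} + b_{ii}\right) \\
f_t &= \sigma\left(W_{if} x_t + W_{hf} H_{t-1} + b_{if}\right) \\
o_t &= \sigma\left(W_{io} x_t + W_{ho} H_{t-1} + b_{io}\right) \\
g_t &= \tanh\left(W_{ig} x_t + W_{hg} H_{t-1} + b_{ig}\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
H_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}
```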

Temporal embedding layer
The temporal embedding layer is used to embed the input into a lower-dimensional space that captures the temporal relationships, where $W_t$ is the weight matrix and $b_t$ is the bias. The four time scale-related layers are then fused: ⊗ denotes the Hadamard product, $W_p$, $W_D$, $W_W$, and $W_M$ are the weight matrices of the time scale-related layers, and $b_{sp}$ is the bias. The output of the temporal feature module is defined according to Equation (10).
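The fusion of the four time-scale layers can be sketched as an element-wise (Hadamard) weighted sum plus a bias. The exact form of Equation (10) is not shown here, so the additive structure, the function name, and all numbers below are assumptions for illustration only.

```python
def hadamard_fusion(features, weights, bias):
    """Fuse per-scale feature vectors as sum_s (W_s * F_s) + b, element by element."""
    fused = list(bias)
    for scale, feat in features.items():
        w = weights[scale]
        for i in range(len(fused)):
            fused[i] += w[i] * feat[i]  # Hadamard product, accumulated per element
    return fused


# Hypothetical two-dimensional outputs of the four time-scale layers.
features = {
    "hourly": [1.0, 2.0],
    "daily": [2.0, 2.0],
    "weekly": [4.0, 0.0],
    "monthly": [0.0, 8.0],
}
# Hypothetical learned weights, one vector per time scale.
weights = {
    "hourly": [0.5, 0.5],
    "daily": [0.25, 0.25],
    "weekly": [0.25, 0.25],
    "monthly": [0.0, 0.125],
}
fused = hadamard_fusion(features, weights, bias=[0.0, 1.0])
```

Because each weight is applied element-wise, every output position can learn its own mix of the hourly, daily, weekly, and monthly signals.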

The Spatial Feature Unit
To effectively process Points of Interest (POI) features, we have designed a model architecture that comprises the following components.

Spatial density calculation
POI density represents the concentration of various points of interest around every parking station. We comprehensively consider the number of POIs and the spatial distance within a radius R. The distance is computed from the following quantities:
$d(s_i, POI_j)$: the distance between station $s_i$ and $POI_j$; $r$: the radius of the Earth; $\Delta lat_{ij}$: the difference in latitude between station $s_i$ and $POI_j$; $\Delta lon_{ij}$: the difference in longitude between station $s_i$ and $POI_j$; $lat_i$: latitude of station $s_i$; $lat_j$: latitude of $POI_j$.
The density of each POI ($D_{POI_j}$) is determined as follows:
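The distance symbols listed above correspond to the haversine great-circle formula. The sketch below computes that distance and then counts POIs per category within a radius R of a station. Using a plain count as the density is an assumption, since the paper's density formula is not reproduced here; the Earth radius value, the function names, and the coordinates are likewise illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; the paper only names it r


def haversine_km(lat_i, lon_i, lat_j, lon_j):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi_i, phi_j = math.radians(lat_i), math.radians(lat_j)
    d_lat = phi_j - phi_i
    d_lon = math.radians(lon_j - lon_i)
    a = math.sin(d_lat / 2) ** 2 + math.cos(phi_i) * math.cos(phi_j) * math.sin(d_lon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def pois_within_radius(station, pois, radius_km):
    """Count POIs per category within radius_km of a station (assumed density proxy)."""
    counts = {}
    for category, lat, lon in pois:
        if haversine_km(station[0], station[1], lat, lon) <= radius_km:
            counts[category] = counts.get(category, 0) + 1
    return counts


station = (29.5630, 106.5516)  # illustrative coordinates, not taken from the dataset
pois = [
    ("hotel", 29.5640, 106.5520),
    ("hotel", 29.5700, 106.5600),
    ("school", 29.7000, 106.9000),
]
nearby = pois_within_radius(station, pois, radius_km=1.5)
```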

Regression model
The car-sharing variance is significantly greater than its average, showing an overdispersion phenomenon [19]. Therefore, we use the negative binomial distribution to estimate the parameters. The regression model includes the order quantity for each car-sharing station ($u_i$), the density of each POI category ($x_1, \ldots, x_n$), an intercept ($\beta_0$), coefficients ($\beta_1, \ldots, \beta_n$) for the corresponding variables, and an error term ($\varepsilon$).
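The overdispersion argument can be checked with a simple variance-to-mean ratio, and the fitted coefficients can be turned into an expected count. The sketch below assumes the usual log link for negative binomial regression, which the text does not state explicitly; the order counts and POI densities are made-up values, while the two coefficients are the hotel and shopping-centre estimates reported for Class C.

```python
import math
import statistics


def dispersion_ratio(counts):
    """Sample variance divided by the mean; values well above 1 suggest overdispersion."""
    return statistics.variance(counts) / statistics.mean(counts)


def expected_count(intercept, coefs, x):
    """Expected count under an assumed log link: exp(beta_0 + sum_k beta_k * x_k)."""
    return math.exp(intercept + sum(b * v for b, v in zip(coefs, x)))


orders = [0, 1, 0, 12, 3, 0, 25, 2]  # made-up order counts for one station
ratio = dispersion_ratio(orders)

# Coefficients for hotels and shopping centres (Class C); densities are hypothetical.
mu = expected_count(0.0, [0.0427, 0.0412], [10.0, 5.0])
```

A ratio far above 1 is the situation in which a Poisson model understates the variance and a negative binomial model is the more natural choice.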

Spatiotemporal embedding layer
The tensor $G_{POI}$ and the vector $W_{POI}$, which contains weights corresponding to the coefficients in Equation (13), are input into a spatiotemporal embedding layer.
Sensors 2024, 24, 1266

Graph convolutional network layer (GCN)
The output of the spatiotemporal embedding layer is fed into a GCN layer. This layer employs the mean aggregation function to capture spatial relationships among POIs.
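Mean aggregation in a graph convolution can be sketched as averaging each node's feature vector with those of its neighbours. The sketch omits the learned weight matrix and activation that a full GCN layer would apply afterwards; the graph, feature values, and function name are hypothetical.

```python
def mean_aggregate(features, neighbours):
    """For each node, average its own feature vector with those of its neighbours."""
    out = []
    for node, feat in enumerate(features):
        group = [feat] + [features[j] for j in neighbours[node]]
        out.append([sum(vec[d] for vec in group) / len(group) for d in range(len(feat))])
    return out


# Three hypothetical nodes connected in a chain: 0 - 1 - 2.
features = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
neighbours = {0: [1], 1: [0, 2], 2: [1]}
aggregated = mean_aggregate(features, neighbours)
```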

Fully connected layer
We used a neural network architecture with fully connected layers for feature extraction, where $W_{MC}$ is the weight of the fully connected layer and $b_{MC}$ is its bias.

The Spatio-Temporal Feature Unit
We used a neural network architecture with fully connected layers for the meteorological features, where $G_{ME}$ is the meteorological feature tensor, $W_{ME}$ is the weight of the fully connected layer, and $b_{ME}$ is its bias.

Feature Module Fusion and Training
The model integrates the obtained outputs via a weighted summation (Equation (18)).
The prediction result of passenger demand $\tilde{X}_{tk}$ is obtained using Equation (19).
We adopt back-propagation with the Adam optimiser to improve the training efficiency [18].
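To make the optimiser step concrete, the sketch below applies the standard Adam update rule to a single scalar parameter on a toy quadratic loss. The hyperparameter values are common defaults chosen for illustration (with a larger learning rate so the toy problem converges quickly), not the settings used in the paper.

```python
import math


def adam_minimise(grad_fn, theta, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Minimise a scalar function with Adam, given a function returning its gradient."""
    m = 0.0  # running mean of gradients (first moment)
    v = 0.0  # running mean of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction for the zero initialisation
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


# Toy loss (theta - 3)^2 with gradient 2 * (theta - 3); the minimum is at theta = 3.
theta_star = adam_minimise(lambda th: 2.0 * (th - 3.0), theta=0.0)
```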

Standard Error
The significance of the estimated values is assessed using the standard error.

Standard Errors of Marginal Effects
To provide a measure of uncertainty, standard errors of marginal effects are associated with the marginal effects of the predictors on the response variable.
The marginal effects are the partial derivatives of the predicted values ($u_i$). The significance of the factors' impact is defined as follows:

p-Values for Marginal Effects
p-values for marginal effects provide insights into the significance of the factors.
Z: standard normal random variable.
|Z i |: z-score of the i-th marginal effect.
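A two-sided p-value for a marginal effect follows from its z-score as $p = 2\,(1 - \Phi(|Z_i|))$, where $\Phi$ is the standard normal distribution function. The sketch below assumes the z-score is the estimate divided by its standard error, which is the usual Wald construction; the function name is illustrative.

```python
import math


def two_sided_p_value(estimate, std_error):
    """Two-sided p-value of a Wald z-test: 2 * (1 - Phi(|z|)) with z = estimate / SE."""
    z = estimate / std_error
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for a standard normal variable.
    return math.erfc(abs(z) / math.sqrt(2.0))


p = two_sided_p_value(1.96, 1.0)
```

With a z-score of 1.96 the p-value is approximately 0.05, the conventional significance threshold.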

Experiment
Section 4.1 provides an illustration of the dataset's details, while Section 4.2 describes the experimental setting. Section 4.3 goes over the baseline models against which our model was evaluated. We describe the model configurations and the evaluation metrics in Sections 4.4 and 4.5, respectively [21].

Data Description
In our study, we used the Chongqing car-sharing company's dataset for predicting car-sharing demand, along with weather data that were acquired via web crawling [21].Furthermore, we obtained the point-of-interest dataset via web crawling to enhance the comprehensiveness of the features used in our predictive model.

Weather Condition Dataset
In our work, we considered that meteorological data affect car-sharing demand [21]. Meteorological data, such as weather conditions and temperature, were collected using a Python-based Selenium web crawler that scraped Chongqing weather conditions from 1 January 2017, 00:00:00 to 31 March 2019, 23:00:00.

Points of Interest Dataset
The car-sharing dataset was augmented with Points of Interest (POIs) data using the Baidu API for web crawling.This process involved obtaining and integrating supplementary location-based information such as restaurants, cafes, museums, cultural landmarks, and so on.The data crawling aimed to enhance the quality and diversity of the original dataset.
Table 1 presents the influencing indicator system used to determine the potential demand for car-sharing.

First-Level Indicator | Second-Level Indicator
Usage feature | $x_1$: rented cars
Temporal features | $x_2$: workday (1 for yes and 0 for no); $x_3$: rush hour (1 for yes and 0 for no)
Building land attribute |

Data Pre-Processing
Raw data may contain noise, outliers, missing values, or irrelevant features, which can negatively affect the performance of machine learning models [22]. Before analysis, we applied the following pre-processing methods:
(1) Imputation: Due to the numerical meaning of the missing values [21], we replaced them using K-nearest neighbours imputation.
(2) Normalization: The dataset was normalized using min-max scaling, which scales the numerical features to a range between 0 and 1.
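Min-max scaling maps each value $x$ to $(x - x_{min}) / (x_{max} - x_{min})$. A minimal sketch for one numerical feature follows; the handling of a constant column (mapped to zeros) is an assumption, since the paper does not discuss that case.

```python
def min_max_scale(values):
    """Scale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant feature: no spread to scale, so map everything to 0 (assumed choice).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


scaled = min_max_scale([10.0, 15.0, 20.0])
```

In practice the minimum and maximum should be computed on the training set only and reused for the test set, so that no test information leaks into training.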
The models were implemented on a PC with an Intel(R) Core™ i7-7500U CPU running at 3.00 GHz and 8 GB RAM, with the Windows 10 operating system and the Python 3.7 development environment [23].

Baseline Methods
The following section outlines the baseline models against which we compared the proposed model:
(1) Multilayer perceptron (MLP): MLP is a feedforward neural network [24]. The network learns to map input data to the target output using backpropagation, adjusting the weights to minimize the difference between the predicted and actual outputs.
(2) K-Nearest Neighbours (KNN): KNN makes a prediction for a given point based on the outcomes of the k neighbours closest to that point [25].
(3) Random Forest (RF): A random forest is a collection of tree predictors [26], where each tree is generated using a random vector sampled independently from the input vector [27].
In the evaluation metrics, $A_t$ is the actual value, $F_t$ is the forecast value, and n denotes the number of fitted points.
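The four evaluation metrics used in the experiments (MAE, MSE, RMSE, and MAPE) have standard definitions in terms of the actual values $A_t$, the forecasts $F_t$, and the number of points n. The sketch below uses those textbook definitions and reports MAPE as a fraction rather than a percentage, which is an assumption based on the magnitude of the values reported in the results.

```python
import math


def forecast_errors(actual, forecast):
    """Return MAE, MSE, RMSE, and MAPE for paired actual and forecast values."""
    n = len(actual)
    diffs = [a - f for a, f in zip(actual, forecast)]
    mae = sum(abs(d) for d in diffs) / n
    mse = sum(d * d for d in diffs) / n
    rmse = math.sqrt(mse)
    # MAPE divides by the actual value, so it requires non-zero actuals.
    mape = sum(abs(d / a) for d, a in zip(diffs, actual)) / n
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape}


errors = forecast_errors([2.0, 4.0, 8.0], [1.0, 4.0, 10.0])
```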

Discussion
We compared our USTIN model against several baseline models, including KNN, LSTM, RF, and MLP. The MAE, MSE, RMSE, and MAPE metrics are used to evaluate the results and to compare our model with other state-of-the-art models.
Note that the smallest errors are shown in bold text in Tables 4 and 5.

Car Usage Prediction
The main objective of this study was to build a predictive model for vehicle usage in parking lots.By predicting car usage, parking facility managers can optimize resource allocation, improve traffic flow, and enhance customer satisfaction.

Full Data Experiment
Table 4 illustrates a performance comparison between the proposed method and baseline methods for predicting car usage in every parking station.The results show that USTIN achieves the lowest MAE (0.0308), MSE (0.1541), RMSE (0.3925), and MAPE (0.1077) among all the methods.Notably, KNN and MLP perform poorly (i.e., KNN and MLP have a MAPE of 0.5709 and 0.8874, respectively).The poor performance of the baseline models can be attributed to their failure to model the different dependencies, unlike our proposed model, which leverages temporal, spatial, and spatio-temporal information to make predictions.

Clustered Data Experiment
We applied our model to the entire dataset, demonstrating its robust performance in predicting car-sharing demand.To further analyze the model's performance, we also implemented our analysis in four distinct classes.For the sake of organization and not being redundant in our explanations, we only discuss the result analysis of class "A", as other classes exhibit the same behavior and lead to the same conclusion [21].

Most Influential Points of Interest
The negative binomial regression with uncertainty was used to determine the key factors that impact car usage.
(3) Class C analysis. Hotels ($\beta_7$ = 0.0427, p-value = 0.026) and shopping centres ($\beta_8$ = 0.0412, p-value = 0.044) remain important factors, reinforcing their role as key factors in car-sharing demand in Table 7 (Class C). Leisure and entertainment ($\beta_{12}$ = 0.0465, p-value = 0.021) also play an important role in increasing car sharing.

Prediction Results
Figures 3 and 4 present a comparison between the predicted values and the actual values obtained using the USTIN model. The results show the efficacy of the proposed neural network architecture. The integration of temporal features, spatial features, and spatio-temporal features has significantly enhanced the model's predictive accuracy. Introducing spatial features allows the model to consider factors that are not inherently present in the spatial-temporal data but have a substantial influence on it. Furthermore, the spatio-temporal unit captures the influence of meteorological conditions across locations and time.
Overall, this study highlights the effectiveness of the proposed architecture in enhancing car-sharing demand prediction in urban environments.

The Contribution of Influencing Factors in Car-Sharing
Figures 5 and 6 show the results of the negative binomial regression and provide insightful information on factors influencing car-sharing demand, including the number of rented cars, workday, temperature, and air quality. These factors play an important role in determining car-sharing usage. Furthermore, the evaluation results highlight the most influential Points of Interest alongside those with relatively minor impacts on car-sharing demand. Notably, tourist attractions, educational institutions, medical facilities, hotels, and shopping centers emerge as the most influential, while beauty centers, cultural landmarks, and government agencies exhibit less influence.

Conclusions
This research study has introduced the Unified Spatio-Temporal Inference Prediction Network (USTIN), an advanced architecture for predicting car usage across different parking lots. The proposed model integrates temporal, spatial, and spatio-temporal units and has demonstrated strong predictive effectiveness, outperforming other state-of-the-art models on real-world data. Notably, the temporal module adeptly captured both short- and long-term temporal demands, while the spatial module incorporates points of interest, enriching the contextual understanding of car usage. Additionally, the spatio-temporal module integrates meteorological data to effectively capture their influence across locations and time. Beyond car demand prediction, we used negative binomial regression with uncertainty to identify the key factors influencing car usage. The obtained results identified key drivers such as tourist destinations, hotels, and shopping centers.

(3) Clustering: The parking stations were organized into four distinct classes using frequency-based clustering [21]:
• Class A: daily rented cars.
• Class B: frequently used cars.
• Class C: sometimes used cars.
• Class D: unlike other parking stations, cars of this class are rarely used.
Classes A, B, C, and D have different parking station IDs, such as 16, 104, 6, and 25.
(4) Splitting the dataset: We split the data into training and test sets. The training set runs from 1 January 2017 to 31 December 2018, and the test set from 1 January 2019 to 31 January 2019.
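The chronological split described in step (4) can be sketched as a filter on timestamps, using the date boundaries stated above. The record layout (timestamp, value), the function name, and the sample records are assumptions for illustration.

```python
from datetime import datetime

TRAIN_START = datetime(2017, 1, 1, 0, 0, 0)
TRAIN_END = datetime(2018, 12, 31, 23, 59, 59)
TEST_END = datetime(2019, 1, 31, 23, 59, 59)


def chronological_split(records):
    """Split (timestamp, value) records into training and test lists by date."""
    train = [r for r in records if TRAIN_START <= r[0] <= TRAIN_END]
    test = [r for r in records if TRAIN_END < r[0] <= TEST_END]
    return train, test


# Hypothetical hourly demand records for one station.
records = [
    (datetime(2017, 1, 1, 0), 4),
    (datetime(2018, 12, 31, 23), 7),
    (datetime(2019, 1, 15, 8), 5),
    (datetime(2019, 2, 1, 0), 9),  # outside both ranges, so it is dropped
]
train, test = chronological_split(records)
```

Splitting by time rather than at random keeps future observations out of the training set, which matters for any forecasting evaluation.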

(4) Class D analysis. The influential POIs in Table 7 (Class D) are different from those in other classes. Medical facilities ($\beta_{16}$ = 0.0285, p-value = 0.045) are an important factor in driving car-sharing demand, while the other factors show less impact.

Figure 3. Comparison of the predicted value and the real value using USTIN (full data).

Figure 4. Comparison of the predicted value and the real value using USTIN (clustered data).

Figure 5. Influence of indicators on the car-sharing demand (full data).

Figure 6. Influence of indicators on the car-sharing demand (clustered data).

In the LSTM layer, $i_t$, $f_t$, $o_t$, and $g_t$ are the input, forget, output, and candidate cell state vectors, respectively. $W_{ii}$, $W_{if}$, $W_{io}$, and $W_{ig}$ are the weight matrices for the input gate, forget gate, output gate, and candidate cell state, respectively. $W_{hi}$, $W_{hf}$, $W_{ho}$, and $W_{hg}$ are the corresponding weight matrices associated with the previous hidden state.
$b_{ii}$, $b_{if}$, $b_{io}$, and $b_{ig}$ are the bias terms for the input gate, forget gate, output gate, and candidate cell state, respectively. $c_t$ is the cell state at time t, $c_{t-1}$ is the cell state from the previous time step, $H_t$ is the hidden state at time t, and $\sigma$ is the sigmoid activation function.

Table 1. Influencing indicator system of the potential demand for car-sharing.

Table 4. Evaluation results (full data).

Table 6. Evaluation of negative binomial regression model (full data).

Table 7. Evaluation of negative binomial regression model (clustered data).