Unconstrained Estimation of Multitype Car Rental Demand

The unconstrained demand forecast for car rentals has become a difficult problem for revenue management because it must cope with a variety of rental vehicles, customers' strong subjective preferences and requests, and a high probability of upgrades and downgrades. The unconstrained demand forecast mainly comprises the repair of constrained historical demand and the forecasting of future demand. In this work, a new methodology is developed based on multiple discrete choice models to obtain customer choice preference probabilities and improve a known spill model, including a repair process of the unconstrained demand. In addition, the linear Holt-Winters model and the nonlinear backpropagation neural network are combined to predict future demand and avoid excessive errors caused by a single method. In a case study, we take advantage of a stated preference and a revealed preference survey and use the variable precision rough set to obtain the factors and weights that affect customer choices. In this case study, and based on a numerical example, three forecasting methods are compared to determine the car rental demand of the next time cycle. The comparison with real demand verifies the feasibility and effectiveness of the hybrid forecasting model, with a resulting average error of only 3.06%.


Introduction
The car rental industry plays a huge role globally. It can act as a lubricant for production and consumption to ease their mutual restraints. It can also expand the automotive consumer market and evaluate the popularity of new cars before they come into the market. At the same time, because the public transportation system is limited by operating time, departure frequency, accessible range, comfort, privacy, and other conditions, car rental, with its outstanding advantages such as high flexibility, ease of use, and privacy, has shown considerable growth over the years. The global car rental market was valued at $79.5 bn in 2018 and is expected to reach $125.4 bn by 2025, registering a CAGR of 5.1% from 2019 to 2025 [1]. In recent years, the demand for car rental services in the Asia-Pacific region has become the largest and fastest growing segment, especially in China and India, where there are high population density and rapidly growing demand.
Mainstream car rental companies started to use revenue management (RM) as a crucial tool for reducing operating costs and improving competitiveness in the 1990s. RM is an important interface between enterprises' supply and market demand, and its objective is to maximize profits. The most widely known RM application occurs in the airline industry. It mainly includes four aspects, namely forecasting, overbooking, capacity control, and pricing [2]. Among them, forecasting is the foundation and provides vital input information for other work. A 12.5-25% reduction of forecast error could translate into 1-3% incremental revenue generated from the RM system [3,4]. Therefore, to achieve the desired level of accuracy, it is important and necessary to innovate the forecast method constantly [5]. In reality, influenced by booking and/or capacity limits, the customer's present demand may not be satisfied, which will lead to it being lost (spilled) or recaptured by a more expensive (buyup) or cheaper (buydown) available product. The sample booking curves with three grades of products (the fare order is A > B > C) are shown in Figure 1 [6,7]. Therefore, in order to repair the constrained demand, it is necessary to carry out an unconstrained estimation of the customer's actual demand and forecast the future demand on this basis [8,9]. Research into unconstrained estimation (sometimes also called detruncation, uncensoring) in the RM system started more than 40 years ago and mainly focused on the airline industry. Unconstrained estimation mainly uses observed "censored demand" to estimate and calculate the "uncensored demand". The original methods were based on statistics, such as the expectation maximization (EM), projection detruncation (PD), booking curve (BC), and mean imputation. Since the beginning of this century, discrete customer choice models have grown in popularity to estimate unconstrained demand [8,10,11]. 
Forecasting usually yields the demands and no-show rate by booking class, the latter of which will not be discussed here. The common methods include time series, linear regression, exponential smoothing, double exponential smoothing, pickup, neural networks, principal component analysis, and adaptive models [6,12,13]. Since air transportation can meet the travel demand between two points, the prediction of OD demand is necessary [14][15][16]. Guo surveyed the history of research on unconstraining methods by reviewing over 130 references [17]. Weatherford provided two reviews of unconstrained estimation and forecast methods broadly used in different industries up to 2016 [18,19]. Recent discussions include forecast multipliers and hybrid forecasting [9], the effect of customers' reference price on demand [20], Gaussian processes for unconstraining demand [21], and demand forecast accuracy [22]. Due to the important research and application value of revenue management theory, many advances have been made in the hotel, car or truck rental, cargo/freight, internet service and retailing, cruise line, rail, and other industries regarding demand estimation and forecasting [19].
Unlike the airline industry, the car rental industry is more concerned with the demand at each rental station, and OD forecasting does not need to be considered as much. Supply and demand are dynamically balanced, but there is still an overestimation or underestimation. Since demand determines the size of the leasing station inventory and fleet, demand estimation and forecasting techniques are worth studying [23]. The first reports of RM application to car rental were at Hertz and National Car Rental [24,25]. Jishan used a decomposition method to identify the actual latent demand from the total recorded turndowns [26]. Gordon simplified the problem by human-computer interaction, hoping to find the right level of detail at which to forecast and optimize [27]. Dhruval optimized the fleet with the help of a robust design methodology, considering fleet as a product and cars as an affecting variable [28]. Kourentzes addressed the frequently encountered situation of observing only a few sales events at the individual product level and proposed variants of small demand forecasting methods to be used for unconstraining [29]. In general, there is relatively little research on unconstrained estimation and prediction of car rental demand. By referring to research methods in other fields, relevant research in the car rental industry is worthy of a further discussion.
Due to the relative lack of articles that deal with car rental demand forecasting, this work develops an optimization methodology to estimate and forecast the demand for a car rental station; it is then applied to a case study in China. By using the case study, the work also enables comparison between the effects of different methods, especially for cases of small demand. The remainder of this work is organized as follows. Section 2 presents the problem statement and hypothesis, including an improved spill model for unconstrained estimation. In Section 3, we propose a hybrid forecasting method based on the Holt-Winters model and backpropagation neural network. Subsequently, Section 4 introduces the case study with numerical experiments and a discussion. Section 5 concludes the work.

Hypothesis and Variable Definitions
Before studying the forecasting method, the conditional hypothesis and variable definitions should be explained. This work is mainly aimed at the car rental industry, and the method is also applicable to the hotel and aviation industries, etc.

Unconstrained Estimation Methods
For the revenue management system, the starting point of analysis is some set of assumptions regarding an underlying stochastic or deterministic demand process [30]. In fact, when the real customer demand exceeds the preset order quantity limit, the recorded demand data in the presale system would be censored. Using these "cutoff demand data" for demand forecasting is bound to reduce the accuracy of forecasts and directly impact the revenue. Therefore, deducing the real historical customer demand from the recorded demand data is the first problem. Some common unconstrained estimation methods are as follows.

Expectation Maximization
EM is one of the most common methods to restore incomplete data by using the iterative algorithm in statistics. The EM algorithm is mainly composed of two steps, E (expectation) and M (maximization)-that is, after initializing the parameters to be estimated through the observable incomplete data information, the conditional expectation of missing information in the incomplete data is calculated by parameter estimation, and the missing information is replaced accordingly. In step E, an expected log-likelihood function of complete data is obtained. In step M, the improved parameter values are obtained by using this function through the maximum likelihood estimation method. The whole process will be repeated until convergence [31].
Convergence condition: the iteration stops when the change in the parameter estimates between two successive iterations is smaller than a small given number. Otherwise, go back to step E and continue iterating.
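The E and M steps described above can be sketched for the common textbook case of normally distributed demand with right-censored records. This is only an illustration under stated assumptions, not the procedure of [31]: the function name `em_unconstrain`, the tolerance `tol`, the normality assumption, and the sample data are all hypothetical, and at least one uncensored record is assumed to exist.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def em_unconstrain(observed, censored, tol=1e-6, max_iter=500):
    """Estimate mean and std of normal demand from partly censored records.

    observed: recorded demand values
    censored: booleans, True where the record hit the booking limit
    Returns (mu, sigma, repaired), where repaired holds the completed values.
    """
    n = len(observed)
    # Initialise the parameters from the uncensored records only.
    free = [x for x, c in zip(observed, censored) if not c]
    mu = sum(free) / len(free)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in free) / len(free)) or 1.0
    first = list(observed)
    for _ in range(max_iter):
        first, second = [], []
        for x, c in zip(observed, censored):
            if c:
                # E step: first and second moments of a normal variable
                # known only to exceed the recorded (censored) value x.
                z = (x - mu) / sigma
                tail = max(1.0 - norm_cdf(z), 1e-12)
                lam = norm_pdf(z) / tail
                m1 = mu + sigma * lam
                var = sigma ** 2 * (1.0 + z * lam - lam ** 2)
                first.append(m1)
                second.append(var + m1 ** 2)
            else:
                first.append(x)
                second.append(x * x)
        # M step: maximum likelihood estimates from the completed moments.
        new_mu = sum(first) / n
        new_sigma = math.sqrt(max(sum(second) / n - new_mu ** 2, 1e-12))
        converged = abs(new_mu - mu) < tol and abs(new_sigma - sigma) < tol
        mu, sigma = new_mu, new_sigma
        if converged:
            break
    return mu, sigma, first

# Hypothetical records: the booking limit of 25 was reached four times.
obs = [18, 22, 25, 25, 20, 25, 19, 25]
cen = [False, False, True, True, False, True, False, True]
mu, sigma, repaired = em_unconstrain(obs, cen)
```

Each censored record is replaced by its conditional expectation, so the repaired values exceed the booking limit and the estimated mean exceeds the mean of the raw records.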

Projection Detruncation
The initial values, convergence time, and number of iterations of the EM algorithm depend, to varying degrees, on the data themselves. Building on the EM algorithm, Hopperstad proposed PD, which overcomes the limitations of EM in large-scale unconstrained estimation [32,33]. Research has shown that the PD algorithm is more flexible in practical applications.
PD and EM follow the same principle, and both include steps E and M. Both algorithms use statistical iteration to estimate the parameters of the constrained demand data; the difference is that PD uses a parameter-related heuristic at step E, so the unconstrained estimate is obtained without calculating the conditional expectation of the constrained data. In step M, the convergence conditions are the same as in EM. Here, $\Phi(\cdot)$ is the distribution function of the normal distribution, and the parameter $\tau$ is any given constant ($0 < \tau < 1$) that stays the same throughout the whole process. $\tau$ represents the probability that an observed constrained demand has a true value greater than the new estimate. Instead of calculating a conditional expectation, PD balances two areas in the probability distribution graph (Figure 2). The first is the area under the probability distribution between the order quantity limit (that is, the original constrained value) and the new estimate or projected value (area M), and the second is the area between the new estimate and infinity (area N). The horizontal-axis values corresponding to region N represent the portion of the real demand that is underestimated by the unconstrained estimation [34].
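The area-balancing idea can be illustrated with a short sketch. It assumes normally distributed demand with known mean and standard deviation, and it assumes that the balance takes the usual form area N = tau × (area M + area N); the function name `pd_projection`, the parameter name `tau`, and the numbers are hypothetical and are not taken from [32-34].

```python
from statistics import NormalDist

def pd_projection(limit, mu, sigma, tau=0.5):
    """Projected (unconstrained) value for a record censored at `limit`.

    The projection p is chosen so that the tail area beyond p (area N)
    is a fraction tau of the whole tail beyond the limit (areas M + N).
    """
    dist = NormalDist(mu, sigma)
    tail = 1.0 - dist.cdf(limit)           # area M + area N
    return dist.inv_cdf(1.0 - tau * tail)  # leaves area N = tau * tail

# Hypothetical numbers: limit 25, demand roughly N(22, 4^2).
p_half = pd_projection(25, 22, 4, tau=0.5)   # conditional median
p_bold = pd_projection(25, 22, 4, tau=0.3)   # smaller tau, larger estimate
```

With `tau = 0.5` the projection is the conditional median of demand above the limit; lowering `tau` pushes the estimate further into the tail.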

Multitype Spill Model
The multitype spill model (MSM) takes into account the vertical recapture of demand across price grades, in which buyup and buydown behaviors occur between different price levels. The unconstrained estimation process based on MSM is therefore closer to reality and can effectively avoid overestimating real demand because of repeated records of the actual observable demand across car type prices in the revenue management system. The spill model is divided into two parts: the calculation of spillage and the calculation of unconstrained demand [35]. The specific calculation process is as follows:
1. Check whether the rental demand is constrained at review point $t$. When $I(i, j, t) = 1$, the rental demand is not constrained and no repair is performed; otherwise, the repairing process is performed and the calculation goes to the next step.
3. Calculate the spillage and cumulative spillage. According to the assumptions, customers' willingness to pay is ordered from low to high car type, so the spillage should be calculated starting from the lowest car type.
For a single car type, the cumulative spillage is determined by a formula in which $\varphi(x)$ is the density function and $\Phi(x)$ is the distribution function of the standard normal distribution [36].
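For the single-car-type case, a common textbook expression for the expected spill of normally distributed demand above a capacity is $\sigma[\varphi(z) - z(1 - \Phi(z))]$ with $z = (C - \mu)/\sigma$. The sketch below uses this standard normal-loss expression as an assumption; it is not necessarily the exact formula of [36], and the name `expected_spill` and the numbers are hypothetical.

```python
from statistics import NormalDist

def expected_spill(mu, sigma, capacity):
    """Expected demand turned away, E[max(D - capacity, 0)], for D ~ N(mu, sigma^2)."""
    std = NormalDist()                      # standard normal distribution
    z = (capacity - mu) / sigma
    return sigma * (std.pdf(z) - z * (1.0 - std.cdf(z)))

# Hypothetical station: mean demand 20, standard deviation 5.
spill_tight = expected_spill(20, 5, 20)   # capacity equal to mean demand
spill_loose = expected_spill(20, 5, 25)   # one standard deviation of slack
```

When capacity equals mean demand, the expression reduces to $\sigma\varphi(0)$, about 0.4 standard deviations of demand; the spill shrinks as capacity grows.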


For a multitype situation, when 1 i  , considering cumulative spillage ( , , 1) SU I j t  at review point 1 t  from car type 1 to I .

Improved Multitype Spill Model
Generally, traditional revenue management assumes that customer selection behavior is short sighted and ignores customer subjectivity. However, competition in the car rental market has intensified, and the "buyer market" has gradually expanded, so the subjective initiative of customers has been more fully reflected. The customer can choose target commodity according to his or her own preferences and subjective utility. These factors can be divided into three categories according to the differences in sources, namely accidental factors, objective factors, and subjective factors.
Accidental factors are not expected and have some randomness, such as concerts, sports events, and weather changes. Objective factors are external factors, which are generally not changeable in a short period of time. Objective factors are not directly related to the customers themselves, including policy and regulations, regional economic level, and rental vehicle attributes. Subjective factors are the customer's own determinants, which are expressed in terms of customer preferences, trip purposes, age, and literacy. Subjective factors can be grasped by the analysis of the customer's choice intention, usually in the form of a questionnaire.

SP/RP Survey
The basic data on customer choice preference were obtained by stated preference (SP) and revealed preference (RP) surveys. The SP survey investigates customers' subjective preferences before the rental occurs. The RP survey analyzes customers' actual choice behavior after the fact. SP data are more flexible, while RP data are more reliable.
The SP survey can obtain multiple data from a single interviewee, with the advantages of small sample and low survey cost. The basic principle of the SP survey is to predetermine the various attribute factors and their influence levels and then invite interviewees to score the plan in the set situation. The respondents' choices may be different from their actual actions, but they can still find the main factors that can be calculated and have a large impact on the overall mean. Then, the secondary factors that are difficult to measure will be removed. The data obtained from the RP survey are the actual occurrence data, since the data records behaviors that have been taken by the respondent or the actual selection behavior observed. The basic principle of the RP survey is to describe the occurrence or existence of the situation with different attribute factors and influence levels and to invite the respondents to score the content.
Before the SP survey plan is designed, condition attributes, decision attributes, and corresponding level values should be determined. The survey content includes the importance of the brand effect, the satisfaction of car rental type, the sensitivity of the rental price, returning location, satisfaction of the vehicle status, and the satisfaction with a special offer, service attitude, procedure convenience, and accident handling. The RP survey is aimed at customers who have made choices, and the influencing factors are rated as very satisfied, satisfied, general, not satisfied, and very dissatisfied. The survey includes the type of rental vehicle and satisfaction with the rental price, return location restrictions, vehicle status, service attitude, procedure convenience, and accident handling when the preferred rental vehicle is rejected. See Appendixes A and B for the SP/RP questionnaire of the case study.
Customers are affected by many attributes when choosing among multiple car types. The analytic hierarchy process (AHP) is usually used to determine the degree of influence of each attribute. AHP is a simple way to handle fuzzy and complex subjective problems both quantitatively and qualitatively. Its most critical step is to obtain a judgment matrix by pairwise comparison, from which the weight of each factor is calculated; the specific calculation process is omitted here. However, because expert experience is uncertain, a subjectively constructed judgment matrix may not satisfy complete consistency. To reduce the inconsistency of subjective judgment, the variable precision rough set (VPRS) can be used to reduce the factors affecting the customer's choice behavior, so that similar factors are represented by a few factors when the judgment matrix is constructed.
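Since the weight calculation is omitted, a minimal sketch of the usual eigenvector approach may help. It assumes the standard AHP recipe (weights from the principal eigenvector, consistency index CI = (λmax − n)/(n − 1)); the function name `ahp_weights` and the 3 × 3 judgment matrix are hypothetical and are not taken from the case study.

```python
def ahp_weights(matrix, iterations=200):
    """Weights, largest eigenvalue, and consistency index of a judgment matrix.

    The principal eigenvector is approximated by power iteration, which
    converges for positive reciprocal matrices.
    """
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]          # normalise so the weights sum to 1
    # Estimate lambda_max from A w = lambda w, averaged over the components.
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lam = sum(aw[i] / w[i] for i in range(n)) / n
    ci = (lam - n) / (n - 1)                # 0 for a fully consistent matrix
    return w, lam, ci

# Hypothetical, fully consistent comparison of three factors (4 : 2 : 1).
judgment = [[1, 2, 4],
            [0.5, 1, 2],
            [0.25, 0.5, 1]]
weights, lam_max, ci = ahp_weights(judgment)
```

For a fully consistent matrix such as this one, the largest eigenvalue equals the matrix order and the consistency index is zero; inconsistent expert judgments give a larger eigenvalue and a positive index.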
VPRS extends the Pawlak rough set by introducing a confidence level $\beta$ ($0.5 < \beta \le 1$), so that a certain degree of classification error is admissible. It can improve the flexibility of decision rules and, at the same time, reflect the correlation in data analysis, which helps to discover relevant data among unrelated data; that is, the implicit patterns in the data can be expressed more clearly [37].
$U$ is a finite nonempty set called the domain (object space), and $A$ is a finite nonempty set of attributes. The attributes in $A$ can be further divided into two disjoint nonempty subsets, namely the conditional attribute set $C \neq \emptyset$ and the decision attribute set. The classification accuracy then measures the percentage of knowledge in the domain that can be correctly classified from the existing knowledge for a given $\beta$.
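The role of the confidence level can be illustrated with a small sketch. It assumes the usual VPRS definition, in which an equivalence class of the conditional attributes belongs to the positive region when at least a share β of its objects carry the same decision, and the classification quality is the size of the positive region divided by the size of the domain. The function name and the decision table are hypothetical.

```python
from collections import defaultdict

def beta_classification_quality(rows, beta=0.8):
    """rows: list of (condition_tuple, decision) pairs; returns a value in [0, 1]."""
    classes = defaultdict(list)
    for cond, dec in rows:
        classes[cond].append(dec)           # group objects by condition values
    positive = 0
    for decisions in classes.values():
        # Share of the most frequent decision within this equivalence class.
        top = max(decisions.count(d) for d in set(decisions))
        if top / len(decisions) >= beta:
            positive += len(decisions)      # class is classified with confidence beta
    return positive / len(rows)

# Hypothetical table: two condition attributes, decisions R / B / G.
table = [((1, 1), "R"), ((1, 1), "R"), ((1, 1), "R"), ((1, 1), "R"),
         ((1, 1), "G"), ((2, 3), "B"), ((2, 3), "G"), ((3, 3), "G")]
quality_relaxed = beta_classification_quality(table, beta=0.8)
quality_strict = beta_classification_quality(table, beta=1.0)
```

With β = 1 the measure reduces to the classical Pawlak case, where a single dissenting object removes its whole class from the positive region; a lower β tolerates such noise.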

Intention Analysis
Nine factors affecting the customer's car rental behavior are regarded as conditional attributes by the Delphi method (Table 1), and the level values of the conditional attributes are defined by three grades: 1 = satisfaction, 2 = something in between, and 3 = dissatisfaction. The level values of the decision attribute are likewise defined by three grades: R = rent, B = book but not necessarily rent, and G = give up. Interviewees were randomly selected near each site, and the investigators helped them fill out the questionnaire. A total of 50 questionnaires were distributed and 40 were returned, of which 30 were valid. By sorting and analyzing the survey data, customer selection decision information can be obtained, as shown in Table 2.
Table 2. Customer selection decision information.

(Columns: Condition Attribute, Decision Attribute.)
The following equivalence classes are available: ..., $X_8 = \{u_{11}\}$, $X_9 = \{u_{13}\}$. Obviously, the judgment matrix is a positive reciprocal matrix that fully satisfies the consistency condition, and its maximum eigenvalue is

Customer Choice Probability
The multitype spill model is based on traditional revenue management, which assumes that customers are short-sighted: when the product price is lower than the willingness to pay, the customer buys immediately. Obviously, this assumption ignores the initiative of the decision-making subject. In reality, individuals have various cognitive biases. Unless we start from the customer's choice behavior, we cannot identify the customer's car rental patterns or the changing trend of demand, which hampers the car rental company's later pricing and inventory decisions.
Therefore, the Logit model is used to describe the customer's choice behavior. Utility maximization theory is assumed, and the size of the utility is used to measure the probability that a customer selects a certain type of vehicle. The various attributes of each car type constitute a comprehensive utility value. It is generally believed that the greater the utility a customer derives from a particular type of vehicle, the greater the probability that this vehicle will be selected. Customers face a variety of car types and make a choice after evaluating the utility based on the various attributes. The customer's choice behavior can be described by a utility function

$$U_i = V_i + \varepsilon_i,$$

where $V_i$ and $\varepsilon_i$ are the average utility and the random utility error, respectively, corresponding to the customer's selection of car type $i$.
Assuming that the customer is rational, each customer will choose the product that maximizes utility. The probability that the customer chooses car type $i$ is

$$P_i = P(U_i > U_h,\ \forall h \neq i).$$

Assuming that the $\varepsilon_i$ are mutually independent and obey the Gumbel distribution, the difference of two independent Gumbel variables follows the logistic distribution, which yields the general form of the multinomial logit model:

$$P_i = \frac{e^{V_i}}{\sum_{h} e^{V_h}}.$$

Since the RP survey is based on actual conditions, the survey respondents are customers who return cars to the rental sites. In the design of the questionnaire, the actual upgrade or downgrade has been fully reflected. Therefore, for the case in which the customer's rental car type is upgraded or downgraded, $P_{i,h}$ is the preference probability of buyup or buydown to car type $h$ when the customer's choice of car type $i$ is rejected, and $P_{i,0}$ is the leaving probability when car type $i$ is rejected.
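The multinomial logit probabilities can be computed directly from the average utilities. The sketch below is an illustration only: the function name `mnl_probabilities`, the alternative labels, and the utility values are hypothetical, and the "leave" alternative is assumed to have utility 0.

```python
import math

def mnl_probabilities(utilities):
    """Multinomial logit choice probabilities from a dict of average utilities."""
    m = max(utilities.values())             # subtract the maximum for numerical stability
    expv = {k: math.exp(v - m) for k, v in utilities.items()}
    total = sum(expv.values())
    return {k: e / total for k, e in expv.items()}

# Hypothetical alternatives for a customer whose request for car type 2 is rejected:
# buy down to type 1, buy up to type 3, or leave without renting.
probs = mnl_probabilities({"type 1": 1.0, "type 3": 2.0, "leave": 0.0})
```

The probabilities sum to one, and alternatives with higher utility receive higher probability, which matches the interpretation of the preference and leaving probabilities in the text.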
The improved multitype spill model (IMSM) first calculates the complete spillage of all constrained data and then calculates the demand transferred from other constrained car types according to the customer preference probability. The spillage is what remains after this transferred demand is excluded:

$$SU(i,j,t) = CP(i,j,t) - CP(1,j,t)P_{1,i} - CP(2,j,t)P_{2,i} - \cdots - CP(I,j,t)P_{I,i}$$

Then, the unconstrained estimation is the sum of the cumulative order quantity and this spillage.

Selection of Forecast Method
For demand forecasting, there are three methods: quantitative analysis, qualitative analysis, and decision analysis. Quantitative analysis relies on a large amount of historical data and can be divided into time series forecasting, which uses time to organize data, and the causal analysis method, which uses relationships to organize data. The time series method is the most commonly used, in which the Holt-Winters model and moving average method are representative. Time series forecasting can predict future demand, but it cannot explain the reason. Causal analysis commonly uses regression analysis and simulation methods. The causal analysis method uses the causal relationship between data to look for changes and is generally used for macro prediction.
Qualitative analysis relies mainly on expert knowledge and experience and does not involve quantitative analysis. The Delphi method and expert judgment fall into this category. Although such methods are simple to operate, they are highly subjective, and their results are often unsatisfactory. The decision analysis method combines quantitative and qualitative methods. At present, the market survey and randomness methods are the more commonly used, but the research is not yet mature. Of the three approaches, quantitative analysis is the most commonly used.
The Holt-Winters model is a time series forecasting model that avoids the deficiencies of the moving average method. It uses triple exponential smoothing equations to give different data different weights, and the predicted value is a weighted sum of the previous data sequence. By comparing the mean absolute percentage error, Weatherford et al. found the basic neural network to be more useful than traditional forecasting methods (moving averages, exponential smoothing, linear regression, etc.) [38].
The backpropagation (BP) neural network is a widely used nonlinear forecast method that can simulate the neural structure of the human brain and solve more complicated problems. The BP neural network can use its own nonlinear characteristics to simulate the development trend of the data, without requiring an assumption function. When the prediction accuracy is reached, the future demand can be predicted according to the learning situation. However, the BP neural network relies on the initial conditions and is prone to fall into the local optimal solution. Therefore, the single prediction method has inevitable defects, and the error accuracy may not reach the conditions for actual use. This paper intends to use the hybrid forecasting to combine weights of multiple forecast methods and improve the forecasting accuracy.

Holt-Winters Model
The Holt-Winters forecasting model is observed to outperform other techniques for the time series, having changing seasonality, mean, and growth rate [39]. It is an adaptive model that automatically recognizes changes in data patterns. For example, if the deviation is caused by internal interference, it can be considered that the new observation has the same influence as the original data, and thus gives the same weight to the data of different periods. If the deviation is caused by external interference, then new observations and the original data have different influences on the prediction results, and the new observations have a higher impact on the prediction events. In order to show that the value of the data in different periods has a different influence on the forecasting results, different weights can be given to different periods.
The demand for car rental is a nonstationary time series with seasonal and cyclical trends. The Holt-Winters model can perform very accurate forecasting of this regular time series data, especially for trends and seasonal changes. It can decompose linear time series, seasonal variations, and random variation time series and properly filter the impact of random fluctuations. The Holt-Winters model is an improvement and development of the moving average method. It does not need to store much historical data but also considers the importance of each period of data and uses all historical data.
First, the smoothing equations are obtained by iteration. Then, the predicted value at the future time point $k$ is calculated from the smoothed components. The three smoothing coefficients $\alpha$, $\beta$, and $\gamma$ are determined by minimizing the error between the forecasted and actual values. To obtain more accurate and objective parameters, the traditional approach is to minimize the residual sum of squares. The smoothing coefficients all lie in the interval (0, 1) and are increased in steps of 0.1. For each combination, the squared prediction residuals are calculated and summed, and the combination with the smallest sum of squared residuals is selected.
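The coefficient search can be sketched as follows. The sketch assumes the additive form of the Holt-Winters model with a simple initialisation from the first two seasons; the paper's exact equations and initial values may differ, and the function names, the season length `m`, and the sample series are hypothetical.

```python
from itertools import product

def holt_winters_additive(y, m, alpha, beta, gamma, horizon=1):
    """Additive Holt-Winters; returns (forecasts, sum of squared one-step errors).

    y: observations (at least two full seasons), m: season length.
    """
    level = sum(y[:m]) / m
    trend = (sum(y[m:2 * m]) - sum(y[:m])) / (m * m)
    season = [y[i] - level for i in range(m)]
    sse = 0.0
    for t in range(m, len(y)):
        s_prev = season[t % m]
        fitted = level + trend + s_prev          # one-step-ahead forecast
        sse += (y[t] - fitted) ** 2
        new_level = alpha * (y[t] - s_prev) + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        season[t % m] = gamma * (y[t] - new_level) + (1 - gamma) * s_prev
        level = new_level
    n = len(y)
    forecasts = [level + k * trend + season[(n + k - 1) % m]
                 for k in range(1, horizon + 1)]
    return forecasts, sse

def fit_by_grid(y, m):
    """Try every coefficient combination in 0.1, 0.2, ..., 0.9 and keep the smallest SSE."""
    grid = [round(0.1 * i, 1) for i in range(1, 10)]
    return min(product(grid, repeat=3),
               key=lambda p: holt_winters_additive(y, m, *p)[1])

# Hypothetical demand series with a season of length 4 and an upward trend.
demand = [12, 9, 10, 9, 14, 11, 12, 11, 16, 13, 14, 13, 18, 15, 16, 15]
alpha, beta, gamma = fit_by_grid(demand, 4)
next_cycle, sse = holt_winters_additive(demand, 4, alpha, beta, gamma, horizon=4)
```

The grid has 9 × 9 × 9 = 729 combinations, so an exhaustive search is cheap for series of this size; a finer step or a numerical optimizer could be substituted when more precision is needed.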

BP Neural Network
The neural network generally includes an input layer, a hidden layer, and an output layer. The input layer is the first layer, and no neurons are connected in front of it. The hidden layer and the output layer are both connected to preceding neurons, and each connection has its own weight. The output depends on the input, weights, thresholds, and excitation function. Because of the large number of neurons, a large amount of information is stored, giving the neural network powerful data processing capabilities. The neural network has the advantages of strong parallel processing ability, strong nonlinear processing ability, strong self-adaptation and learning ability, and strong associative memory and fault tolerance.
The BP neural network has a simple structure, strong plasticity, clear learning steps, and a clear mathematical meaning. It has been proved that a BP neural network with three layers can simulate any complex nonlinear mapping. The BP neural network is a feedforward network trained with the error backpropagation algorithm. There are only feedforward connections between neurons, with no feedback, intralayer, or cross-layer connections. Nonlinear sigmoid-type functions are generally used as excitation functions. Since the excitation function is differentiable everywhere, the region divided by a BP network is no longer a linear partition but an area bounded by a nonlinear hypersurface, which is relatively smooth. Compared with linear partitioning, this classification is more accurate and has greater fault tolerance. The BP neural network learns strictly by the gradient descent method, so the analytical formula for weight correction is also very clear.

Network Structure
The upper and lower layers of the BP neural network are fully connected, and there are no connections among neurons within the same layer. Studies have shown that when the output layer and input layer use a linear activation function and the hidden layer uses the sigmoid activation function, a BP neural network with one hidden layer can approximate any continuous function. Therefore, when constructing a BP neural network model, only one hidden layer is generally used, as shown in Figure 3.

Learning Process
The learning process of the BP neural network consists of two parts: forward propagation and backpropagation. In forward propagation, the sample is delivered to the input layer, processed by the hidden layer, and passed to the output layer. If the value obtained by the output layer does not meet the expectation, the process enters backpropagation, which passes the error back toward the input layer and continuously corrects the weights between layers along the way. After repeated propagation, the learning process stops when the error is small enough to be acceptable (Figure 4). The specific learning steps are as follows:
Step 1. Program initialization. Select sigmoid as the activation function; then determine the minimum error, learning rate, and momentum coefficient.
Step 2. Calculate the output. Input the initial weights and calculate the output values of the hidden layer and output layer processing units.
Step 3. Calculate the error value. When the error is less than the given minimum error, go to Step 5; otherwise, go to Step 4.
Step 4. Backpropagate the error to adjust the weights between layers; then return to Step 2.
Step 5. Acquire the optimal output value and the weights of each layer and end the algorithm.
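The five steps can be sketched with a small network that has one sigmoid hidden layer and a linear output unit, trained by gradient descent with momentum. This is a minimal illustration, not the network used in the case study: the function name `train_bp`, the hidden layer size, the learning rate, the momentum coefficient, the stopping error, and the training samples are all assumptions.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_bp(samples, hidden=4, lr=0.1, momentum=0.5,
             min_error=1e-4, max_epochs=5000, seed=0):
    """samples: list of (inputs_tuple, target). Returns (predict, error_history)."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # Step 1: initialise weights; the last entry of each row acts as the threshold.
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(hidden)]
    w_o = [rng.uniform(-0.5, 0.5) for _ in range(hidden + 1)]
    dw_h = [[0.0] * (n_in + 1) for _ in range(hidden)]
    dw_o = [0.0] * (hidden + 1)
    history = []
    for _ in range(max_epochs):
        total = 0.0
        for x, target in samples:
            # Step 2: forward pass through the hidden and output layers.
            xb = list(x) + [1.0]
            h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_h]
            hb = h + [1.0]
            out = sum(w * v for w, v in zip(w_o, hb))
            # Step 3: error of this sample.
            err = target - out
            total += 0.5 * err * err
            # Step 4: backpropagate, using the output weights before they change.
            delta_h = [err * w_o[j] * h[j] * (1.0 - h[j]) for j in range(hidden)]
            for j in range(hidden + 1):
                dw_o[j] = lr * err * hb[j] + momentum * dw_o[j]
                w_o[j] += dw_o[j]
            for j in range(hidden):
                for i in range(n_in + 1):
                    dw_h[j][i] = lr * delta_h[j] * xb[i] + momentum * dw_h[j][i]
                    w_h[j][i] += dw_h[j][i]
        history.append(total / len(samples))
        if history[-1] < min_error:   # Step 5: stop once the error is acceptable.
            break

    def predict(x):
        xb = list(x) + [1.0]
        hb = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_h] + [1.0]
        return sum(w * v for w, v in zip(w_o, hb))

    return predict, history

# Hypothetical training set: targets scaled into [0.2, 0.8].
samples = [((i / 10,), 0.2 + 0.6 * i / 10) for i in range(11)]
predict, history = train_bp(samples)
```

Inputs and targets are scaled to a small range before training, which keeps the sigmoid units away from saturation; real demand data would need the same kind of normalisation.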

Sample Selection
After the model is established, it needs to be trained on samples. In general, the training samples need to meet four requirements. (1) There is a close functional relationship between the input and target variables, and the target variable changes noticeably when the input variables change. (2) The input variables are independent of each other, so that no component can be accurately calculated from the other components of the input. (3) The data to be predicted have a certain commonality with the sample data. (4) The sample size should be large enough that the samples taken together reflect the mapping relationship between the input variables and the target variables.
The BP neural network applies the data-driven idea, that is, using a nonlinear characteristic to approximate a time series, and then using the clear logical relationship and historical data to express future values. Suppose there is a time series { } p q p q p q r X X X

Weight Determination
Hybrid forecasting assigns different weights to several single prediction methods to form a comprehensive forecasting model. It can make accurate and reasonable use of the valuable information in each single forecasting model, adapt better to future changes, and reduce forecasting risk.
The key to hybrid forecasting is to determine the weights of the various forecasting methods properly, since reasonable weights greatly improve the prediction accuracy. Common weight determination methods include the arithmetic average method, variance reciprocal method, mean square reciprocal method, simple weighting method, and linear programming method. Among them, the arithmetic average method treats each individual model equally and is suitable when nothing is known about the individual models. It is usually not optimal, and its sum of squared errors is large. The variance reciprocal method, mean square reciprocal method, and simple weighting method all have higher prediction accuracy than the arithmetic average method, but they require some prior understanding of the prediction requirements. These three methods have one thing in common: the variance is used to calculate the weight. The variance measures how much the response variable fluctuates above and below the mean. When the prediction result is poor but the error fluctuation is small, the weight will still be large, which is unreasonable. Therefore, this paper adopts the linear programming method, which determines the optimal weighting coefficients by taking the minimum sum of absolute errors of the combined prediction as the objective function.
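For two forecasting methods, the weight search can be sketched without a solver. With weights w and 1 − w, the sum of absolute combined errors is a convex piecewise-linear function of w, so its minimum lies at an endpoint or at a point where one combined error is zero. The sketch checks these candidates directly instead of calling a linear programming routine; the function name `lad_weight` and the numbers are hypothetical, and the constraint that the weights are nonnegative and sum to one is an assumption.

```python
def lad_weight(actual, f1, f2):
    """Weight w on forecast f1 (1 - w on f2) minimising the sum of absolute errors."""
    e1 = [a - y for a, y in zip(f1, actual)]    # errors of method 1
    e2 = [b - y for b, y in zip(f2, actual)]    # errors of method 2

    def cost(w):
        return sum(abs(w * a + (1 - w) * b) for a, b in zip(e1, e2))

    candidates = [0.0, 1.0]
    for a, b in zip(e1, e2):
        if a != b:
            w = b / (b - a)                     # combined error is zero here
            if 0.0 <= w <= 1.0:
                candidates.append(w)
    best = min(candidates, key=cost)
    return best, cost(best)

# Hypothetical forecasts: method 1 overshoots by 4, method 2 undershoots by 4.
actual = [100, 110, 120]
method_1 = [104, 114, 124]
method_2 = [96, 106, 116]
w, total_abs_error = lad_weight(actual, method_1, method_2)
```

With more than two methods the same objective is usually written as a linear program with auxiliary variables for the absolute errors and passed to a general solver.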

Data Collection
This paper collected the operating data of the Huilongwan site of China Auto Rental Inc. in Chongqing for 4 consecutive weeks (28 days). Each record includes the pick-up date, vehicle brand, rental price, and pick-up location. Following the method described earlier, the car models are divided into three grades by rental price, and each week is treated as one "presale lead time interval". The survey data are sorted to obtain three "presale lead time intervals" for each price grade. The observable order quantity q and the presale opening status of each grade are shown in Table 4.
Table 4. Car rental order data.

Applying IMSM
According to the rental price, car types are classified as type 1 (under $40 per day), type 2 ($40-70 per day), and type 3 (above $70 per day). Due to cost and time constraints, random sampling was used to conduct a questionnaire survey at some rental sites of China Auto Rental Inc. in Chongqing (see Figure 5). The sample size of this RP survey was 400; 363 questionnaires were collected, of which 320 were valid and 220 included buyup or buydown. The survey data are discretized with "very satisfied = 10", "satisfied = 8", "general = 5", "not very satisfied = 3", and "not satisfied = 0". The influencing factor scores, utilities, and probabilities corresponding to the various choice behaviors can then be obtained (Table 5). [...], set in EM and PD (see Table 6). Numbers in brackets are the errors of the estimated values relative to the real demands.
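The price grading and the discretization of questionnaire answers can be sketched as follows. The score mapping is the one quoted in the text; the handling of prices of exactly $40 or $70 is an assumption, since the text does not say which grade they fall into. The helper names and the example inputs are hypothetical.

```python
# Score mapping quoted from the survey description.
SATISFACTION_SCORE = {
    "very satisfied": 10,
    "satisfied": 8,
    "general": 5,
    "not very satisfied": 3,
    "not satisfied": 0,
}


def price_grade(daily_price):
    """Return car type 1, 2, or 3 from the daily rental price in dollars.

    Assumption: a price of exactly 40 or 70 is placed in type 2.
    """
    if daily_price < 40:
        return 1
    if daily_price <= 70:
        return 2
    return 3


def mean_score(answers):
    """Average discretized score of a list of questionnaire answers."""
    return sum(SATISFACTION_SCORE[a] for a in answers) / len(answers)


# Hypothetical inputs, not survey data.
grades = [price_grade(p) for p in (35, 55, 90)]
avg = mean_score(["very satisfied", "general", "not satisfied"])
```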
According to the results listed in the above table, without demand repair the average gap between constrained demand and actual demand is large, at 10.1%. After the demand estimation methods are applied, the average error of the resulting demand data is smaller, and IMSM has the smallest average error, 3.06%.
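An average error of this kind can be computed as sketched below. The sketch assumes the error is the mean of the absolute relative errors across observations; the paper does not spell out the exact formula here, so this definition, the function name, and the numbers are assumptions for illustration only.

```python
def mean_relative_error(estimated, real):
    """Mean of |estimated - real| / real over all observations (assumed definition)."""
    return sum(abs(e - r) / r for e, r in zip(estimated, real)) / len(real)


# Hypothetical demand figures, not values from Table 6.
error = mean_relative_error([95.0, 210.0], [100.0, 200.0])
```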

Demand Estimation
Combined with the survey data, the algorithms described above are used to estimate the unconstrained demand of multiple car types. For EM and PD, let [...] Table 7 for different vehicle types and cycles.
When making predictions, the first step is to calibrate the model. Because of the large amount of data, only the fitting process for car type 1 is described in detail. Given the characteristics of the Holt-Winters model, the initial value of cubic exponential smoothing is taken from the first value $IB(1, j, 1)$. Therefore, these three sets of data are eliminated, and the remaining 28 sets of data are fitted. By minimizing the sum of squared residuals, the coefficients $\alpha$, $\beta$, and $\gamma$ are determined to be 0.3, 0.9, and 0.1, respectively. The fitted curve and the observed curve nearly coincide, which indicates that the exponential smoothing fits well. A forecasting model is generally considered usable when its error is within 10% (see Figure 6). The demands of different car types can then be obtained; the specific prediction results are shown in Table 8.
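For readers unfamiliar with the smoothing recursion, the following is a minimal sketch of additive Holt-Winters (triple exponential) smoothing. It assumes the additive form, that the three coefficients play the roles of level, trend, and seasonal smoothing in that order, and a simple first-season initialization; the paper's exact variant and initialization may differ. The function name, the season length, and the series are hypothetical.

```python
def holt_winters_additive(series, period, alpha, beta, gamma, horizon):
    """Forecast `horizon` steps ahead with additive Holt-Winters smoothing.

    alpha, beta, gamma smooth the level, trend, and seasonal terms
    (assumed roles). Requires at least two full seasons of data.
    """
    if len(series) < 2 * period:
        raise ValueError("need at least two full seasons")

    # Simple initialization from the first two seasons.
    level = sum(series[:period]) / period
    trend = (sum(series[period:2 * period]) - sum(series[:period])) / period ** 2
    season = [x - level for x in series[:period]]

    for t in range(period, len(series)):
        x = series[t]
        s = season[t % period]
        prev_level = level
        level = alpha * (x - s) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % period] = gamma * (x - level) + (1 - gamma) * s

    n = len(series)
    return [level + (h + 1) * trend + season[(n + h) % period]
            for h in range(horizon)]


# Hypothetical, perfectly seasonal series with the coefficients quoted in the text.
demo = [10.0, 20.0, 30.0] * 4
forecast = holt_winters_additive(demo, period=3, alpha=0.3, beta=0.9,
                                 gamma=0.1, horizon=3)
```

In practice the coefficients would be chosen by scanning candidate values and keeping the combination with the smallest sum of squared residuals, as the text describes.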

The hidden layer uses the sigmoid activation function. The third layer is the output layer; it has a single neuron whose output is the demand, and it uses a linear activation function. After the BP neural network model is established, it needs to be trained on samples. In general, the training sample should meet four requirements: (1) there is a close functional relationship between the input variables and the target variable, and the latter changes significantly when the former change; (2) the input variables are independent of each other, and no component can be accurately calculated from the other components of the input variables; (3) the data to be predicted share common characteristics with the sample data; (4) the sample is large enough that the samples taken together reflect the mapping between the input variables and the target variables.
To train and test the neural network model more reasonably, the unconstrained historical data repaired by EM are divided into 42 groups by rolling selection of the data; seven groups are randomly chosen as prediction samples and 35 groups as training samples. The training samples are used to train the neural network model, while the prediction samples are used only to test the feasibility and reliability of the forecasting model. The specific grouping is shown in Table 9. The calculated values of the forecasting model are checked separately against the prediction samples and the training samples. The neural network forecasting model fits the training samples well, with an average error of 0.066 (see Figure 7). The model is therefore reasonable and can be applied to later predictions. The demands of different car types can be obtained, as shown in Table 10.
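The structure described here (a sigmoid hidden layer feeding one linear output neuron, trained by backpropagation) can be sketched in plain Python as follows. This is an illustrative toy, not the network used in the case study: the number of hidden neurons, learning rate, epoch count, initialization, and data are all assumptions, and inputs are assumed to be scaled to roughly [0, 1].

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(net, x):
    """Return (hidden activations, linear output) for one input vector."""
    w1, b1, w2, b2 = net
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return hidden, sum(w * h for w, h in zip(w2, hidden)) + b2


def train_bp(samples, targets, n_hidden=4, lr=0.1, epochs=2000, seed=0):
    """Train by stochastic gradient descent on 0.5 * (output - target)^2."""
    rng = random.Random(seed)
    n_in = len(samples[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, targets):
            hidden, out = forward((w1, b1, w2, b2), x)
            err = out - y  # gradient at the linear output neuron
            for j, h in enumerate(hidden):
                # Error propagated back through the sigmoid of hidden unit j.
                delta = err * w2[j] * h * (1.0 - h)
                w2[j] -= lr * err * h
                b1[j] -= lr * delta
                for i, xi in enumerate(x):
                    w1[j][i] -= lr * delta * xi
            b2 -= lr * err
    return w1, b1, w2, b2


def mse(net, samples, targets):
    return sum((forward(net, x)[1] - y) ** 2
               for x, y in zip(samples, targets)) / len(samples)


# Hypothetical scaled data, not the grouped samples of Table 9.
xs = [[0.0], [0.25], [0.5], [0.75], [1.0]]
ys = [0.1, 0.3, 0.5, 0.7, 0.9]
net = train_bp(xs, ys)
```

Held-out prediction samples would be passed through `forward` only, never through `train_bp`, which mirrors the split between training and prediction samples described above.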

The combination weight coefficients are selected by the linear programming method. When the weight is 0.7 for the Holt-Winters model and 0.3 for the BP neural network, the sum of the squared errors is the smallest. Using the hybrid forecasting model, the demands of different car types can be obtained, as shown in Table 11. Hybrid forecasting balances the two models, which makes the predictions smoother and more accurate.
Table 11. Hybrid forecasting results based on different estimation methods.
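One simple way to check which weight pair gives the smallest sum of squared errors is to scan candidate weights, as sketched below. This is a grid scan for illustration, not the linear programming procedure itself. The function name is hypothetical, and the data are made up so that the minimum happens to fall at a Holt-Winters weight of 0.7; they are not the values behind Table 11.

```python
def best_weight_by_sse(actual, hw, bp, steps=10):
    """Scan weights w = 0, 1/steps, ..., 1 on the Holt-Winters forecast
    (1 - w on the BP forecast) and return the one with the smallest SSE."""
    best_w, best_sse = None, float("inf")
    for k in range(steps + 1):
        w = k / steps
        sse = sum((a - (w * x + (1 - w) * y)) ** 2
                  for a, x, y in zip(actual, hw, bp))
        if sse < best_sse:
            best_w, best_sse = w, sse
    return best_w, best_sse


# Hypothetical data: Holt-Winters overshoots by 3, BP undershoots by 7.
actual = [100.0, 120.0, 140.0]
hw = [103.0, 123.0, 143.0]
bp = [93.0, 113.0, 133.0]
w_hw, sse = best_weight_by_sse(actual, hw, bp)
hybrid = [w_hw * x + (1 - w_hw) * y for x, y in zip(hw, bp)]
```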

Comparing the real demand with the prediction results (Tables 12 and 13) shows the following.
For individual car types, the predictions of the Holt-Winters model are more volatile, while those of the BP neural network model are more stable. In terms of adaptability to the unconstrained estimation model, when the unconstrained data estimated by IMSM are used, the prediction results of the three methods are relatively stable, and the hybrid forecasting model gives the best results. The relative errors for the three car types are 1.45%, 8.66%, and 5.00%, respectively (see the boxed data in Table 13).
For the forecasted total, the Holt-Winters model predicts better than the BP neural network model. In terms of adaptability to the unconstrained estimation model, the Holt-Winters model predicts best when MSM is used to repair the data, while all three methods obtain better results when IMSM is used, with the hybrid model having the smallest error.
Overall, when MSM is used to unconstrain the data, the relative error of the Holt-Winters model for total demand is 3.06%. However, the corresponding per-type results show a relative error of 32.5% for the third car type. When IMSM is used, the relative error of the hybrid model for total demand is 4.76%, and the relative errors for the three car types are 1.45%, 8.66%, and 5.00%. Therefore, when IMSM is used to estimate the historical constrained demand, hybrid forecasting is optimal for predicting future demand.

Conclusions
To make full use of the information in historical car rental data, a two-stage joint approach was proposed for predicting multitype car rental demand. The method improves the spill model by considering customer choice behavior, so that the unconstrained demand can be estimated effectively, and it uses a hybrid forecasting model to predict future short-term demand.
For historical car rental demand data, the repaired data reflect customer demand more accurately. To demonstrate the effectiveness of the method, it was compared with EM, PD, and MSM in the unconstrained demand estimation stage, and the Holt-Winters, BP neural network, and hybrid models were compared in the future demand prediction stage. For different car types, different models produce different prediction results in the two stages. The overall results show that the proposed method is superior to the other combinations in the stability and effectiveness of prediction.
Specifically, in the case study, IMSM has obvious advantages over the other three methods for repairing the demand data of each car type, with an average error of 3.06% in the example test. For the prediction of future demand of each car type, the performances of the Holt-Winters model, the BP neural network model, and the hybrid model differ considerably depending on which unconstrained estimation model supplies the data. The hybrid forecasting model performs best, and the relative error of its predictions is within the acceptable range both for individual car types and for overall demand. The case study therefore shows that the proposed method outperforms the other methods, and it can be regarded as a new way to help car rental companies predict customer demand. Many factors affect customers' car rental behavior, and with the spread of mobile apps, the way cars are reserved is changing. In the future, more influencing factors, such as electric vehicles, urban traffic restriction policies, and road traffic conditions, should be taken into account. In addition, the different distribution characteristics of car rental demand and the combined optimization of forecasting methods will also be interesting research topics.
Funding: This research was funded by the Humanities and Social Sciences Foundation of the Ministry of Education of China (Grant Nos. 17YJCZH220, 17YJA630079).

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement: The data presented in this study are available on request from the corresponding author.

Conflicts of Interest: The authors declare no conflict of interest.