Identifying and Predicting the Expenditure Level Characteristics of Car-Sharing Users Based on the Empirical Data

Car-sharing plays a positive role in reducing vehicle ownership and greenhouse gas emissions. However, the contradiction between high investment and low revenue hinders the development of the car-sharing industry. Fully understanding car-sharing users can effectively ensure the healthy development of car-sharing companies and promote the development of the entire industry. To this end, this study attempts to develop a user management method that is based on user layering and prediction methods. By using order data from the Lan Zhou car-sharing company in China, this paper develops a clustering method for layering car-sharing users. A multi-layer perceptron model is also developed to categorize these users into different expenditure level categories while considering periodic features. Results show that new users can be divided into three categories according to their expenditures with car-sharing companies within 84 days. After 5 weeks of observation, the 84-day category of new users can be predicted with an accuracy of over 85%. These results provide scientific decision support for the user management and profitability of car-sharing companies.


Introduction
By reducing vehicle ownership, car-sharing can contribute to the conservation of resources and alleviation of traffic congestion [1]. In this paper, car-sharing refers to a one-way car-sharing and online time-sharing system mainly supported by mobile technology. A car-sharing company provides resources, such as cars and parking stations, to their users via the Internet. Users can rent a car from a station according to their own travel demands at any time. At the end of their trip, these users need to return their cars to the station and make their payment [2].
Users and companies act as the main participants in car-sharing to ensure the implementation and development of car-sharing travel modes. Car-sharing users can support the normal operation of companies through the consumption of their services. At the same time, companies can improve their service quality to subsequently increase the expenditures of its users. Therefore, mutually reinforcing relationships are formed between user expenditures and company development. However, car-sharing companies face many challenges as the number of their users increases. For example, these companies face the "profit anxiety" problem, which results from their high investment and low revenue during the process of their rapid development. This problem is driven by their inaccurate identification of usage frequency, travel time and travel distance, and their findings have contributed to the present understanding of car-sharing users.
Improving revenue is among the key factors that determine the sustainability of car-sharing services. Sustainable profitability can also help car-sharing companies overcome their profit anxiety. Previous studies have mainly focused on increasing the revenue of companies and reducing their operational costs. For instance, Shaheen et al. [21] analyzed the global car-sharing market, specifically its parking costs, vehicle models, energy costs, and technologies, and proposed a car-sharing business model for different operating environments with the aim of improving the operational efficiency of car-sharing companies. Alvina et al. [22] studied the multi-parking vehicle allocation problem in the car-sharing operation process and built a car-sharing operation decision model on the basis of three-stage optimization theory. Susan et al. [23] proposed a commercial operation method for the actual traffic status and user travel characteristics in Beijing and offered some suggestions for car-sharing operators in the city. Ji et al. [24] identified the operational service capabilities, level of platform design, and convenience of car-sharing facilities as the main factors that affect the development of car-sharing companies and then proposed business strategies in consideration of these factors. Sun et al. [25] constructed a user reservation allocation model with operator profit maximization as its optimization goal. Kong et al. [26] developed a car-sharing dynamic pricing scheme with the goal of maximizing the daily revenue of car-sharing operating systems.
Car-sharing is increasingly becoming a mature industry. However, studies on car-sharing remain in their infancy. Some deficiencies can also be observed in the breadth and depth of extant research. The main deficiencies in the existing literature and the improvements that this research aims to contribute to are summarized as follows. First, previous studies have mainly focused on the willingness of users to use car-sharing services but very few have considered the revenue of car-sharing companies in their investigation. To fill this gap, this study investigates car-sharing services from the perspective of car-sharing companies. Specifically, the findings of this research can provide some references for car-sharing companies to manage their user-centric services. Second, previous studies have mainly focused on the operational efficiency of car-sharing companies but only few have explored the management of car-sharing users with an aim to facilitate the development of car-sharing companies. In the field of economics, those users with a higher expenditure are perceived to be more valuable to companies. Customer value has direct effects on the scale of development of companies because these customers act as the sources of revenue for these companies. In other words, valuable users bring a considerable amount of revenue to companies. Therefore, the differences among users of different revenue contribution levels must be investigated.

Data Collection
The data used in this study are collected from the Lan Zhou car-sharing company. The collected datasets include order data from 1 January 2018 to 31 November 2018 and new user registration data from 1 January 2018 to 31 July 2018. New users are defined as those who have registered with a car-sharing company from 1 January 2018 to 31 July 2018. The order and registration datasets cover different periods because in order to obtain the order data of users within an 84-day time span, the time span of the order dataset should be at least 84 days longer than that of the registration dataset. In this case, the collected datasets can provide the order information of each new user from the time they start using the car to the 84-day time span. This information will be applied in the modeling.
The order dataset also contains information on the car-sharing usage behavior of users during their car-sharing experience. Table 1 presents an example of the order data used in this study. The user registration data are used to identify the registration characteristics of new users. Table 2 presents an example of the registration data used in this research. This paper attempts to study the car-sharing usage characteristics of new users by classification and prediction. For the modeling, the order data recorded for each new user within 84 days after starting to use a car are analyzed. The reasons for this selection are as follows:

(1) Reasons for 84-day data selection
The 84-day car usage data of new users are studied because the 84-day behavior of these users can help categorize them into different types. Short-term car rental users, who use the vehicles for vacation or experiential purposes, have a travel time span of much less than 84 days, whereas long-term car rental users have a car rental time span of more than 84 days. Therefore, using an 84-day duration can help approximate the car rental preferences of users. Depending on their operational needs, companies can set a longer duration to understand longer-term user characteristics, but doing so requires data that cover a longer time span. In this study, 84 days can meet the basic needs of the prediction model.

(2) Reasons for new user data selection
The characteristics of new users will be investigated in this study. Car-sharing companies are interested in the revenue contribution types of their new users, which they can determine by using the shortest possible observation period. Accurately predicting the revenue contribution type of users can also help car-sharing companies implement accurate marketing strategies for their users. Therefore, the data for the first 84 days of a new user are used to develop the prediction model.

Data Preprocessing
The employed dataset contains all order data for all users registered before (old users) and after (new users) 1 January 2018. Screening operations are performed as follows to meet the experimental requirements:

(1) Screen new users
The order data from 1 January 2018 to 31 November 2018 are linked with the registration data from 1 January 2018 to 31 July 2018. The users registered between 1 January 2018 and 31 July 2018 are extracted from the original data and placed in an empty dataset labelled dataset 1. This dataset contains the order data (121,472 data points in total) of these users, including the 84 days after they started using car-sharing services.
(2) Calculate the date of new users on the 84th day
The first car usage date after registration is calculated for every user in dataset 1. Afterward, 84 days are added to this date to obtain the 84-day window of each user, counted from the date on which he/she started to use car-sharing services.
(3) Screen order information of new users within their first 84 days
The data of new users registered between 1 January 2018 and 31 July 2018 and their car usage records from their first day of using car-sharing services up to the 84th day are stored in dataset 2. This dataset contains 69,332 data points covering 5202 users and 600 cars and is treated as the final dataset.
Despite differences in the car-sharing usage behavior and order quantity of car-sharing users, their car usage is fixed to 84 days as denoted by K in Figure 1.
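The three screening steps can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: the record layouts (tuples of user id and order date, a dictionary of registration dates) and the function name `first_window_orders` are assumptions, as is the convention that the first rental day counts as day 1 of the 84-day window.

```python
from datetime import date, timedelta

K_DAYS = 84  # length of the window taken from the text


def first_window_orders(orders, registrations, reg_start, reg_end, k_days=K_DAYS):
    """Keep the orders that new users placed within k_days of their first rental.

    orders: list of (user_id, order_date) tuples (hypothetical layout).
    registrations: dict mapping user_id -> registration date (hypothetical layout).
    """
    # Step 1: new users are those registered inside [reg_start, reg_end].
    new_users = {u for u, d in registrations.items() if reg_start <= d <= reg_end}
    dataset1 = [(u, d) for u, d in orders if u in new_users]

    # Step 2: first rental date per user, then the end of that user's window.
    first_use = {}
    for u, d in dataset1:
        if u not in first_use or d < first_use[u]:
            first_use[u] = d
    window_end = {u: d + timedelta(days=k_days) for u, d in first_use.items()}

    # Step 3: keep days 1..k_days, counting the first rental day as day 1
    # (an assumed convention; the text does not state the boundary rule).
    return [(u, d) for u, d in dataset1 if d < window_end[u]]
```

Because the window is anchored at each user's own first rental, users who start on different calendar dates are all observed for the same duration.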

Variable Definition
The specific variables defined in this study are shown in Table 3. The layering variables, namely, the total amount of expenditure and revenue contribution of users, are defined in Section 3.2.1. The total amount of expenditure is used as an indicator to classify car-sharing users, and the results are used as the dependent variable of the prediction model. Section 3.2.2 defines the car-sharing usage behavior variables, which can reflect the car-sharing usage characteristics of users. Some studies have used some of these car-sharing usage behavior variables in their analysis, such as average duration, average mileage, time span, car type ratio, and frequency. However, this study introduces new variables, including the heterogeneity of the rental and return stations and the revenue contribution of users. Given that these variables may vary across different users, they are inputted as independent variables into the user prediction model.

(1) Total amount of expenditure
The total amount of expenditure indicates the total amount of money that a user has spent with the car-sharing company as shown in Equation (1). The total amount of expenditure of a user is a direct evaluation indicator of his/her contribution to the revenue of a company:
$$x_i = \sum_{day=1}^{84} M_{i,day} \tag{1}$$
where $x_i$ is the total spending amount of user $i$ in 84 days, and $M_{i,day}$ is the amount spent by this user on car-sharing on day $day$.
(2) Revenue contribution
Revenue contribution denotes the ratio of the total amount of expenditure of each user to the revenue of a car-sharing company as shown in Equation (2):
$$x_i^{P} = \frac{x_i}{\sum_{k=1}^{n} x_k} \tag{2}$$
where $n$ is the number of users, and $x_i^{P}$ is the share that user $i$ contributes to the revenue of a car-sharing company within 84 days.
Revenue refers to the fee paid by a user to a car-sharing company for renting a car, expenses refer to the fuel, vehicle depreciation, and company operating costs shouldered by a car-sharing company in their process of renting out vehicles, and profit is computed as revenues minus expenses. In general, a user of high-level expenditure also has a high revenue contribution to the car-sharing company. Therefore, the total amount of expenditure of a user is a direct evaluation indicator of his/her contribution to the revenue of a car-sharing company. In other words, the total amount of expenditure of a user is a direct source of revenue for a company. High-level revenue contribution users are treated similarly as those users with a high total amount of expenditure. Given that companies can greatly benefit from having many high-level expenditure users, attracting such users has practical significance for company development and growth.
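As a small worked illustration of the two layering variables, the sketch below sums daily spending into a user total and then divides each total by the summed expenditure of all users. The function names and the sample amounts are hypothetical and serve only to show the arithmetic.

```python
def total_expenditure(daily_spend):
    """Sum a user's daily spending over the 84-day window (total expenditure)."""
    return sum(daily_spend)


def revenue_contributions(totals):
    """Return each user's share of the summed expenditure of all users.

    totals: dict mapping user_id -> total expenditure (hypothetical layout).
    """
    company_revenue = sum(totals.values())
    if company_revenue == 0:
        # No spending at all: every share is defined as zero to avoid dividing by zero.
        return {user: 0.0 for user in totals}
    return {user: x / company_revenue for user, x in totals.items()}


# Made-up amounts for two users.
totals = {
    "a": total_expenditure([10.0, 0.0, 30.0]),
    "b": total_expenditure([60.0]),
}
shares = revenue_contributions(totals)
```

With these made-up amounts, user "a" spends 40 in total and accounts for 40% of the revenue, which shows why ranking users by total expenditure and by revenue contribution gives the same order.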

Car-sharing Usage Characteristics
(1) Rental time
The rental time ratio is considered a behavior characteristic of car-sharing users. According to the overall car usage of all users in each time period, a day is divided into seven segments, and the continuous time variable is transformed into a discrete variable. The time labels and the corresponding real times are shown in Table 4. To contrast the behavior of different users in the same day, the rental time ratio is computed as shown in Equation (3):
$$x_i^{time} = \frac{\sum_{day=1}^{84} d_{i,day}^{time}}{\sum_{day=1}^{84} d_{i,day}} \tag{3}$$
where $x_i^{time}$ is the ratio of the number of times that user $i$ uses a car in time period $time$ to the number of times that user $i$ uses a car in all time periods, $d_{i,day}^{time}$ is the number of times that user $i$ uses a car in time period $time$ on day $day$, $d_{i,day}$ is the total number of times that user $i$ uses a car on day $day$, and $time$ is the time period, whose value ranges from 0 to 6.
(2) Car space
The heterogeneity of rental and return stations denotes the proportion of rentals in which a user picks up and returns a car at different stations in 84 days. This variable is computed by using Equation (4):
$$x_i^{station} = \frac{\sum_{day=1}^{84} d_{i,day}^{station}}{\sum_{day=1}^{84} d_{i,day}} \tag{4}$$
where $x_i^{station}$ is the heterogeneity of rental and return stations, $d_{i,day}^{station}$ is the number of times that user $i$ picks up and returns a car at different stations on day $day$, and $d_{i,day}$ is the number of times that user $i$ picks up a car on day $day$.
(3) Car type
A total of five car types are available in the collected dataset. The characteristics of each type are shown in Table 5. The similarities and differences in the car selection preferences of users can be investigated by analyzing these types as shown in Equation (5):
$$x_i^{car} = \frac{\sum_{day=1}^{84} d_{i,day}^{car}}{\sum_{day=1}^{84} d_{i,day}} \tag{5}$$
where $x_i^{car}$ is the ratio of the number of times that user $i$ takes car type $car$ to the number of times that user $i$ takes all car types, and $d_{i,day}^{car}$ is the number of times that user $i$ takes car type $car$ on day $day$.

(1) Average duration and mileage:
$$x_i^{avesc} = \frac{1}{n_i}\sum_{c=1}^{n_i} sc_{i,c} \tag{6}$$
$$x_i^{avelc} = \frac{1}{n_i}\sum_{c=1}^{n_i} lc_{i,c} \tag{7}$$
where $x_i^{avesc}$ is the average car duration of user $i$, $sc_{i,c}$ is the $c$th car duration of user $i$, $n_i$ is the total number of times that user $i$ uses a car in 84 days, $x_i^{avelc}$ is the average car mileage of user $i$, $lc_{i,c}$ is the $c$th car mileage of user $i$, and $c$ indexes the $c$th rental of user $i$.
(2) Maximum duration and mileage:
$$x_i^{maxsc} = \max_{1 \le c \le n_i} sc_{i,c} \tag{8}$$
$$x_i^{maxlc} = \max_{1 \le c \le n_i} lc_{i,c} \tag{9}$$
where $x_i^{maxsc}$, $sc_{i,c}$, $x_i^{maxlc}$, and $lc_{i,c}$ are the maximum car duration, $c$th car duration, maximum car mileage, and $c$th car mileage of user $i$, respectively.
(3) Minimum duration and mileage:
$$x_i^{minsc} = \min_{1 \le c \le n_i} sc_{i,c} \tag{10}$$
$$x_i^{minlc} = \min_{1 \le c \le n_i} lc_{i,c} \tag{11}$$
where $x_i^{minsc}$ is the minimum car duration of user $i$, and $x_i^{minlc}$ is his/her minimum car mileage.

(1) Time span
Time span refers to the time interval between the last and first car rentals in the first 84 days as shown in Equation (12):
$$x_i^{timecha} = t_i^{last} - t_i^{first} \tag{12}$$
where $x_i^{timecha}$ is the time span of user $i$, that is, the time difference between his/her last and first car rentals in the first 84 days, $t_i^{first}$ refers to the first time that user $i$ uses a car, and $t_i^{last}$ refers to the last time that user $i$ uses a car.
(2) Frequency
Frequency refers to the total number of rentals per user within 84 days and is calculated as:
$$F_i = \sum_{day=1}^{84} d_{i,day} \tag{13}$$
where $F_i$ is the total number of times that user $i$ uses a car in 84 days, and $d_{i,day}$ is the number of times that user $i$ uses a car on day $day$.
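Several of the usage characteristics defined in this section can be computed in one pass over a user's rentals. The sketch below is illustrative only: the record keys (`start`, `hours`, `km`, `pickup`, `dropoff`) are hypothetical, and measuring the time span in whole days is an assumption, since the unit is not stated in the text.

```python
from datetime import datetime


def usage_features(rentals):
    """Summarise one user's rentals inside the 84-day window.

    rentals: list of dicts with hypothetical keys 'start' (datetime),
    'hours' (float), 'km' (float), 'pickup' and 'dropoff' (station ids).
    """
    n = len(rentals)
    if n == 0:
        raise ValueError("user has no rentals in the window")
    hours = [r["hours"] for r in rentals]
    km = [r["km"] for r in rentals]
    starts = [r["start"] for r in rentals]
    # Rentals picked up and returned at different stations.
    different = sum(1 for r in rentals if r["pickup"] != r["dropoff"])
    return {
        "frequency": n,
        "avg_hours": sum(hours) / n,
        "max_hours": max(hours),
        "min_hours": min(hours),
        "avg_km": sum(km) / n,
        "max_km": max(km),
        "min_km": min(km),
        "station_heterogeneity": different / n,
        # Whole days between the first and last rental (assumed unit).
        "time_span_days": (max(starts) - min(starts)).days,
    }


# Two made-up rentals for one user.
example = usage_features([
    {"start": datetime(2018, 1, 1, 8), "hours": 2.0, "km": 10.0, "pickup": "A", "dropoff": "A"},
    {"start": datetime(2018, 1, 11, 9), "hours": 4.0, "km": 30.0, "pickup": "A", "dropoff": "B"},
])
```

The rental time ratio and car type ratio follow the same counting pattern as `station_heterogeneity`, with one ratio per time segment or car type.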

Layering and Prediction Modeling
Two models are built in this study, namely, a layering model of car-sharing users and a prediction model of users. When analyzing users, they are initially layered before their types are predicted according to their characteristics. Given the complexity of user distribution, classifying these users on the basis of experience alone is sometimes impossible. Therefore, the close relationship between users must be quantitatively determined by using a similarity index or mathematical methods following the principle of the clustering method. This study then establishes a clustering model to layer the users according to their expenditures. Afterward, a classification prediction model is developed given the differences in the car usage characteristics of various types of users. This model can predict the revenue contribution type of new users, thereby helping car-sharing companies understand their users in advance and then formulate a user management strategy.

User Layering Modeling Based on Two-Step Clustering
A car-sharing company can use the expenditure amount of users to maintain its normal operating expenses. Therefore, the expenditure amount of users is of great significance to car-sharing companies. One objective of this paper is to layer users according to their expenditure amount. Layering users can help car-sharing companies learn about their user structure and provide a foundation for their effective operational decision making. Unsupervised classification refers to the classification of data without relying on any classification criteria. Given that the user clustering number is unknown, a two-step clustering method is applied to automatically determine the clustering numbers; this approach can also be used for big data clustering due to its unique data storage solution [27].
The CF tree storage method stores only the sufficient statistics related to the distance calculation in the clustering index rather than the original data itself. In this study, each tree node (group), such as tree node $j$ (group $j$), stores a sufficient statistic of
$$CF_j = \left\{N_j,\; S_j^{A},\; (S_j^{A})^{2},\; N_j^{B}\right\}$$
where $N_j$ is the number of type $j$ users, $S_j^{A}$ is the total amount of expenditures of type $j$ users, $(S_j^{A})^{2}$ is the sum of squares of the total amount of expenditures of type $j$ users, and $N_j^{B}$ is the sample size for each category of subtype of type $j$ users and is equal to zero in this study. The sufficient statistic of a merged class is
$$CF_{\langle j,s\rangle} = \left\{N_j + N_s,\; S_j^{A} + S_s^{A},\; (S_j^{A})^{2} + (S_s^{A})^{2},\; N_j^{B} + N_s^{B}\right\}$$
where $j$ and $s$ refer to different types of users. The distance between groups can be easily calculated by using these statistics [28]. Afterward, the likelihood distance of users in each group is calculated by using the sufficient statistic. The CF tree is then established via recursive induction according to the distance between various groups. The tree node generated by the CF tree is then treated as the result of user layering.
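The practical benefit of storing sufficient statistics is that two groups can be merged by simple addition, without revisiting the raw expenditures. The following sketch illustrates this for a single continuous variable; the class name `ClusterFeature` and its fields are hypothetical, and the categorical counts are omitted because they are zero in this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterFeature:
    n: int      # number of users in the node
    s: float    # sum of the users' total expenditures
    ss: float   # sum of squared total expenditures

    @classmethod
    def from_values(cls, values):
        """Build the statistic for a group from raw expenditure values."""
        values = list(values)
        return cls(len(values), sum(values), sum(v * v for v in values))

    def merge(self, other):
        """Merging two groups only requires adding their stored sums."""
        return ClusterFeature(self.n + other.n, self.s + other.s, self.ss + other.ss)

    def mean(self):
        return self.s / self.n

    def variance(self):
        """Population variance recovered from the stored sums alone."""
        return max(self.ss / self.n - self.mean() ** 2, 0.0)


low = ClusterFeature.from_values([1.0, 3.0])
high = ClusterFeature.from_values([5.0])
merged = low.merge(high)
```

Here `merged` equals the statistic computed directly from the three values, and its mean and variance are available without keeping the original data, which is what allows the approach to scale to large order datasets.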

Multi-Layer Perceptron Model Considering Periodic Features
The multi-layer perceptron model aims to help car-sharing companies accurately identify the revenue contribution of their users. For these companies, the addition of new users can help them expand their size and increase their revenues. However, when the revenue contribution type of new users is unknown, car-sharing companies are unable to implement a user-centered management strategy.
Car-sharing companies can develop effective operational strategies and retain high-level revenue contribution users by predicting the expenditure level type of new users within the shortest possible observation period while ensuring a high prediction accuracy. To this end, the following prediction time variables are introduced as shown in Figure 2:
Assessment period: Car-sharing companies want to understand the revenue contribution type of their users. This research defines the assessment period as the period that begins when a user starts using the service and extends into the future. The expenditure amounts and behavior characteristics of users during the assessment period are unknown. Identifying the revenue contribution level of users during the assessment period in advance can help companies implement differentiated management strategies for their users and rationally allocate their resources. Therefore, the purpose of this research is to identify the type of users during the assessment period in advance by using the prediction model. The duration of the assessment period defined in this paper is denoted by K.
Observation period: The observation period refers to the period during which a car-sharing company observes the behavior of its users. In other words, the observation period is a short period that starts the first time a user uses a car. The observation period data for each user are fully known, including his/her car-sharing usage behavior and expenditure amount. The user type during the assessment period can be predicted by using the characteristics of the user during the observation period. A longer observation period can help companies understand the car-sharing usage behavior of their users and predict their type during the assessment period more accurately. However, a longer observation period also introduces a higher time cost, which prevents companies from understanding the preferences of their users as early as possible. Therefore, a shorter observation period is ideal provided that the prediction accuracy threshold is satisfied. The duration of the observation period defined in this research is denoted by t.
The multi-layer perceptron model can achieve two goals, namely, determine the duration of the observation period and predict the type of new users without a classification label. The shortest observation period is determined by using a specific accuracy threshold. The modeling process is shown in Figure 3.
The key variables in this model are defined as follows. Let $g_{jr}$ denote the individual $r$ belonging to category $j$, $G_j = \{g_{j,1}, g_{j,2}, \ldots, g_{j,R_j}\}$ represents the set of type $j$ users, $R_j$ represents the total number of type $j$ users, and $G = \{G_1, G_2, \ldots, G_J\}$ represents the set of individual users of all types. Let $x_{jir}(t)$ denote the $i$th characteristic variable (including the car usage behavior and static attribute variables) of user $r$ belonging to category $j$ during the observation period duration $t$. The car usage behavior characteristic variable in this research is related to the duration of the observation period and has nothing to do with the start time of the observation period. The observation period is examined with $\Delta t$ as the period; therefore, $x_{jir}(t)$ can be expressed as $x_{jir}(m\Delta t)$.
Let $X_{jir}(m\Delta t) = \{x_{j,1,r}(m\Delta t), x_{j,2,r}(m\Delta t), \ldots, x_{j,n,r}(m\Delta t)\}$ denote the set of characteristic variables of individual $r$ belonging to category $j$ during the observation period duration $t = m\Delta t$, where $m$ is the number of steps, and $n$ is the number of car-sharing usage behavior characteristics.
Assessment and observation periods are set for each individual $r$. Each individual is also given an observation period duration from $\Delta t, 2\Delta t, \ldots, M\Delta t$, as shown in Figure 4. As shown in Figure 5, despite having various starting times, the observation periods of all users have the same duration. The car usage characteristic variable $X_{jir}(m\Delta t)$ for each user is initially determined, and then the car-sharing usage behavior characteristics in different observation periods are separately examined. The user type during the assessment period of duration K is then classified and predicted, and the prediction accuracy is eventually determined. This process is described in detail as follows.
First, when the observation period is $t = \Delta t$, the corresponding observation period behavior variable $x_{jir}(\Delta t)$ is used to predict the user type during the assessment period of duration K, and the prediction accuracy $f(\Delta t)$ is obtained. Afterward, when the observation period is $t = 2\Delta t$, the corresponding observation period behavior variable $x_{jir}(2\Delta t)$ is used to predict the user type during the assessment period of duration K, and the prediction accuracy $f(2\Delta t)$ is obtained. By analogy, when the observation period is $t = M\Delta t$, the corresponding observation period behavior variable $x_{jir}(M\Delta t)$ is used to predict the user type during the assessment period of duration K, and the prediction accuracy $f(M\Delta t)$ is obtained. The shortest observation period is then found as
$$\min \{\, m\Delta t : f(m\Delta t) > f^{*} \,\}$$
where $f^{*}$ is the prediction accuracy threshold, and $f(m\Delta t)$ denotes the prediction accuracy for an observation period of duration $m\Delta t$. Given that both the prediction accuracy and the time cost increase with the length of the observation period, an acceptable prediction accuracy threshold is pre-set. As the observation period is lengthened step by step, the gap between the prediction accuracy and the threshold is monitored. Once the prediction accuracy exceeds the threshold, the accuracy requirement is satisfied by that observation period, and accepting it guarantees the shortest observation period.
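The search for the shortest observation period amounts to scanning the accuracies in order of increasing duration and stopping at the first one that exceeds the threshold. A minimal sketch follows; the function name is hypothetical, the default step of 7 days and threshold of 85% are the values stated later in the paper, and the accuracy list in the example is made up purely for illustration.

```python
def shortest_observation_period(accuracy_by_step, delta_t=7, threshold=0.85):
    """Return the smallest m * delta_t whose accuracy exceeds the threshold.

    accuracy_by_step: accuracies f(delta_t), f(2 * delta_t), ... in order.
    Returns None when no observation period reaches the threshold.
    """
    for m, accuracy in enumerate(accuracy_by_step, start=1):
        if accuracy > threshold:
            return m * delta_t
    return None


# Made-up accuracies for observation periods of 1 to 6 weeks.
illustrative = [0.70, 0.78, 0.82, 0.84, 0.86, 0.88]
days = shortest_observation_period(illustrative)
```

In practice each entry of the list would come from training and testing the prediction model on the features of the corresponding observation period, which is the expensive part of the procedure.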

Layering Modeling of Car-sharing Users
The two-step clustering of car-sharing users can be divided into the pre-clustering and clustering stages.
(1) Pre-clustering
Step 1: All users and their corresponding expenditure amounts are obtained according to the previous calculation. The sufficient statistic is defined in Section 3.3.1. Then, a CF tree is built, and all user data are entered at the root node. All users are then grouped into a single class, and their sufficient statistics are stored at the root node.
Step 2: The expenditure amount of each car-sharing user is inputted in turn. The distance between the new leaf and the existing nodes is calculated, and the log likelihood distance is used as the similarity judgment index to select the cluster as shown in Equations (14)-(16). The distance threshold is used to determine whether a new node is formed or the data are merged with an existing node. In this way, the CF tree is established via recursive induction.
$$\xi_v = -N_v\left(\sum_{k=1}^{K^{A}} \frac{1}{2}\log\left(\hat{\sigma}_k^{2} + \hat{\sigma}_{vk}^{2}\right) + \sum_{k=1}^{K^{B}} \hat{E}_{vk}\right) \tag{14}$$
$$\hat{E}_{vk} = -\sum_{l=1}^{L_k} \frac{N_{vkl}}{N_v}\log\frac{N_{vkl}}{N_v} \tag{15}$$
$$d(i,j) = \xi_i + \xi_j - \xi_{\langle i,j\rangle} \tag{16}$$
where $\xi_v$ is the likelihood distance of type $v$ users, $v = i, j, \langle i,j\rangle$, $\langle i,j\rangle$ is the joint class of classes $i$ and $j$, $K^{A}$ is the number of continuous variables (only the total amount of expenditure in this study), $K^{B}$ is the number of categorical variables (which is equal to zero in this study because no categorical variables are used in the clustering model), $L_k$ is the number of types under the $k$th classification, that is, the number of leaf nodes below the $k$th node, $N_k$ is the number of type $k$ users, $\hat{\sigma}_k^{2}$ is the estimated variance of the total expenditure amount of type $k$ users, $\hat{\sigma}_{vk}^{2}$ is the estimated variance of the total expenditure amount of type $v$ users in type $k$, $\hat{E}_{vk}$ is the information entropy of the total expenditure amount of type $v$ users in type $k$, $N_{vkl}$ is the number of samples of the $k$th categorical variable in the $l$th class in type $v$, $N_v$ is the number of samples in the $v$th category, and $d(i,j)$ is the distance between types $i$ and $j$.
Step 3: According to the data distribution characteristics, the initial distance threshold is set as 0.01. The distance between the newly read data and the existing leaf node is judged according to the initial threshold and number of leaf node samples. The results are used to determine whether the newly inputted data can be regarded as a new node or merged with an existing node. At this stage, the number of clusters ranges from 2 to 15, thereby suggesting that the maximum number of groups is 15 after this stage. The user data are continuously read until all users are assigned to one leaf node.
(2) Clustering stage
Assume that a set of groups is obtained at the pre-clustering stage. At the clustering stage, the log-likelihood distance between different groups is initially calculated. Afterward, the two closest groups are merged in turn, and one large group is eventually obtained after repeated merging. The Bayesian information criterion (BIC) is used as the basis in this process: the number of clusters at which the BIC reaches its minimum value is taken as the optimal number of clusters. The BIC calculation method is shown in Equations (17) and (18):

$$BIC(J) = -2\sum_{j=1}^{J}\xi_j + m_J\log(N) \tag{17}$$

$$m_J = J\left(2K^A + \sum_{k=1}^{K^B}\left(L_k-1\right)\right) \tag{18}$$

where $J$ is the number of categories, $N$ is the number of records in the dataset, $\xi_j$ is the log-likelihood term of class $j$ (the difference within class $j$), $m_J$ is the number of independent parameters of the model with $J$ classes, $K^A$ is the number of continuous variables, $K^B$ is the number of categorical variables, and $L_k$ is the number of categories of the $k$-th categorical variable.
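The merging and BIC-based selection described above can be sketched as follows. This is a simplified illustration rather than the software implementation used in the study: it assumes one continuous variable and no categorical variables (so that the parameter count reduces to 2J), uses synthetic expenditure values, and introduces the hypothetical helper names `xi`, `bic`, and `choose_cluster_count`.

```python
import math
from statistics import pvariance

def xi(values, global_var):
    """Log-likelihood term of one cluster (K^A = 1, K^B = 0)."""
    return -len(values) * 0.5 * math.log(global_var + pvariance(values))

def bic(clusters, global_var, n_total, k_a=1):
    """BIC(J) = -2 * sum(xi_j) + m_J * log(N), with m_J = J * 2 * K^A."""
    m_j = len(clusters) * 2 * k_a
    return -2 * sum(xi(c, global_var) for c in clusters) + m_j * math.log(n_total)

def choose_cluster_count(pre_clusters):
    """Merge the two closest groups in turn and record the BIC for each J."""
    clusters = [list(c) for c in pre_clusters]
    everyone = [x for c in clusters for x in c]
    g, n = pvariance(everyone), len(everyone)
    scores = {len(clusters): bic(clusters, g, n)}
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = (xi(clusters[i], g) + xi(clusters[j], g)
                     - xi(clusters[i] + clusters[j], g))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        scores[len(clusters)] = bic(clusters, g, n)
    best_j = min(scores, key=scores.get)  # J with the minimum BIC
    return best_j, scores

# Synthetic pre-clustering output: four sub-clusters of expenditure amounts.
pre_clusters = [
    [10.0 + i % 5 for i in range(50)],
    [12.0 + i % 5 for i in range(50)],
    [100.0 + i % 5 for i in range(100)],
    [200.0 + i % 5 for i in range(100)],
]
best_j, scores = choose_cluster_count(pre_clusters)
print(best_j, scores)
```

On this synthetic input, the two overlapping low-expenditure sub-clusters are merged first, and the BIC is evaluated for every cluster count from four down to one.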

Prediction Modeling of Car-sharing Users
(1) Enter the original data
In the case of a fixed assessment period, all training and test sets regarding the characteristic variables and user classification labels are entered. The label does not change during the calculation. The characteristic variables related to the observation period duration are also entered at this stage. The assessment period is set as 84 days, whereas the prediction accuracy threshold is set as 85%. The dependent variable for this study is the user category, as shown in Equation (19), where X is the independent variable, D is the user car usage behavior variable, M is the expenditure amount of users during the observation period, and S is the static attribute variable of users. The independent variables of this prediction model are shown in Table 6. Gender is the only categorical variable in this model. All continuous variables are normalized.

(2) Initialization of the observation period duration
The observation period is set as 7 days, that is, ∆t = 7. The car usage behavior variables, total expenditure amount, and user static attribute variables are then calculated for the first observation period after each user starts using the car. These variables are used as the independent variables in the user prediction.
(3) Multi-layer perceptron prediction
The total data after standardization are obtained at the second stage. Among these data, 70% belong to the training set, whereas the remaining 30% belong to the test set. A multi-layer perceptron classification prediction model is established at this stage, and the training set is used to build a prediction rule between the user category and the user car-sharing usage variable X jir (∆t). The test set user data are used to calculate the prediction accuracy.
The user characteristic variables with an observation period of 7 days are inputted into the model, and the dependent variable is the user revenue contribution category with an assessment period of 84 days. The construction of the prediction model is described in detail as follows.
A total of 25 independent variables (23 continuous variables and two dummy variables) for prediction are obtained after processing. Therefore, the input layer of the network has 25 nodes. The neurons between the layers are fully connected. For the output layer, the number of nodes is equal to the number of user categories J. Given that this model is a classification prediction problem, the SoftMax function is selected as the activation function for the output layer. The relative error is selected as the error calculation index [29].
To minimize the model prediction error, batch training is selected as the training type, and the conjugate gradient method is used as the optimization algorithm to determine the synaptic weights and to reduce the risk of converging to a local optimal solution. The initial Lambda and initial Sigma values of the conjugate gradient method, together with the interval offset, are also set at this stage. Training terminates after a maximum of 20 steps without a reduction in the error [30].
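As a minimal sketch of the network structure described above (25 input nodes, fully connected layers, and a softmax output over J = 3 user categories), a forward pass could look as follows. The hidden-layer size, the tanh hidden activation, and the random weights and inputs are illustrative assumptions that are not specified in this paper, and training by the conjugate gradient method is not shown.

```python
import math
import random

def softmax(z):
    """Convert output-layer scores into class probabilities that sum to 1."""
    top = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def dense(x, weights, biases):
    """Fully connected layer: one weighted sum per output neuron."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def init_layer(n_in, n_out, rng):
    """Random initial weights and zero biases (illustrative only)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(x, hidden, output):
    """Input -> hidden layer (tanh, assumed) -> softmax output layer."""
    h = [math.tanh(v) for v in dense(x, *hidden)]
    return softmax(dense(h, *output))

rng = random.Random(0)
hidden = init_layer(25, 8, rng)   # 25 inputs; 8 hidden units is an assumption
output = init_layer(8, 3, rng)    # J = 3 user categories
x = [rng.random() for _ in range(25)]  # one hypothetical normalized user record

probs = forward(x, hidden, output)
predicted_category = max(range(3), key=lambda j: probs[j])
print(probs, predicted_category)
```

The predicted user category is the output node with the largest softmax probability.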

(4) Prediction accuracy judgment and model output
The multi-layer perceptron prediction model yields the prediction accuracy f (∆t) when the duration of the observation period is t = ∆t and the assessment period is K. If f (∆t) > f * , where f * is the prediction accuracy threshold, then t = ∆t is output as the optimal observation period of the model. In other words, an observation period of t = ∆t is sufficient for classification prediction over an assessment period of duration K, with a prediction accuracy of f (∆t).

(5) Cyclic calculation
If the prediction accuracy obtained with m = 1 (that is, t = ∆t) does not reach the defined prediction accuracy threshold, then the observation period duration must be increased. In this study, m is updated as m = m + 1, and the duration of the observation period becomes t = m∆t (t = 2∆t after the first update). The procedures presented in the first to the fourth stages are then repeated, and the prediction accuracy for the new observation period duration is obtained again. The value of m is increased until the shortest observation period that satisfies the accuracy threshold is obtained.
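The cyclic calculation can be sketched as a simple loop over m. In the sketch below, `shortest_observation_period` and `lookup_accuracy` are hypothetical names; `accuracy_fn` stands in for stages (1) to (4), that is, training and testing the multi-layer perceptron on features from the first t days. Only the overall accuracies for 4 and 5 weeks (82.1% and 85.3%) are taken from this paper; all other weeks are placeholders.

```python
def shortest_observation_period(accuracy_fn, delta_t=7, assessment_days=84,
                                threshold=0.85):
    """Increase m until the accuracy for t = m * delta_t exceeds the threshold."""
    m = 1
    while m * delta_t <= assessment_days:
        t = m * delta_t
        accuracy = accuracy_fn(t)      # stands in for stages (1)-(4)
        if accuracy > threshold:
            return t, accuracy         # shortest period meeting the threshold
        m += 1                         # m = m + 1, so t becomes m * delta_t
    return None                        # threshold never reached within K days

# Overall accuracies reported for observation periods of 4 and 5 weeks.
reported = {28: 0.821, 35: 0.853}

def lookup_accuracy(t):
    # Placeholder: weeks without a reported value are treated as below
    # any threshold, purely so that the loop can be demonstrated.
    return reported.get(t, 0.0)

result_85 = shortest_observation_period(lookup_accuracy, threshold=0.85)
result_80 = shortest_observation_period(lookup_accuracy, threshold=0.80)
print(result_85, result_80)
```

With these inputs, the loop stops at 35 days for the 85% threshold and at 28 days for the 80% threshold.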

(6) Rolling forecasts
User management is a dynamic process. The management strategy for each user changes over time depending on the car usage behavior characteristics of users. Therefore, a car-sharing company should use a car usage behavior dataset for its rolling forecasts. In this study, the expenditure level type within 12 weeks can be determined by observing the behavior characteristics of users in the first 5 weeks. First, if the company wants to determine the type of users after 12 weeks, then the company can use the acquired 12-week car usage data to develop the model again. This rolling forecasting method can help the company adjust its operating strategy in time for the changes in the car usage characteristics of its users across different periods. Second, if the company wants to predict the expenditure level type of its users within n years, then the shortest observation period can be calculated by using the developed model. Observing the car usage characteristics of users within the shortest observation period can help predict their expenditure level types within n years. Companies can then dynamically adjust their user management policies.

User Layering Results
With the number of clusters allowed to range from 2 to 15, the optimal number of clusters is 3. The three clusters contain 2809, 1367, and 1026 users. The overall model goodness (average silhouette coefficient) is 0.53, thereby suggesting a good clustering quality.
On the basis of the optimal results obtained from the previous clustering experiments, the car-sharing users are classified into three groups according to their total expenditure amount, which represents the revenue contribution level of users to a car-sharing company. Table 7 shows the optimal clustering results. The percentage of people represents the proportion of each user group in the total number of users. The first, second, and third groups account for 19.7%, 26.3%, and 54.0% of all users, respectively. The percentage of users in each group and the share of total expenditure contributed by that group are not proportional. Each user group is therefore defined according to its contribution to the revenue of car-sharing companies. Despite having the smallest number of users, the first group contributes the most to the revenue of car-sharing companies (68.9%) and is defined as the high-revenue contribution group (HG). In other words, approximately 20% of new users contribute approximately 70% of the revenue generated by all new users. This conclusion has important implications for car-sharing companies in their development of user-centric operating strategies. Meanwhile, despite having the largest number of users, the third group contributes the least to the revenue of car-sharing companies (9.5%) and is accordingly labeled as the low-revenue contribution group (LG). The second group lies in between in terms of its revenue contribution and is accordingly labeled as the middle-revenue contribution group (MG).
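The user shares quoted above can be checked directly from the cluster sizes. In the short sketch below, the assignment of the cluster sizes (1026, 1367, and 2809 users) to HG, MG, and LG is inferred by matching them to the reported percentages, and the MG revenue share is derived as the remainder under the assumption that the three revenue shares sum to 100%.

```python
# Cluster sizes from the layering results; the group labels are inferred.
counts = {"HG": 1026, "MG": 1367, "LG": 2809}
total_users = sum(counts.values())

# Share of users in each group, in percent, rounded to one decimal place.
user_share = {g: round(100 * n / total_users, 1) for g, n in counts.items()}

# Reported revenue shares (percent); MG is derived as the remainder.
revenue_share = {"HG": 68.9, "LG": 9.5}
revenue_share["MG"] = round(100 - revenue_share["HG"] - revenue_share["LG"], 1)

# Revenue share per percentage point of users, as a simple concentration measure.
intensity = {g: round(revenue_share[g] / user_share[g], 2) for g in counts}

print(total_users, user_share, revenue_share, intensity)
```

The ratio of revenue share to user share makes the disproportion explicit: it is well above 1 for HG and well below 1 for LG.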

User Prediction Results
A multi-layer perceptron model that considers periodic features serves two purposes in its practical application in car-sharing companies. First, car-sharing companies want to obtain satisfactory predictions with the shortest observation period. The model computes and compares the prediction accuracy for different observation periods, and this comparison reveals that a prediction accuracy that satisfies the application requirements is obtained when the observation period is set as five weeks. Second, the revenue contribution type of new users during the assessment period can be obtained from their behavior characteristics during the observation period. Table 8 presents the prediction results for the first 6 weeks and the 12th week (at which the duration of the observation period is equal to the duration of the assessment period). The prediction accuracy over the 12-week observation period is shown in Figure 6.
The prediction accuracy gradually increases with the length of the observation period. Users with a low revenue contribution are predicted with a clearly higher accuracy than those with a high revenue contribution. The prediction accuracy of each group is close to 100% at 12 weeks (that is, when the observation period is equal to the assessment period), as shown in the figure. Therefore, the characteristic variables fit the model well, indicating that the characteristic variables in the predictive model are meaningful.
Figure 6 shows that if the prediction accuracy threshold is 80%, then the threshold is exceeded when the observation period is 4 weeks, with an overall prediction accuracy of 82.1%. In this case, the prediction accuracies of the model for low- and high-revenue contribution users reach 92.7% and 72.2%, respectively. Meanwhile, if the prediction accuracy threshold is 85%, then the threshold is exceeded when the observation period is 5 weeks, with an overall prediction accuracy of 85.3%. At this time, the prediction accuracies of the model for low- and high-revenue contribution users reach 95.3% and 76.1%, respectively.
A car-sharing classification prediction model is thus built on the basis of user characteristics. This model shows that the revenue contribution type of users over the three months after they start using the car can be predicted by observing their characteristics during the first five weeks, with a prediction accuracy of approximately 85%. However, given the limited amount of data, only the car usage characteristic variables, sex, and age of users are modeled. For car-sharing companies, using additional dimensions of data and longer-term data observations can extend the assessment period. Therefore, this model is scalable and has high practical value.

Conclusions
This paper aims to help car-sharing companies alleviate their profit anxiety and to further promote the development of the whole car-sharing industry. To this end, empirical data are used to capture the car-sharing usage characteristics of car-sharing users. By using operating data of the Lan Zhou car-sharing company, the car-sharing users are divided into several groups, which are used as the basis for examining the structure of these users and the efficient operation of car-sharing companies. By using empirical data, the new users can be divided into three groups via clustering analysis. Each group has unique characteristics in terms of their number of users and expenditure levels. Among the new users, 20% generate revenues that account for 70% of the total revenues contributed by all users. Therefore, in the case of limited resources, a car-sharing company can achieve the maximum benefit by analyzing the preferences of this 20% and by rationally configuring its resources. A differentiated management of users can also help retain and increase the satisfaction of high-revenue contribution users, reduce company costs, and increase company revenues.
The revenue contribution type of users over the 84 days after they start using a car can be predicted by observing their characteristics during the first 5 weeks. This approach has a prediction accuracy of more than 85%. Moreover, the future long-term expenditure level of users can be determined by relying on short-term observations of users. Car-sharing companies should focus on those users with a high revenue contribution, given that increasing the satisfaction of these users can help secure a source of revenue for these companies. Meanwhile, for those users with a medium revenue contribution, companies should study their demand characteristics and seek to convert them into users with a high revenue contribution. Companies that aim to expand their development scale should target those users with a low revenue contribution. Several insights and recommendations for improving the service quality and developing the scale of car-sharing companies are then formulated from the results of the targeted visit investigation.
In sum, a small percentage of high-revenue users generate the majority of the total revenues of a car-sharing company, whereas other types of users, despite their large number, generate only minimal revenues for the company. The proposed model has been shown to be feasible and accurate for predicting the expenditure level of car-sharing users. Car-sharing companies should predict the expenditure levels of their users in advance and manage them scientifically in consideration of their economic benefits. This research has reference significance for car-sharing companies that aim to build a user management system that can help them achieve sustainable development.
Although this study layers the users of car-sharing services, the differences among users from different groups have not been explored. To address this gap, a comparative study of each layer of users will be carried out in the future. Moreover, given the limited amount of data used in this study, a predictive period of 84 days (three months) is considered. In the future, a longer predictive period will be used by collecting additional data. A car-sharing location model will also be developed on the basis of the conclusions obtained from this paper and in consideration of the economic benefits of car-sharing companies.

Conflicts of Interest:
The authors declare no conflict of interest.