Modeling and Forecasting Passenger Car Ownership Based on Symbolic Regression

Numerous functions, especially the Gompertz function, have been predetermined to analyze the growth in vehicle ownership. This study uses data-driven symbolic regression to automatically find a generalized function, named the new equation by symbolic regression (NE-SR), for passenger car ownership in six representative countries: Japan, England, USA, Finland, Poland and Australia. The proposed function is then applied to forecast passenger car ownership in China up to the year 2060. The experimental results indicate that the NE-SR, as an extension of the Gompertz function, fits car ownership growth better than the classical Gompertz function. In the NE-SR function, three scenarios arise from the variation of parameter signs, and they are represented by the patterns of Japan, USA and Australia, respectively. The predictions based on the NE-SR also show that Chinese car ownership still has the potential to increase after 2060 in the patterns of Japan and Australia, but grows only until around 2057 in the pattern of USA. The results can be used to further predict the energy demand and carbon emissions of passenger cars, which can provide a basis for policymakers to propose transportation and environmental strategies.


Introduction
The worldwide increase in urban mobility since the 1960s has directly resulted in a growing number of motor vehicles, especially in many low-income populous countries, such as China and India [1]. The tremendous growth of vehicle ownership has caused a series of problems, e.g., increased oil consumption and air pollution emissions, severe traffic congestion and a lack of parking space. Moreover, car ownership is an important variable in car travel behavior research [2]. Therefore, it is important for academic researchers, environmentalists and policymakers to accurately forecast the development trend of vehicle ownership.
Vehicle ownership modeling has been widely researched. The models developed during 1995-2002 were reviewed and classified into nine categories in [3], which can be further divided into aggregate and disaggregate models according to data type. In the aggregate models, the ownership level of various vehicles, e.g., cars [4] and hybrid electric vehicles [5], can be analyzed on the basis of the product life cycle and diffusion model, which contains several sigmoid-shaped functions, e.g., the logistic, the Richards and the Gompertz functions [6,7]. Furthermore, the Gompertz function has been found to fit the historical vehicle ownership data best among these three functions [8], and a variety of studies have examined environmental and transportation policy by assuming that vehicle ownership grows according to the Gompertz function. For example, the future vehicle energy demand and greenhouse gas (GHG) emissions were estimated on the basis of the Gompertz function [9,10]. The effects of two license quota policies on car ownership levels were compared and the delays in the process of personal motorization in Shanghai and Beijing were examined in [11]. However, is there any other function that describes the relationship between economic factors and car ownership better than the Gompertz function? This is an interesting problem worth in-depth investigation. In fact, several improved Gompertz functions have already been proposed for better forecasting of vehicle ownership growth [12][13][14], where the corresponding parameters were estimated by statistics-based regression methods.
The traditional statistics-based regression methods need to assume a predetermined functional form according to experience and knowledge. The parameters of the proposed function are then estimated by the non-linear least squares method, the maximum likelihood method, etc. These methods generally have solid and widely accepted mathematical foundations and can provide more insight into the relationships among variables. However, experience or knowledge is sometimes limited in a given research field, so it is difficult for traditional regression to determine the most appropriate function model for a given data set. Moreover, the mathematical functions are built on strong assumptions which are sometimes not practically relevant to the real world.
Different from the traditional regression methods, symbolic regression (SR) can automatically establish a suitable model of a numeric data set without assuming a functional form. It is a data-driven method based on extended genetic programming (GP), proposed by Cramer in 1985 [15] and developed by Koza [16]. It has been successfully used to uncover hidden relationships in many fields. For instance, SR was demonstrated on four simulated and two real systems spanning mechanics, ecology and systems biology [17]. Motion-tracking data from various physical systems were searched, and Hamiltonians, Lagrangians, and other laws of geometric and momentum conservation were discovered [18]. Hubbert theory in oil production was modeled as a Gaussian distribution [19]. An accurate traffic speed prediction model was built to generate significant information for travellers [17]. To the best of our knowledge, the work in [17] is the first to use this method in the field of transportation.
The Chinese automotive market has grown greatly over the past two and a half decades and the number of vehicles is expected to increase dramatically further. The medium- and long-term development plan of the automobile industry issued by the Ministry of Industry and Information Technology of the People's Republic of China in 2017 forecasted that auto production will reach 30 million in 2020 and 35 million in 2025. Therefore, it is necessary to analyze and forecast passenger car ownership in China. Different from previous studies, which assume the relationship between vehicle ownership and economic factors to be an S-shaped function, this study automatically establishes the relation between passenger car ownership and the gross domestic product (GDP) per capita by the data-driven method, SR. The newfound relation includes the Gompertz function as a special case and fits better than the traditional Gompertz function in the six selected countries whose automotive industry has entered the saturated period.
The remainder of this paper is organized as follows. The SR method and the traditional Gompertz function are briefly introduced in Section 2. The data sources are then presented in Section 3. Section 4 examines our approach on the synthetic data, proposes a novel vehicle ownership function for six representative countries and then applies the proposed function to predict and analyze the car ownership in China. Conclusions are finally drawn in Section 5.

Symbolic Regression
The procedures of SR via GP mainly include four steps and the pseudo code is described in Algorithm 1.
The typical representation of an individual in SR is a parse tree, which generally has two types of nodes: leaf nodes and internal nodes. The leaf nodes consist of the terminal symbols, such as decision variables, constants or other problem parameters, and the internal nodes represent arithmetic functions, e.g., +, −, ×, ÷, e, ln, etc. Figure 1 shows an example of the tree structure of an SR individual for the equation 1 + (x × y). Once the Function Set (FS) for internal nodes, the Symbol Set (SS) for leaf nodes, the Maximum Depth of Tree (DT) and the Population size M are determined, an initial population of M trees is randomly generated with FS, SS and DT.

Algorithm 1
Generate initial population with FS, SS and M, and set gen = 0.
While gen < G
    Evaluate the fitness of each individual;
    While the next generation has fewer than M individuals
        Select individual i;
        If Random Probability < Pr,
            Reproduction: Copy individual i into the next generation;
        If Random Probability < Pc,
            Crossover: Randomly recombine individual i and i + 1 to create a new one in the next generation;
        If Random Probability < Pm,
            Mutation: Mutate individual i randomly to create a new individual in the next generation;
    End while
    Memorize the best solution achieved so far;
    gen = gen + 1;
End while
Return best solution.

Each tree-structured individual in SR represents one corresponding function. The function value f(X_i) of each individual can be obtained when one data sample (X_i, f_i) is given. Thus, the fitness of each individual is measured by the mean absolute deviation (MAD), defined as in Equation (1).

$$\mathrm{MAD} = \frac{1}{N}\sum_{i=1}^{N}\left|f_i - f(X_i)\right| \qquad (1)$$

where f_i is the actual value of data sample i, f(X_i) is the function value predicted by SR and N is the number of data samples.

(3) Step 3: Individual evolution by genetic operators.
There are three common genetic operators in GP: reproduction, crossover and mutation. In the reproduction operator, the candidate solution is directly duplicated into the next generation. The crossover operator recombines two parents to generate a new child individual, and the mutation operator randomly mutates a node of the chosen tree. Figure 2 illustrates the three genetic operators in detail [14].
Once the fitness of the population is calculated, the genetic operators are applied to evolve model structures and parameters. Furthermore, the Reproduction Probability (Pr), Crossover Probability (Pc), Mutation Probability (Pm) and Maximum Generation (G) need to be determined to control the evolution in this step.

(4) Step 4: Best model selection among outputs.
SR generally returns a large number of models, and the models with higher precision tend to be selected. However, more precise models are generally more complex and are therefore more likely to be over-fitted. To balance accuracy and complexity and to control the over-fitting problem efficiently, a Pareto front is built to further select the best solution generated by SR, an approach widely used in [19][20][21].
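The parse-tree representation and the MAD fitness can be illustrated with a minimal Python sketch. It is not the implementation used in this study: the nested-tuple encoding, the function names `evaluate` and `mad`, and the two data samples are assumptions introduced only for illustration.

```python
import math
import operator

# Internal nodes: a small illustrative function set (FS).
FUNCTIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "exp": math.exp,
}

def evaluate(tree, variables):
    """Recursively evaluate a parse tree given as nested tuples."""
    if isinstance(tree, tuple):            # internal node: (op, child, ...)
        op, *children = tree
        args = [evaluate(child, variables) for child in children]
        return FUNCTIONS[op](*args)
    if isinstance(tree, str):              # leaf node: decision variable
        return variables[tree]
    return float(tree)                     # leaf node: constant

def mad(tree, samples):
    """Mean absolute deviation between sample values and tree predictions."""
    errors = [abs(f_i - evaluate(tree, x_i)) for x_i, f_i in samples]
    return sum(errors) / len(errors)

# The tree of Figure 1: 1 + (x * y)
figure1_tree = ("+", 1, ("*", "x", "y"))

# Hypothetical samples (X_i, f_i) generated from f = 1 + x * y
samples = [({"x": 2.0, "y": 3.0}, 7.0), ({"x": 0.5, "y": 4.0}, 3.0)]
candidate = ("+", "x", "y")                # a competing individual: x + y

print(mad(figure1_tree, samples))
print(mad(candidate, samples))
```

The individual that matches the generating function has zero MAD, while the competing individual has a positive MAD and would be ranked lower in Step 2.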

Gompertz Function
The increase of car ownership can be divided into three periods: a slow-growth period at low income levels, a boom period as income rises rapidly, and a saturated period. Previous studies assumed the relationship between car ownership and economic factors (e.g., per-capita GDP or per-capita income) to follow some sort of S-shaped function, e.g., the logistic, the Richards and the Gompertz functions. The Gompertz function has been found to fit the historical car ownership data best [6,8,10]. Moreover, it has been shown in [6] that the Gompertz function can effectively describe the long-run relationship between vehicle ownership and per-capita GDP, and that per-capita GDP alone, ignoring other factors, can already substantially explain this relationship.
The basic form of the Gompertz function is expressed as Equation (2):

$$y = \alpha \exp\left(\beta \exp(\gamma x)\right) \qquad (2)$$

where α is the ultimate saturation level of car ownership, and β and γ are two parameters that determine the shape and curvature of the S-curve (both are negative for a curve that rises toward α). x is an economic indicator, denoting per-capita GDP, and y denotes the long-run equilibrium level of the car/100 population ratio. Equation (2) is then converted to Equation (3) by taking the logarithm of both sides:

$$\ln y = \ln \alpha + \beta \exp(\gamma x) \qquad (3)$$

If we let $\tilde{\alpha} = \ln \alpha$ and $f(x) = \ln y$ for short, the equation above is simplified to Equation (4):

$$f(x) = \tilde{\alpha} + \beta \exp(\gamma x) \qquad (4)$$

If we log-linearize Equation (3), it can be transformed into Equation (5):

$$\ln\left(\ln \alpha - \ln y\right) = \ln(-\beta) + \gamma x \qquad (5)$$

Then, ln(−β) and γ can be regressed by OLS for time series data. In particular, the ultimate saturation level α of car ownership in a given country must be known or assumed when the sample data do not include the inflection point.
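The log-linear regression can be sketched in plain Python as follows. The sketch assumes the sign convention y = α·exp(β·exp(γ·x)) with β < 0 and a known α; the function names and the parameter values are hypothetical and do not come from the data of this study.

```python
import math

def gompertz(x, alpha, beta, gamma):
    """Gompertz curve y = alpha * exp(beta * exp(gamma * x)), with beta < 0."""
    return alpha * math.exp(beta * math.exp(gamma * x))

def fit_gompertz_loglinear(xs, ys, alpha):
    """Estimate beta and gamma by OLS on ln(ln(alpha / y)) = ln(-beta) + gamma * x.

    Requires 0 < y < alpha for every sample.
    """
    zs = [math.log(math.log(alpha / y)) for y in ys]
    n = len(xs)
    x_mean = sum(xs) / n
    z_mean = sum(zs) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxz = sum((x - x_mean) * (z - z_mean) for x, z in zip(xs, zs))
    gamma = sxz / sxx                      # OLS slope
    intercept = z_mean - gamma * x_mean    # OLS intercept, equal to ln(-beta)
    beta = -math.exp(intercept)
    return beta, gamma

# Hypothetical parameters and noise-free samples, for illustration only.
alpha_true, beta_true, gamma_true = 50.0, -5.0, -0.1
xs = [float(x) for x in range(1, 41, 2)]
ys = [gompertz(x, alpha_true, beta_true, gamma_true) for x in xs]

beta_hat, gamma_hat = fit_gompertz_loglinear(xs, ys, alpha_true)
print(round(beta_hat, 6), round(gamma_hat, 6))
```

With noise-free data the regression recovers the generating parameters. With real data, the estimate depends on the assumed α, which is why the saturation level has to be fixed in advance when the inflection point is not observed.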

Selection of Case Countries
This study aims to use SR to search for a model function of passenger car ownership without assuming the model form, and the future vehicle ownership in China will be estimated using the model obtained by SR. Six representative countries are selected as the sample data of SR: the United States, Japan, the United Kingdom, Finland, Poland and Australia. These countries are chosen for three reasons. Firstly, the increase of passenger car ownership in these countries has displayed comparatively smooth growth and even a saturated pattern, which means that the growth curves of passenger car ownership in these countries contain an inflection point. Secondly, vehicle ownership data for these countries are available for a long enough period, covering all three of the slow-growth, boom and saturated periods. Thirdly, these six countries are from North America, Asia, Europe and Oceania, and can represent different patterns of passenger car ownership to some extent. Moreover, vehicle ownership in China remains in the rapid growth period and has not reached the inflection point of the growth curve. It is necessary for government policymakers and business managers to forecast Chinese passenger car ownership. Therefore, the model found by SR is applied to the case of China.

Statistics Data
Population data for all countries are collected from World Population Prospects 2017 issued by the United Nations. The data on GDP per capita for all countries are sourced from the World Bank, which has already converted the GDP of each country to current U.S. dollars. The GDP per capita in China predicted for the period 2018-2060 is based on the Organization for Economic Co-operation and Development (OECD) (2014) [21]. Various sources are used to obtain the historical passenger car ownership data for the seven countries.

Experimental Results
In this section, we attempt to answer the following questions:
- The validation aspect: can SR really find the Gompertz function underlying the data?
- The discovery aspect: is there any other mathematical function that describes the relationship between economic factors and car ownership better than the Gompertz function?
- The prediction aspect: what is the trend of vehicle ownership in China based on the newfound function?

Validation on Synthetic Data
To verify the validation aspect, a Gompertz function is defined in advance and a data set is generated according to this Gompertz model. The data set is then taken as the sample for SR. By learning from the synthetic data, it can be tested whether SR can efficiently find the Gompertz function.
As mentioned in Section 2, the Gompertz function can be rewritten as Equation (4). The Gompertz functions in Equations (2) and (4) are equivalent, so once α, β and γ are estimated, the Gompertz function is identified.
Twenty data points are randomly generated by Equation (7), which uses the same parameter set as [22]. Taking them as input samples, SR is conducted and returns a large number of models. The Pareto front of the returned models is shown as blue points in Figure 3, from which it can be seen that SR correctly discovers Equation (7).
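The generation of the synthetic samples can be sketched as follows. The parameter values below are hypothetical and are not those of Equation (7) or of [22]; the SR search itself is not shown. The sketch only draws twenty samples from the logarithmic form of the Gompertz function and checks numerically that it agrees with the original form.

```python
import math
import random

# Hypothetical parameters, not the values of Equation (7) or of [22].
ALPHA, BETA, GAMMA = 60.0, -6.0, -0.08

def gompertz(x):
    """Original form: y = alpha * exp(beta * exp(gamma * x))."""
    return ALPHA * math.exp(BETA * math.exp(GAMMA * x))

def log_gompertz(x):
    """Logarithmic form: f(x) = ln(alpha) + beta * exp(gamma * x)."""
    return math.log(ALPHA) + BETA * math.exp(GAMMA * x)

random.seed(0)                              # fixed seed for reproducibility
xs = sorted(random.uniform(0.0, 60.0) for _ in range(20))
samples = [(x, log_gompertz(x)) for x in xs]

# exp(f(x)) should reproduce y up to floating-point error.
max_gap = max(abs(math.exp(f) - gompertz(x)) for x, f in samples)
print(len(samples), max_gap)
```

Samples of this kind would then be passed to an SR tool, and the returned Pareto front inspected for a model with the structure of the generating function.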

Model Discovery for the Six Representative Countries
In the discovery aspect, SR is used to learn models from the data of the six representative countries. As a consequence, a series of functions is generated for each country. A generalized function is selected from the functions of the six countries by considering complexity and precision. The new equation found by SR, named the new equation by symbolic regression (NE-SR), is finally compared against the Gompertz model obtained from SPSS. The accuracy differences between the proposed function and the Gompertz model indicate whether the new model is preferable to the Gompertz model for describing the relation between car ownership and GDP per capita. Furthermore, the characteristics of the NE-SR are analyzed as well.

New Function Discovery
The car ownership models of the six selected countries are established by SR from the data on ownership, population and GDP per capita introduced in Section 3.1. Around 10 functions on the Pareto front are returned for each country and the results are shown in Table 1. Here, the fitness is calculated by the MAD in Equation (1) and the function complexity is the sum of the operator complexities, which are shown in Table 2.
It can be found from Table 1 that: (1) the Gompertz function can be obtained by SR for each country except Finland, which means the Gompertz function is not suitable for all the countries; (2) a generalized model (NE-SR), written as $y' = \alpha' \exp\left(\theta' x - \beta' \exp(-\gamma' x)\right)$, can be found for each country from the Pareto front. They are the 5th, 5th, 6th, 4th, 4th and 5th model for Japan, England, USA, Finland and Poland, respectively. This demonstrates that, according to SR, it is a reliable model for describing car ownership. However, it should be noted that the NE-SR is not found for Australia, which will be explained in the next section.
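The two selection criteria, function complexity as a sum of operator complexities and the Pareto front over complexity and error, can be sketched as below. The operator complexity values, the leaf cost of 1 and the candidate models are hypothetical; the values of Table 2 and Table 1 are not reproduced here.

```python
# Hypothetical operator complexities, not the values of Table 2.
OPERATOR_COMPLEXITY = {"+": 1, "-": 1, "*": 1, "/": 2, "exp": 4, "ln": 4}

def complexity(tree):
    """Sum operator complexities over a parse tree given as nested tuples."""
    if not isinstance(tree, tuple):
        return 1                            # assumed cost of a leaf node
    op, *children = tree
    return OPERATOR_COMPLEXITY[op] + sum(complexity(c) for c in children)

def pareto_front(models):
    """Keep models not dominated in (complexity, error); lower is better."""
    front = []
    for name, comp, err in models:
        dominated = any(
            c <= comp and e <= err and (c < comp or e < err)
            for _, c, e in models
        )
        if not dominated:
            front.append((name, comp, err))
    return sorted(front, key=lambda m: m[1])

# y = a * exp(t*x - b*exp(-g*x)) as a tree with symbolic leaves a, t, b, g
ne_sr_tree = ("*", "a", ("exp", ("-", ("*", "t", "x"),
                                 ("*", "b", ("exp", ("*", ("-", 0, "g"), "x"))))))
print(complexity(ne_sr_tree))

# Hypothetical (name, complexity, MAD) triples for one country
models = [("linear", 3, 4.0), ("gompertz", 14, 0.9),
          ("ne_sr", 21, 0.5), ("bloated", 30, 0.7)]
print([name for name, _, _ in pareto_front(models)])
```

A model stays on the front only if no other model is at least as simple and at least as accurate, so the overly complex candidate with a higher error is discarded.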

Validation of New Function
The NE-SR model is shown to be a good choice for representing the growth of passenger car ownership from two aspects: first, by comparing the errors of the NE-SR with those of the Gompertz function generated by SPSS; and second, by analyzing the characteristics of the NE-SR and comparing the trends of passenger car ownership in each country.
The Gompertz function has been widely used to estimate car ownership. Therefore, the new generalized function (NE-SR) is compared with the Gompertz function obtained from SPSS. The comparison results are shown in Table 3. It can be observed in Table 3 that the three errors of the NE-SR, namely MAD, mean absolute percentage error (MAPE) and root mean squared error (RMSE), are all lower than those of the Gompertz function for the five countries. Here, the MAD, MAPE and RMSE are expressed as Equations (1), (8) and (9):

$$\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{f_i - f(X_i)}{f_i}\right| \times 100\% \qquad (8)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(f_i - f(X_i)\right)^2} \qquad (9)$$

where f_i is the actual value of data sample i, f(X_i) is the predicted value and N is the number of data samples. Therefore, the NE-SR is of higher precision than the Gompertz function by SPSS. Because the NE-SR is not found for Australia, the three errors of the Gompertz function found by SR are compared against those of the Gompertz function by SPSS, which is also shown in Table 3. Therefore, the precision of SR is shown to be better than that of the Gompertz function by SPSS for all six representative countries. The NE-SR for long-run car ownership y′ as a function of per-capita GDP x can be written as:

$$y' = \alpha' \exp\left(\theta' x - \beta' \exp(-\gamma' x)\right)$$

where α′, β′ and γ′ are positive values.
Since $\lim_{x \to +\infty}\left(y' - \alpha' \exp(\theta' x)\right) = 0$, the curve $y'_{asy} = \alpha' \exp(\theta' x)$ is the asymptotic line of the NE-SR. Thus, the parameters α′ and θ′ determine the future trend of vehicle ownership. Since the value of parameter α′ is positive, we only discuss the effect of the sign of parameter θ′ on passenger car ownership in three scenarios. Figure 4 shows three examples of the NE-SR with different signs of θ′.
(1) Scenario 1. When θ′ = 0, the NE-SR is reduced to the Gompertz function. It means that the Gompertz function is a special form of the NE-SR, which is the reason that Table 1 obtained from SR only contains the Gompertz function for passenger car ownership in Australia. In this scenario, the parameter α′ denotes the saturation level of long-run passenger car ownership. The transportation system reaches a relatively stable state as the car ownership rate gradually approaches $y'_{asy} = \alpha'$.
(2) Scenario 2. When θ′ > 0, instead of reaching a saturation level, the car ownership ratio will continue to grow slowly with the increase of per-capita GDP and approach the function $y' = \alpha' \exp(\theta' x)$ in the third period of car ownership. It is reasonable that people will continue to buy cars as per-capita GDP grows, which further raises the car ownership ratio. This is supported by [14], which also stated that vehicle ownership grows slowly after the growth rate has reached its saturation level. Notably, the growth of the car ownership ratio is limited because per-capita GDP cannot grow forever. The ownership patterns of Japan, England, Finland and Poland are examples of this scenario.
(3) Scenario 3. When θ′ < 0, the car ownership ratio will decrease with the growth in per-capita GDP in the third period of vehicle ownership, which seems unusual to a certain extent. However, it has happened in some countries, e.g., USA. In USA, people choose other travel modes instead of cars with the development of public transit, car sharing and bike highways and the increase in car parking fees. Figure 5 illustrates the comparison of the car ownership ratio obtained from the NE-SR and the Gompertz function for USA.
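The three scenarios can be reproduced numerically with a short Python sketch. All parameter values are hypothetical and serve only to show the qualitative behavior. The function `peak_location` is an addition for illustration: it follows from setting the derivative of the exponent, θ′ + β′γ′·exp(−γ′x), to zero when θ′ < 0.

```python
import math

def ne_sr(x, alpha, theta, beta, gamma):
    """NE-SR curve y' = alpha * exp(theta * x - beta * exp(-gamma * x))."""
    return alpha * math.exp(theta * x - beta * math.exp(-gamma * x))

def gompertz(x, alpha, beta, gamma):
    """Gompertz curve in the same positive-parameter convention (theta = 0)."""
    return alpha * math.exp(-beta * math.exp(-gamma * x))

def peak_location(theta, beta, gamma):
    """x where theta + beta * gamma * exp(-gamma * x) = 0, valid for theta < 0."""
    return math.log(beta * gamma / -theta) / gamma

# Hypothetical parameters; x is per-capita GDP in thousand dollars.
alpha, beta, gamma = 50.0, 5.0, 0.15
xs = [float(x) for x in range(0, 101, 5)]

flat = [ne_sr(x, alpha, 0.0, beta, gamma) for x in xs]          # Scenario 1
rising = [ne_sr(x, alpha, 0.002, beta, gamma) for x in xs]      # Scenario 2
falling = [ne_sr(x, alpha, -0.002, beta, gamma) for x in xs]    # Scenario 3

x_peak = peak_location(-0.002, beta, gamma)
y_peak = ne_sr(x_peak, alpha, -0.002, beta, gamma)
print(round(x_peak, 2), round(y_peak, 3))
```

With θ′ = 0 the curve coincides with the Gompertz function and levels off below α′; with θ′ > 0 it keeps rising; with θ′ < 0 it reaches a maximum at `x_peak` and declines afterwards.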

Forecasting Chinese Vehicle Ownership
To conduct the prediction, the growth of Chinese passenger car ownership is assumed to follow the pattern of each of the six representative countries in turn. It means China's car ownership takes the same asymptotic line as the six countries. This assumption is reasonable because China's industrial development generally imitates the mature experience of developed countries. In fact, previous studies have used similar assumptions. For instance, the patterns of the OECD, Europe, USA and Japan are discussed for Chinese vehicle ownership in [15]; the Europe and Japan patterns were taken as the high and medium patterns, respectively, of the stock of private LDVs in China in [8]; and the ownership of highway vehicles, motorcycles and rural vehicles was assumed to follow different patterns of motor vehicle growth in Europe and Asia [23]. Six regression models are then obtained for China by SPSS and the most preferable one for each scenario mentioned in the previous section can be selected. Finally, the future passenger car ownership in China is forecasted based on each scenario.
Over the past two and a half decades, China has experienced great growth in the automotive market. The number of vehicles in the Chinese passenger car fleet is expected to increase dramatically and will match the current US car population by around 2020 [23]. Therefore, to further explain the newly proposed function, it is applied to analyze the increase of vehicle ownership in China. However, because passenger car ownership in China is still in the boom period, the data do not include the inflection point and are therefore insufficient to fit the full development of passenger car ownership. Therefore, the asymptotic line of the NE-SR is determined in advance according to the patterns of the six selected countries, which is similar to the treatment in studies of Chinese car ownership using the Gompertz function [6,9,10].
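One way to see why fixing the asymptotic line makes the fit tractable is the following derivation. It is offered as a sketch under the assumption that α′ and θ′ are taken from the pattern country and only β′ and γ′ are estimated; it is not necessarily the procedure run in SPSS.

```latex
\begin{align*}
y' &= \alpha' \exp\left(\theta' x - \beta' \exp(-\gamma' x)\right) \\
\ln y' - \ln \alpha' - \theta' x &= -\beta' \exp(-\gamma' x) \\
\ln\left(\ln \alpha' + \theta' x - \ln y'\right) &= \ln \beta' - \gamma' x
\end{align*}
```

The last line is linear in x, so ln β′ and γ′ could be regressed in the same way as for the log-linearized Gompertz function. The transformation requires y′ < α′·exp(θ′x), which holds for any β′ > 0.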
The regressive results of the NE-SR for Chinese vehicle ownership are shown in Table 4. It can be seen that the pattern of Japan under the NE-SR fits Chinese passenger car ownership data best among all the patterns of positive parameter θ in the NE-SR, followed by these of England, Poland and Finland. Therefore, the pattern of England, Poland and Finland is not discussed here and the pattern of development trends in Japan, USA and Australia are utilized to forecast passenger car ownership because parameter θ of the NE-SR functions is respectively positive, negative and zero, which represent the three scenarios of car ownership.  Figure 6 shows the ownership ratios of passenger car in China calculated in the three patterns of Japan, USA and Australia. It can be found in Figure 6 that in the pattern of Japan, the ratio of passenger car ownership increases along with GDP per capita till GDP per capita reaches 50,000 dollars, which means Chinese passenger car ownership has not reached the saturated level. The passenger car ownership in the pattern of USA reaches the saturated level of 45.245 per 100 people around the GDP per capita of 38.402 dollars and then it decreases gradually. In the pattern of Australia, i.e., Gompertz function with α = 55.868, Chinese passenger car ownership grows fastest but will not enter the saturated period when GDP per capita reaches 50,000 dollars. ownership because parameter θ′ of the NE-SR functions is respectively positive, negative and zero, which represent the three scenarios of car ownership.  Figure 6 shows the ownership ratios of passenger car in China calculated in the three patterns of Japan, USA and Australia. It can be found in Figure 6 that in the pattern of Japan, the ratio of passenger car ownership increases along with GDP per capita till GDP per capita reaches 50,000 dollars, which means Chinese passenger car ownership has not reached the saturated level. 
Figure 7 illustrates the corresponding projections of Chinese passenger car ownership for 2018-2060, based on the GDP per capita predicted in OECD (2014). Two Gompertz functions with saturation levels of 40 and 50 vehicles per 100 people (α = 40 and α = 50), as assumed in [8], are also shown in Figure 7. The results show that passenger car ownership continues to increase until 2060 in the patterns of the Gompertz function (α = 40 and α = 50) and Australia (α = 55.868), reaching 39.561, 49.034 and 49.276 per 100 people, respectively. Growth of passenger car ownership in China therefore has not reached the saturation level by 2060 in these three scenarios. For the pattern of Japan, growth is slow, and the ratio is only 33.565 in 2060. For the pattern of USA, passenger car ownership enters the saturation period before 2060: it increases to 45.245 around 2057 and then decreases slowly to 45.173 per 100 people in 2060.
Among these five patterns, growth in passenger car ownership is slowest in the pattern of Japan and fastest in the pattern of Australia (Gompertz function with α = 55.868). Furthermore, passenger car ownership per 100 people in China will not reach the saturation point by 2060 in the patterns of Japan, Australia and the Gompertz functions with α = 40 and α = 50. In the pattern of USA, passenger car ownership saturates around 2057. This means that the Chinese passenger car stock still has potential to increase after 2060 in the patterns of Japan, Australia and the Gompertz functions with α = 40 and α = 50, and that it grows until at least around 2057 under all patterns.
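The distinction between a pattern that saturates within the forecast horizon and one that keeps growing can be checked mechanically on any projected series. The sketch below is illustrative only. The helper names `peak_year` and `saturates_within` are hypothetical, and the USA-like series is a synthetic parabola constructed to pass through the two values quoted in the text (45.245 in 2057 and 45.173 in 2060). It is not the NE-SR projection itself.

```python
def peak_year(years, values):
    """Return (year, value) at the maximum of a projected series."""
    i = max(range(len(values)), key=values.__getitem__)
    return years[i], values[i]

def saturates_within(years, values):
    """True if the series peaks strictly before the last projected year."""
    return peak_year(years, values)[0] < years[-1]

# Values quoted in the text for the USA pattern (vehicles per 100 people)
PEAK_YEAR, PEAK = 2057, 45.245
END_YEAR, END = 2060, 45.173

# ASSUMPTION: a parabolic shape around the peak, used only to build toy data
k = (PEAK - END) / (END_YEAR - PEAK_YEAR) ** 2
years = list(range(2018, 2061))
usa_like = [PEAK - k * (y - PEAK_YEAR) ** 2 for y in years]

# A strictly increasing toy series stands in for the non-saturating patterns
rising = [float(i) for i in range(len(years))]

print(peak_year(years, usa_like))          # peak located in 2057
print(saturates_within(years, usa_like))   # peaks inside the horizon
print(saturates_within(years, rising))     # still growing in 2060
```

A series whose maximum falls on the final year, as in the Japan, Australia and fixed-α Gompertz patterns, is reported as not yet saturated, matching the reading of Figure 7 given above.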

Conclusions
This paper uses a data-driven symbolic regression method to describe the future development of car ownership. The method has the advantage of automatically establishing suitable models from a numeric data set without assuming a functional form. A generalized function has been found that fits the trend of passenger car ownership in six representative countries (Japan, England, USA, Finland, Poland and Australia) better than the traditional Gompertz function, which is a special case of the new function. Moreover, three scenarios of car ownership are obtained depending on the signs of the parameters in the new function. These scenarios correspond to the patterns of Japan, USA and Australia (Gompertz function), respectively. Finally, the patterns of these three countries are applied to passenger car ownership in China. The predicted results are compared against two Gompertz functions of car ownership from previous research. They show that Chinese passenger car ownership will reach 39.561, 49.034 and 49.276 per 100 people in 2060 in the patterns of the Gompertz function (α = 40 and α = 50) and Australia (α = 55.868), respectively, whereas in the pattern of USA it will increase to 45.245 around 2057 and then decrease slowly to 45.173 per 100 people in 2060. This means that Chinese passenger car ownership still has potential to increase after 2060 in the patterns of Japan, Australia and the Gompertz functions with α = 40 and α = 50, and that it grows until at least around 2057 under all patterns. In the future, the new function will be used to forecast the ownership, energy demand and carbon emissions of various vehicles in the transportation industry, and the predicted results can serve as a basis for policymakers to propose transportation strategies.