A Bus Service Evaluation Method from Passenger’s Perspective Based on Satisfaction Surveys: A Case Study of Beijing, China

As an important part of urban public transport, bus service quality is an important factor affecting passengers' choice of travel mode. This paper constructs a satisfaction evaluation indicator system from the perspective of passenger perception, covering the whole travel process. It is composed of 6 first-level indexes (timeliness, safety, convenience, comfort, reliability and economy) and 21 second-level indexes. Considering the scale of bus service in Beijing, this research carried out stratified sampling on 100 bus lines and collected 3012 field questionnaires. The questionnaire recorded basic information on the bus routes surveyed, demographic information and respondents' satisfaction with the bus service. After testing the reliability and validity of the indicator system, the paper proposes a satisfaction evaluation model weighted by correlation coefficients. The results show that the overall satisfaction score is 78.2 and that nearly 70% of bus passengers are satisfied with the bus service. Multivariate analysis of variance was employed to evaluate the factors influencing satisfaction. The satisfaction score of timeliness is the lowest, and it is mainly influenced by three factors: the passenger's age, travel purpose and time period. The research contributes to normalizing performance evaluation for public transportation and to the sustainable development of bus transport.


Introduction
Customer satisfaction with the public transport system has practical significance for the relevant decision-making departments. As a public service in Beijing, the bus service needs to provide good mobility for non-drivers and satisfy them through good performance. It is therefore necessary to identify the key factors in the public transit trip through bus passenger satisfaction evaluation and to improve the public transport service level, which enhances the attractiveness of public transportation and promotes the sustainable development of urban traffic.
Accurate passenger satisfaction surveys can assist decision-making and public transport operational planning. Many studies have focused on the satisfaction evaluation and performance assessment of public transportation, covering index system construction [1,2], evaluation methods [2,3] and the analysis of influencing factors [4,5].

Table 1. Satisfaction Evaluation Index System (Partial): First-Level and Second-Level indicators.

Survey Design
This paper uses stratified sampling to survey bus passengers' satisfaction based on the constructed satisfaction indicator system. The survey design has three steps: determining the sample size, selecting the sample, and designing the questionnaire.

Sample Size Determination
The survey adopted stratified random sampling, drawing samples from the overall population of potential respondents so that the sample is representative of the whole.
Sustainability 2018, 10, 2723
The sample size must balance the accuracy and the efficiency of the survey. In statistics, a result at a confidence level of at least 95% (significance level α = 0.05) is generally regarded as reliable. $Z_{\alpha/2}$ is the Z statistic corresponding to the chosen confidence level; at the 95% confidence level used in this study, $Z_{\alpha/2} = 1.96$. The sampling error E was set to 2% to ensure accuracy, and the proportion p was set to 0.5, which maximizes $p(1-p)$ and therefore gives the most conservative sample size. Accordingly, the overall minimum effective sample size is

$$n = \frac{Z_{\alpha/2}^{2}\,p\,(1-p)}{E^{2}} = \frac{1.96^{2} \times 0.5 \times 0.5}{0.02^{2}} = 2401$$

At the same time, each stratum should contain at least 30 respondents (a common minimum sample size for the normal approximation). This research therefore raised the overall sample size to 3000 and selected 100 bus lines to survey passenger satisfaction in Beijing in accordance with the minimum sample size requirements.
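The sample-size arithmetic can be checked with a short Python sketch. It evaluates the standard formula for estimating a proportion, n = Z²·p·(1 − p)/E², with the values stated in the text (Z = 1.96, p = 0.5, E = 2%). Reading the 3000 target as 100 lines × 30 respondents per line is an assumption made here for illustration; the text states only that each stratum needs at least 30 respondents.

```python
import math


def min_sample_size(z: float = 1.96, p: float = 0.5, e: float = 0.02) -> int:
    """Minimum sample size for estimating a proportion: n = z^2 * p * (1 - p) / e^2."""
    raw = z ** 2 * p * (1 - p) / e ** 2
    # Round away tiny floating-point error before taking the ceiling,
    # so that an exact result such as 2401 is not bumped up to 2402.
    return math.ceil(round(raw, 9))


n_min = min_sample_size()  # 95% confidence, p = 0.5, E = 2%

# Assumption (not stated in the text): one stratum per surveyed line,
# each with the rule-of-thumb minimum of 30 respondents.
LINES, MIN_PER_STRATUM = 100, 30
planned = max(n_min, LINES * MIN_PER_STRATUM)

print(n_min, planned)  # 2401 3000
```

Setting p = 0.5 is the conservative choice: p(1 − p) peaks at 0.25, so any other assumed proportion would give a smaller minimum.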

Types of Bus and Line
The sample survey should cover different types of buses and lines so that the satisfaction survey is comprehensive. According to the related standards and references, bus lines in Beijing fall into three types: Normal bus, BRT (Bus Rapid Transit) and Customized bus (a regular bus service that follows demand response). In addition, this research divided Normal bus lines in Beijing into four types: Express bus, Common bus, Branch bus and Microcirculatory bus.

• Express bus: travel speed is higher than 20 km/h, and the bus carries large passenger volumes along transit corridors.
• Common bus: carries the majority of bus passenger traffic in Beijing and serves various functions; for example, it travels on arterial roads at lower speeds than the Express bus.
• Branch bus: serves the end of passengers' trips, solving the "last few kilometers" problem. The line length is often less than 10 km.
• Microcirculatory bus: mainly runs on branch roads and in residential areas. Its route is more flexible, and the line length is less than 6 km.
According to the scale of passenger flows on the different types of bus lines, the sample proportion of Express bus, Common bus, Branch bus, Microcirculatory bus, BRT and Customized bus is 51:37:4:4:2:2. The sample distribution across types of stops (hub stops, common roadside stops, harbor-shaped stops, large stops and small stops) and types of buses (double-decker, articulated and non-articulated buses) was also considered. The survey area was required to cover the six central urban districts of Beijing.
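As a hedged illustration of the line-type proportions, the sketch below applies the 51:37:4:4:2:2 ratio to the 3000-respondent target and checks each stratum against the 30-respondent minimum. The text does not say whether the ratio was applied to respondents or to the 100 lines; applying it to respondents is an assumption here, and `allocate` is a hypothetical helper.

```python
def allocate(total: int, ratio: dict[str, int]) -> dict[str, int]:
    """Split `total` respondents across strata in proportion to integer weights.

    Floor division is used; any remainder is left unallocated (none here,
    because the weights sum to 100 and the total is a multiple of 100).
    """
    weight_sum = sum(ratio.values())
    return {name: total * w // weight_sum for name, w in ratio.items()}


# Line-type proportions from the text (51:37:4:4:2:2).
RATIO = {
    "Express": 51, "Common": 37, "Branch": 4,
    "Microcirculatory": 4, "BRT": 2, "Customized": 2,
}

quota = allocate(3000, RATIO)
below_minimum = [name for name, n in quota.items() if n < 30]
normal_types = ("Express", "Common", "Branch", "Microcirculatory")
normal_share = sum(quota[t] for t in normal_types) / 3000

print(quota)
print("strata below 30:", below_minimum)  # strata below 30: []
print(f"Normal bus share: {normal_share:.0%}")  # Normal bus share: 96%
```

The 96% Normal bus share produced by this ratio is consistent with the roughly 96% of respondents reported as Normal bus passengers in the sample characteristics.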

Survey Object and Time
Bus service quality differs between weekends and workdays (peak and off-peak time), and passengers' travel perceptions differ by gender and age. Therefore, based on the passenger composition in the different periods and the general demographic characteristics of Beijing [25], the sample size for each stratum was allocated as shown in Figure 1. Peak time refers to 7:00-9:00 and 17:00-19:00, and off-peak time refers to 9:00-17:00. The age groups were defined according to their differing travel characteristics: youth is 15-24 (students), middle-aged is 25-59 (commuters), and elderly is ≥60 years old (entertainment group). Sample selection also took into account the respondents' travel purpose, including travel to home, work, official business, school, personal affairs, entertainment or shopping.
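The period and age definitions translate directly into stratum labels. The sketch below is a minimal illustration with hypothetical function names; treating the intervals as half-open (so exactly 9:00 counts as off-peak) and labelling workday times outside 7:00-19:00 separately are assumptions, since the text does not specify boundary handling.

```python
from datetime import datetime, time
from typing import Optional


def survey_period(ts: datetime) -> str:
    """Map a survey timestamp to the periods defined in the text."""
    if ts.weekday() >= 5:  # Saturday (5) or Sunday (6)
        return "weekend"
    t = ts.time()
    # Assumption: intervals are half-open, so 9:00 counts as off-peak
    # and 19:00 falls outside the defined periods.
    if time(7) <= t < time(9) or time(17) <= t < time(19):
        return "workday peak"
    if time(9) <= t < time(17):
        return "workday off-peak"
    return "outside defined periods"


def age_group(age: int) -> Optional[str]:
    """Map an age in years to the three groups used in the survey."""
    if age >= 60:
        return "elderly"
    if age >= 25:
        return "middle-aged"
    if age >= 15:
        return "youth"
    return None  # under 15: not covered by the age division


print(survey_period(datetime(2016, 3, 28, 8, 30)))  # workday peak (a Monday)
print(age_group(19), age_group(40), age_group(67))  # youth middle-aged elderly
```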

Questionnaire Design
The research data were collected using tablets, with investigators helping respondents fill in the questionnaires. The questionnaire was structured into two main sections. The first section gathered passengers' opinions about bus service quality: respondents rated their satisfaction with the overall bus service and with the 21 second-level indicators on a ten-point scale, where 1 is the worst perception of service quality and 10 is the best. If a second-level indicator was scored 6 or lower, investigators went on to ask about satisfaction with the corresponding third-level indicators. The second section recorded investigator information (e.g., name, survey date, survey time and survey area), bus information (e.g., bus line number, name of the boarding station, type of boarding stop, type of bus and type of line) and respondent information (e.g., sex, income, age, availability of a private vehicle and travel purpose).
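The follow-up rule for third-level questions is simple branching logic. A minimal sketch, assuming scores are integers on the 1-10 scale; the indicator names in the example are hypothetical placeholders, not the questionnaire's actual item labels.

```python
FOLLOW_UP_THRESHOLD = 6  # a second-level score of 6 or lower triggers third-level questions


def follow_up_items(scores: dict[str, int], threshold: int = FOLLOW_UP_THRESHOLD) -> list[str]:
    """Return the second-level indicators whose third-level items should be asked."""
    for name, score in scores.items():
        if not 1 <= score <= 10:
            raise ValueError(f"{name}: score {score} is outside the 1-10 scale")
    return [name for name, score in scores.items() if score <= threshold]


# Hypothetical answers from one respondent.
answers = {"waiting_time": 5, "travel_time": 7, "crowding": 6}
print(follow_up_items(answers))  # ['waiting_time', 'crowding']
```

Validating the range first means an out-of-scale entry is rejected at collection time rather than silently skipping a follow-up.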

Survey Data Analysis
This research carried out a one-week survey at the end of March 2016 using professional investigators, and a total of 3012 questionnaires were administered. After data integration and preprocessing, which included replacing abnormal data and unifying field formats, statistical methods were applied to evaluate reliability and validity, determining whether the evaluation matrix and questionnaire can reliably evaluate the bus service system [26]. The characteristics of the survey sample were also analyzed.

Reliability Analysis
Reliability refers to whether survey data consistently reflect the actual situation of the research object. This research used Cronbach's alpha coefficient to estimate the reliability of the satisfaction questionnaire; Cronbach's alpha measures the internal consistency of a set of questionnaire items.
The bus passenger satisfaction evaluation matrix is a multi-level, comprehensive index system, so reliability analysis was performed separately for each sub-dimension using the survey data. Table 2 shows that Cronbach's alpha for each index was higher than 0.7. The questionnaire is therefore reliable, and the data can accurately reflect the satisfaction of bus passengers in Beijing.
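Cronbach's alpha for one sub-dimension is α = k/(k − 1) · (1 − Σσ²ᵢ / σ²_total), where k is the number of items, σ²ᵢ is the variance of item i and σ²_total is the variance of respondents' summed scores. The sketch below is a standard-library implementation of that formula; the five-respondent data set is made up for illustration and is not survey data.

```python
from statistics import variance


def cronbach_alpha(rows: list[list[float]]) -> float:
    """Cronbach's alpha; rows are respondents, columns are items of one dimension."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*rows))  # transpose to per-item score columns
    item_var = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - item_var / total_var)


# Hypothetical 1-10 scores from five respondents on three items of one dimension.
scores = [
    [8, 7, 8],
    [6, 6, 5],
    [9, 8, 9],
    [5, 6, 6],
    [7, 7, 8],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 3))  # 0.929
print("acceptable" if alpha > 0.7 else "below 0.7")
```

Sample variances are used for both the items and the total; since the same denominator appears in both, the ratio would be unchanged with population variances.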

Validity Analysis
Validity analysis tests whether each item effectively measures the respondent's cognitive status. In this research, the indicator system was specified in advance by industry experts, transport managers and bus enthusiasts; because the factor structure was hypothesized beforehand, confirmatory factor analysis was used to test validity, using AMOS version 22.0 (International Business Machines Corporation, Armonk, NY, USA). The confirmatory factor analysis model (Figure 2) fits the survey data well, and the goodness-of-fit statistics are listed in Table 3. As Table 3 shows, the CMIN/DF (chi-square/degrees of freedom) value is 2.951, which is less than 3.00. The RMR (root mean square residual) is less than 0.05, and the RMSEA (root mean square error of approximation) is less than 0.08. The GFI (goodness-of-fit index), AGFI (adjusted goodness-of-fit index), NFI (normed fit index), TLI (Tucker-Lewis index) and CFI (comparative fit index) values are all greater than 0.9. All fit indexes meet the fitting standards; therefore, the construction of the index system is reasonable.
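The fit criteria quoted above can be encoded as a simple checklist. The sketch also includes the usual point-estimate formula for RMSEA, √(max(χ² − df, 0) / (df · (N − 1))). It is an illustrative check only, since AMOS reports these statistics itself, and the df and N in the example are hypothetical placeholders, with χ² chosen only so that χ²/df equals the reported 2.951.

```python
import math

# Cut-offs as stated in the text: (direction, threshold).
CRITERIA = {
    "CMIN/DF": ("<", 3.00),
    "RMR": ("<", 0.05),
    "RMSEA": ("<", 0.08),
    "GFI": (">", 0.90),
    "AGFI": (">", 0.90),
    "NFI": (">", 0.90),
    "TLI": (">", 0.90),
    "CFI": (">", 0.90),
}


def rmsea(chi2: float, df: int, n: int) -> float:
    """Point estimate of RMSEA from the model chi-square, degrees of freedom and sample size."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def failed_criteria(fit: dict[str, float]) -> list[str]:
    """Names of fit indices that miss their cut-off (missing indices count as failures)."""
    failed = []
    for name, (direction, cutoff) in CRITERIA.items():
        value = fit.get(name)
        ok = value is not None and (value < cutoff if direction == "<" else value > cutoff)
        if not ok:
            failed.append(name)
    return failed


# Hypothetical inputs: df and N are placeholders; chi2 is chosen so that chi2/df = 2.951.
df, n = 174, 3000
chi2 = 2.951 * df
print(f"CMIN/DF = {chi2 / df:.3f}, RMSEA = {rmsea(chi2, df, n):.4f}")
```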

Characteristic Analysis of the Sample
In this survey, the ratio of male to female respondents is 51:49, close to 1:1. Most respondents (more than 72%) are aged 18-44. Overall, 1869 respondents, or 62% of the total, earn 2000-6000 yuan/month. The main travel purposes are going home and going to work, which together account for more than 54% of respondents. This is consistent with commuting trips accounting for 57% of all trips by Beijing residents [27]. Reflecting the different passenger volumes on the different types of bus lines, approximately 96% of respondents are Normal bus passengers. In the valid sample, 38% of respondents were surveyed during the workday peak period, 35.7% during the workday off-peak period and 25.9% at the weekend. The distribution of the survey sample is shown in Table 4.

Satisfaction Evaluation Method
To reflect passengers' real satisfaction, this research used correlation coefficients as weights and determined satisfaction layer by layer, from the indicators at each level up to overall satisfaction, rather than directly using passengers' ratings of overall satisfaction. This practice is widely used in customer satisfaction research across a wide range of industries worldwide [28].
The questionnaire records scores for overall satisfaction and for the 21 second-level indicators. The weight of each second-level indicator is its correlation coefficient (ρ) with overall satisfaction: a larger coefficient means that the indicator is more strongly related to overall satisfaction and therefore receives a larger weight. Respondents can be divided into research groups G, either as a whole or by attributes such as travel purpose or age (as shown in Table 4), and the correlation coefficients and satisfaction scores are computed separately for each group. Within a group, the average of the respondents' scores for an indicator is taken as the group's satisfaction score for that indicator.
For example, the correlation coefficient of each second-level indicator with overall satisfaction is

$$\rho_{A,C_j} = \frac{\sum_{i=1}^{n}\left(C_j^{i}-\bar{C}_j\right)\left(A^{i}-\bar{A}\right)}{\sqrt{\sum_{i=1}^{n}\left(C_j^{i}-\bar{C}_j\right)^{2}\,\sum_{i=1}^{n}\left(A^{i}-\bar{A}\right)^{2}}} \tag{2}$$

where $\rho_{A,C_j}$ denotes the correlation coefficient of the jth second-level indicator with overall satisfaction; $C_j^{i}$ denotes the score of the jth second-level indicator given by the ith respondent; $\bar{C}_j$ is the average score of the jth second-level indicator; $A^{i}$ is the overall satisfaction score of the ith respondent; $\bar{A}$ is the average overall satisfaction score in group G; and n denotes the number of respondents in group G. The satisfaction score of the kth first-level indicator is calculated from these correlation coefficients as given in Equation (3).
$$B_k = \frac{\sum_{j=1}^{m}\rho_{A,C_j}\,\bar{C}_j}{\sum_{j=1}^{m}\rho_{A,C_j}} \tag{3}$$

where $B_k$ denotes the satisfaction score of the kth first-level indicator and m is the number of second-level indicators belonging to the kth first-level indicator. The satisfaction evaluation model weighted by correlation coefficients reflects the personal factors that shape users' satisfaction. To a certain extent, it overcomes the one-sidedness of existing evaluation methods, which rely solely on a "statement" (the directly reported rating) to obtain satisfaction scores.
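The correlation-weighted model can be sketched in a few lines of Python. `pearson` computes the correlation between an indicator's scores and the overall scores using the variables defined above. `first_level_score` assumes that the first-level score is the correlation-weighted mean of the second-level averages, normalized by the sum of the weights. The helper names and the five-respondent data are hypothetical, and the conversion to the 100-point scale used in the results is omitted because the text does not give the exact mapping.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between indicator scores x and overall scores y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / math.sqrt(sxx * syy)


def first_level_score(indicators: dict[str, list[float]], overall: list[float]) -> float:
    """Assumed form: correlation-weighted mean of the second-level averages."""
    weights = {name: pearson(s, overall) for name, s in indicators.items()}
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * sum(indicators[name]) / len(indicators[name])
               for name, w in weights.items()) / total_weight


# Hypothetical group G of five respondents (1-10 scale) for two timeliness items.
timeliness = {
    "waiting_time": [6, 7, 8, 5, 9],
    "travel_time": [7, 7, 8, 6, 8],
}
overall = [7, 7, 9, 5, 9]
print(round(first_level_score(timeliness, overall), 2))
```

With positive weights, the result always lies between the smallest and largest second-level averages, so an indicator that tracks overall satisfaction more closely pulls the first-level score toward its own mean.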

Overall Satisfaction Evaluation Result Analysis
This research used Equations (2) and (3) to derive and analyze passengers' overall satisfaction with public transport in Beijing. The results were converted to a 100-point (centesimal) scale and divided into five levels; the percentage of the total sample at each level is shown in Figure 3. The overall satisfaction score of the passengers is 78.2, and passengers who are "basically satisfied" or "satisfied" with the bus service account for nearly 70% of the respondents.
The satisfaction of each first-level indicator is shown in Figure 4. The satisfaction score of timeliness is 74.3, the lowest among the six first-level indicators and about 5% below overall satisfaction. Among the corresponding second-level indicators, passengers are least satisfied with waiting time at the bus stop and travel time, which score 71.9 and 73.3, respectively. The scores of safety, convenience and comfort are close to overall satisfaction: 78.6, 78.7 and 78.6, respectively. Respondents are satisfied with traffic safety, emergency management and travel convenience, all of which score above 80.0. Reliability scores 80.4, the highest among the six first-level indicators; respondents are very satisfied with the service reliability of drivers and conductors and with bus service information, which score 82.1 and 81.7, respectively. Passengers also rate the economy of the bus service highly, with a score of 79.9: bus fares in Beijing are low, and there are various preferential policies for students, people with disabilities, elderly people and others.
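The comparison of first-level scores with the overall score is simple arithmetic. The sketch below reproduces it from the scores reported above, with labels following the six first-level indicators listed in the abstract; `relative_gap` is a hypothetical helper name.

```python
OVERALL = 78.2
FIRST_LEVEL = {
    "Timeliness": 74.3, "Safety": 78.6, "Convenience": 78.7,
    "Comfort": 78.6, "Reliability": 80.4, "Economy": 79.9,
}


def relative_gap(score: float, reference: float) -> float:
    """Percentage difference of `score` from `reference`."""
    return (score - reference) / reference * 100


# List indicators from lowest to highest score with their gap to the overall score.
for name, score in sorted(FIRST_LEVEL.items(), key=lambda item: item[1]):
    print(f"{name:<12}{score:6.1f}{relative_gap(score, OVERALL):+7.1f}%")
```

The timeliness row shows a gap of about −5.0%, matching the statement that timeliness is 5% below overall satisfaction; reliability is the only indicator more than 2% above it.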

Bus Satisfaction Analysis in the View of Different Segments
Satisfaction with Articulated buses is the lowest (77.7) among the vehicle types, because their timeliness score is 4.7% below the average. The satisfaction levels of the different line types are also related to their functional positioning. The Customized bus has the highest satisfaction (87.1) because it meets passengers' actual demand, whereas satisfaction with Common and Branch lines is relatively low. Satisfaction also differs considerably among stop types. It is lowest at hub stops, which is related to their high passenger density; accordingly, overall satisfaction there is heavily influenced by poor perceptions of comfort. Compared with respondents who do not own a private vehicle, car owners have higher expectations for convenience and economy, so improving service in these two aspects is a feasible way to attract more private car owners to the bus. By travel purpose, satisfaction with commuting trips is relatively high, ranking in the top three, whereas satisfaction with trips for personal affairs is the lowest because these respondents have high expectations for timeliness. By income, satisfaction is highest in the 2000-4000 yuan/month group and lowest in the 4000-6000 yuan/month group. Satisfaction varies widely across age groups: students have the lowest satisfaction (77.2), especially with respect to timeliness and convenience, which are the most important indicators to improve, whereas the 45-54 age group is highly satisfied with the bus service and rates its economy particularly well. The specific satisfaction scores are given in Table 5.

Variance Analysis of Influencing Factors
Several factors influence satisfaction. The questionnaire therefore recorded the factors that may affect bus passenger satisfaction: bus and line factors, such as the type of bus, line and stop, and individual attribute factors, such as travel purpose, income, age and time period. Variance analysis was used to investigate how these factors influence satisfaction and to identify the factors with the most significant impact on low-scoring indicators, laying the foundation for proposing measures to improve bus service. Multivariate analysis of variance (MANOVA) tests whether the means of several dependent variables, here the first-level indicator scores, differ across the levels of one or more factors. To analyze the influence of the different attribute factors on first-level indicator satisfaction, this research conducted a multivariate analysis of variance; the results are given in Table 6.
As the significance levels show, different influencing factors have different effects on the indicators: (1) The timeliness indicator is mainly constrained by the passenger's age, travel purpose and time period, with age playing the most significant role.
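The authors ran a multivariate analysis of variance. As a simpler illustration of the same idea, the sketch below computes a univariate one-way ANOVA F statistic for a single indicator across the levels of one factor. It is not MANOVA, which tests all first-level indicators jointly (for example via Wilks' lambda), and the group data are hypothetical. A p-value could then be obtained from the F distribution, for example with `scipy.stats.f.sf(f, df_between, df_within)`.

```python
def one_way_anova(groups: list[list[float]]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical timeliness scores (1-10) for three age groups.
youth = [6, 7, 6, 5, 7]
middle_aged = [7, 8, 7, 7, 8]
elderly = [8, 9, 8, 9, 8]
f, df_b, df_w = one_way_anova([youth, middle_aged, elderly])
print(f"F({df_b}, {df_w}) = {f:.2f}")
```

A large F means the group means differ by much more than the scatter within groups would suggest, which is the univariate analogue of a factor being significant for an indicator.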

Conclusions and Future Work
Maintaining the vitality of bus services is an important part of sustainable, green urban transportation, and improving public transport service quality is an effective way to make buses more attractive. This research constructed a satisfaction evaluation matrix from the passengers' perspective that includes 6 first-level indicators, 21 second-level indicators and 77 third-level indicators. Stratified sampling was used to select 100 bus lines as samples to evaluate satisfaction with bus service quality in Beijing. The results show that the overall satisfaction score is 78.2 and that nearly 70% of bus passengers are satisfied with the bus service. By analyzing satisfaction across segments and the factors influencing it, the study found that bus type is the primary factor affecting overall satisfaction. Satisfaction with Articulated buses is the lowest among vehicle types because they typically carry large passenger flows. The lowest-scoring first-level indicator is timeliness, which is mainly affected by age. Satisfaction in the 4000-6000 yuan/month income group and among passengers at hub stops is also relatively low. Therefore, improving the timeliness of the bus service is the most effective first measure for raising overall satisfaction, and the service levels of safety, convenience, comfort and the passengers' travel environment should be developed step by step as well.
The research provides methodological guidance for quantitative bus satisfaction evaluation. First, a multi-dimensional evaluation index system helps evaluate the service quality of public transport scientifically and objectively, and its indexes cover the whole trip, from origin to destination. Second, the survey design process is described in detail; given the complexity of traffic in megacities, a rigorous survey scheme is essential for accurate evaluation results. Third, the research carried out a normalized assessment of service level and satisfaction for public transportation. The main advantage of the correlation-weighted satisfaction evaluation model is that it not only considers users' subjective feelings but also estimates the importance of each indicator from passengers' perceptions. By balancing "statement" and "estimate", the model determines the overall satisfaction and the satisfaction of the first-level indexes.
In future research, the satisfaction evaluation index system could be used for normalized assessment in more cities, allowing service quality levels to be compared between cities. It may also be worthwhile to investigate the relationship between passengers' individual characteristics and their perception of service quality. This would help provide a more scientific method of service performance evaluation and identify the indicators that need to be optimized.
Author Contributions: J.W., J.W. and L.M. designed the overall framework of the research. X.D. and C.W. were responsible for collecting and analyzing the data. All authors wrote the paper, but their primary individual contributions are as follows: Sections 1 and 6 are to be ascribed to J.W. and L.M.; Section 2 to J.W.; and Sections 3-5 to X.D. and C.W. All authors read and approved the final manuscript for submission.