Estimating On-Road Vehicle Fuel Economy in Africa: A Case Study Based on an Urban Transport Survey in Nairobi, Kenya

In African cities like Nairobi, policies to improve vehicle fuel economy help to reduce greenhouse gas emissions and improve air quality, but lack of data is a major challenge. We present a methodology for estimating fuel economy in such cities. Vehicle characteristics and activity data were collected for both the formal fleet (private cars, motorcycles, light and heavy trucks) and the informal fleet, comprising minibuses (matatus), three-wheelers (tuktuks), goods vehicles (AskforTransport) and two-wheelers (bodabodas), and were used to estimate fuel economy. Using two empirical models, general linear modelling (GLM) and artificial neural network (ANN), the relationships between vehicle characteristics for this fleet and fuel economy were analyzed for the first time. Fuel economy for bodabodas (4.6 ± 0.4 L/100 km), tuktuks (8.7 ± 4.6 L/100 km), passenger cars (22.8 ± 3.0 L/100 km), and matatus (33.1 ± 2.5 L/100 km) was found to be 2-3 times worse than in the countries these vehicles are imported from. The GLM provided the better estimate of predicted fuel economy based on vehicle characteristics. The analysis of survey data covering a large informal urban fleet helps meet the challenge of a lack of availability of vehicle data for emissions inventories. This may be useful to policy makers as emissions inventories underpin policy development to reduce emissions.


Introduction
One approach to mitigating the impacts of air pollution on human health, and impacts of greenhouse gases (GHGs) on climate, is to reduce the growth of vehicle fuel consumption by improving fuel economy [1][2][3][4][5][6]. Since fuel economy is a good indicator of GHG emissions it has become an important metric to assess trends and allow comparisons in GHG emissions between different vehicles as well as between vehicle fleets from different world regions. It is also a key indicator by which vehicle manufacturers assess compliance with GHG emission targets. As such, making reliable assessments of fuel economy for in-use vehicle fleets is an important policy tool for helping to target emission reduction policy [6].
Globally, governments have developed and implemented fuel economy policies and standards that specifically target fuel consumption to reduce GHGs. Such policies and standards have been implemented in four of the largest vehicle markets: USA, China, EU, and Japan [1,6,7,8]. Policies and standards in other major global markets (Australia, Brazil, India, Mexico and South Korea)

Materials and Methods
Nairobi and the larger NMR were chosen as the study site because Nairobi is a typical SSA city in terms of socioeconomic status, size and population growth [50]. Figure 1 describes the data combinations required to develop the NMR vehicle fleet dataset and how this dataset is then used to estimate fuel economy using three different modelling approaches: calculated fuel economy, GLM and ANN. The modelling approaches used to estimate in-use fuel economy (FE) for the on-road vehicle fleet in Nairobi require data describing vehicle characteristics and vehicle activity, as listed in Figure 1. Primary data were collected using a questionnaire survey (see Appendix A Figure A1). Secondary data were used to determine the total number of vehicles and the fleet composition, and to verify the fleet composition and characteristics (i.e., vehicle weight and engine size) derived from the primary questionnaire data.

Secondary Databases
The total number of vehicles and the fleet composition for Kenya were obtained from the Kenya National Bureau of Statistics (KNBS) [51]. The composition of the vehicle fleet in the NMR was obtained from transport feasibility surveys [52,53]. Vehicle registration data for all light duty vehicles in Kenya from 2010-2012 were obtained from a Global Fuel Economy Initiative (GFEI) collaboration between the Partnership for Clean Fuels and Vehicles (PCFV) of the United Nations Environment Programme (UNEP) and the Energy Regulatory Commission of Kenya (ERC) [47]. Data describing the total number of vehicles were used to determine the sample size required for the questionnaire survey. The NMR fleet composition was used to determine the sample weighting of the different vehicle categories for the field survey.
Energies 2019, 12, x FOR PEER REVIEW 4 of 28

Questionnaire Survey
A questionnaire-based quantitative vehicle fleet survey was developed to collect data for the 18 variables describing vehicle characteristics and vehicle activity, and was trialled in Nairobi (see Table 1). These variables provided information on fleet composition, fuel consumption, technology, vehicle age, VKT, occupancy, and passenger load, based on data gathered from pedestrians and drivers. The face-to-face questionnaire interviews were conducted from December 2014 to January 2015 by two trained interviewers between 10:00 and 17:00 h at 15 sites across the NMR. These sites were selected for their high vehicle density and pedestrian populations and included parking lots, shopping centres, markets, matatu stops, matatu and bus terminals, the city centre, and residential areas. The location of the NMR field sites is shown in Figure 2. To make the survey responses as representative as possible, sites were also selected to cover high-, medium- and low-income groups, with a stratified sample of vehicle users from different socio-economic classes being interviewed as they arrived randomly. Stratification on a socio-economic basis supported the representativeness of vehicle characteristics, car ownership and vehicle activity, since affluent neighbourhoods have been shown to have more expensive cars with bigger engines and lower mileage, whereas less affluent neighbourhoods have less expensive cars with smaller engines and higher mileage [54].

Figure 2.
A map of the 15 field sites where the questionnaire survey interviews were conducted in the NMR. The map was created using GRASS software [55].
The secondary data describing the population of registered cars in Kenya [51] were used to estimate that 67% of vehicles are located in the NMR [56]; this amounts to 1.35 million vehicles. Following the procedure in [57], a target sample size of n = 1284 was required for the questionnaire survey to obtain a 95% confidence level with a ±5% margin of error, assuming a conservative mail survey response rate of 30% [58]. Of the 836 persons invited to participate in the survey, 824 responded (98.6% response rate); because this far exceeded the assumed response rate, the sample size was deemed sufficient. Table 1 summarises the 18 data variables the survey was designed to collect, divided into continuous data (with numerical specifications) and categorical data (with qualitative attributes).
The questionnaire responses were split by vehicle type as follows: passenger cars comprising private cars, company cars and taxis (243), matatus (250), bodabodas (233), motorcycles for personal use (11), tuktuks (16), light goods vehicles (58), and heavy goods vehicles (13). The descriptions of these vehicle types are found in Table 2.
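As a rough check on the target sample size, the sketch below assumes that the procedure in [57] is the standard proportion-based (Cochran) formula with p = 0.5 and a finite population correction; the source does not state this, and the function and argument names are illustrative only.

```python
import math

def target_sample_size(population, z=1.96, margin=0.05, p=0.5, response_rate=0.30):
    """Completed responses needed, and invitations needed given non-response.

    Assumption: Cochran's formula for a proportion; not confirmed by the source.
    """
    # Sample size for an effectively infinite population
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # Finite population correction (negligible for a fleet of 1.35 million)
    n = n0 / (1 + (n0 - 1) / population)
    completed = math.ceil(n)
    # Inflate the target to allow for the assumed response rate
    invited = math.ceil(completed / response_rate)
    return completed, invited

completed, invited = target_sample_size(1_350_000)
print(completed, invited)
```

Under these assumptions the calculation gives 385 completed responses and 1284 invitations, which is consistent with the target quoted above.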

Verification of Vehicle Characteristics
Secondary data from various second-hand sales websites [59][60][61][62] and information from vehicle manufacturers [63][64][65][66][67] were used to verify and adjust the weight, engine size and year of manufacture of the vehicles in the survey sample. The questionnaire responses relating to manufacturer and model type were adjusted according to the information available on the manufacturers' and second-hand sales websites to reduce inconsistencies in the data. For instance, certain vehicle makes and models are manufactured only in a specific year or period, and these websites list specifications such as weight, engine size and transmission for the vehicles on sale. These data were used to ensure survey responses were correct for those categories that could be verified.

Statistical Descriptive Analysis by Vehicle Class
To help describe, summarize and compare the different vehicle types, the questionnaire survey data were divided into subsets split by Kenyan vehicle class. This was achieved by allocating the Kenyan vehicle classes to EU vehicle classes according to the EMEP/EEA classification [68]. These EU classes were used since EU classifications are frequently employed to categorise default emission factors in emission inventories. The use of vehicles in Kenya is typically different from that in the EU; for example, 8-seater passenger vans are converted to 14-seater matatus, and motorcycles (bodabodas) are used for public transportation. In these instances, we kept certain unique Kenyan vehicle classes that represent the informal vehicle fleet (e.g., matatus, bodabodas, tuktuks, AskforTransport) but related these to an equivalent EU emission class.
Descriptive analyses were conducted to determine statistical parameters of the primary data from the questionnaire field survey using R software [69]. The statistical parameters: mean, median and standard error with 95% confidence interval were calculated for all numerical data.
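The summary statistics named above can be sketched as follows. The authors used R; this Python version is only an illustration, the 95% interval uses a normal approximation (an assumption, since the source does not say how the interval was formed), and the sample values are hypothetical.

```python
import math
import statistics

def describe(values, z=1.96):
    """Mean, median, standard error and approximate 95% confidence interval."""
    n = len(values)
    mean = statistics.mean(values)
    median = statistics.median(values)
    # Standard error of the mean from the sample standard deviation
    se = statistics.stdev(values) / math.sqrt(n)
    return {
        "n": n,
        "mean": mean,
        "median": median,
        "se": se,
        "ci95": (mean - z * se, mean + z * se),
    }

# Hypothetical engine sizes (cc) for one vehicle class
engine_cc = [1500, 1800, 2000, 1300, 1600]
summary = describe(engine_cc)
print(summary)
```

For small classes such as tuktuks (16 responses), a t-based interval would be wider than this normal approximation.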

Calculated Fuel Economy (FE') Using Fuel Consumption and Mileage
Three variables from the descriptive analysis, namely average days per week a vehicle travels (days/week), average distance a vehicle travels per day (km/day) and average money spent on fuel per vehicle (Ksh/month), were used to determine fuel consumption (FC) and mileage (VKT), which were in turn used to calculate fuel economy, denoted FE'. FC (L/day) was calculated from the amount of money spent on fuel per month per vehicle, using baseline average pump prices for 15 November 2015 of Ksh. 84.23 per litre of diesel and Ksh. 93.29 per litre of petrol, and assuming 30 calendar days per month [70]. FE' is calculated from the fuel consumption per day (L/day) and the average distance travelled using Equations (1) and (2).
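Equations (1) and (2) are not reproduced in this text, so the sketch below is a reconstruction under stated assumptions rather than the authors' exact formulation: FC is monthly fuel spend divided by pump price and 30 days, and daily distance is averaged over a 7-day week using the reported days of use. All function names and the example vehicle are hypothetical.

```python
# Pump prices quoted in the text (Ksh per litre)
PRICE_KSH_PER_L = {"diesel": 84.23, "petrol": 93.29}
DAYS_PER_MONTH = 30

def fuel_consumption_l_per_day(spend_ksh_per_month, fuel):
    """Fuel consumption per calendar day from monthly fuel spend."""
    return spend_ksh_per_month / PRICE_KSH_PER_L[fuel] / DAYS_PER_MONTH

def vkt_km_per_calendar_day(km_per_travel_day, days_per_week):
    """Assumption: distance is reported per day of use, averaged over 7 days."""
    return km_per_travel_day * days_per_week / 7

def fuel_economy_l_per_100km(spend_ksh_per_month, fuel, km_per_travel_day, days_per_week):
    """Calculated fuel economy FE' in L/100 km."""
    fc = fuel_consumption_l_per_day(spend_ksh_per_month, fuel)
    vkt = vkt_km_per_calendar_day(km_per_travel_day, days_per_week)
    return 100 * fc / vkt

# Hypothetical petrol car: Ksh 27,987 per month, 100 km per day, 7 days per week
fe = fuel_economy_l_per_100km(27987, "petrol", 100, 7)
print(round(fe, 2))
```

In this hypothetical case the spend corresponds to about 300 L per month, or 10 L per day over 100 km, which is roughly 10 L/100 km.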

Identify and Screen for Implausible Questionnaire Survey Data
Implausible vehicle activity data were identified, screened and excluded based on data in the literature. Fuel economy for the most and least advanced internal combustion vehicle technologies and fuels available in the world was used as a boundary limit [5]. This was based on the assumption that the best internal combustion technologies can only perform to a certain maximum efficiency, giving an upper and lower limit for fuel economy for each vehicle. For passenger and goods vehicles, the cut-offs for the best and poorest fuel economy were set at 5 L/100 km and 100 L/100 km [5]; for 2-wheelers, fuel economy had to be greater than 1 L/100 km and less than 10 L/100 km [71]. Using these criteria, 19 vehicles whose estimated fuel economy fell outside these acceptable ranges were identified and excluded from the passenger car and 2-wheeler categories. Detailed data for the excluded vehicles are shown in Appendix A Table A1.
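A minimal sketch of this screening step is given below. The bounds are those quoted in the text; whether the passenger and goods vehicle limits are inclusive is an assumption, and the group labels and example records are hypothetical.

```python
# Plausibility bounds in L/100 km, as quoted in the text
BOUNDS = {
    "passenger_goods": (5.0, 100.0),
    "two_wheeler": (1.0, 10.0),
}

def is_plausible(fe, group):
    """Return True if a calculated fuel economy falls inside the accepted range."""
    low, high = BOUNDS[group]
    if group == "two_wheeler":
        # The text states strictly greater than 1 and less than 10 L/100 km
        return low < fe < high
    # Assumption: the 5 and 100 L/100 km limits are themselves acceptable values
    return low <= fe <= high

def screen(records):
    """Split records into kept and excluded lists."""
    kept = [r for r in records if is_plausible(r["fe"], r["group"])]
    excluded = [r for r in records if not is_plausible(r["fe"], r["group"])]
    return kept, excluded

# Hypothetical records
records = [
    {"id": 1, "group": "passenger_goods", "fe": 22.8},
    {"id": 2, "group": "passenger_goods", "fe": 3.1},
    {"id": 3, "group": "two_wheeler", "fe": 4.6},
    {"id": 4, "group": "two_wheeler", "fe": 12.0},
]
kept, excluded = screen(records)
print([r["id"] for r in kept], [r["id"] for r in excluded])
```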

Predicted Fuel Economy (FE") Modelled Using a General Linear Model (GLM) and Artificial Neural Network (ANN)
The methodology used for light duty vehicles in the USA [39] was built on and extended as follows. Slavin et al. [39] predicted fuel economy using a detailed historical data set of n = 6246 vehicles. Their dataset contained fuel economy data, allowing evaluation of a model that estimated FE" from the corresponding vehicle characteristics: engine size, engine power, torque, vehicle weight, wheel base and cross-sectional area. A least squares regression model and an ANN model were then applied to create a more accurate predictive FE" model. In the absence of fuel economy data per vehicle category in secondary data for Kenya, Equations (1) and (2) were used together with primary data from the questionnaire to calculate FE'. An ANN and a GLM were then applied to create a model capable of more accurate prediction of fuel economy from vehicle characteristics.
Our vehicle fleet questionnaire data collected in the NMR differed in that they covered the entire fleet, formed a smaller data set (n = 824) and lacked some of the vehicle physical parameters that would be available in a dataset from vehicle manufacturers, as is the case with the CAFE standards [72]. These data (shown in Table 1) included vehicle characteristics and activity data for the in-use fleet: light duty vehicles, heavy duty vehicles, two-wheelers and three-wheelers. Given the differences in data, the Slavin et al. [39] methodology was altered to first calculate fuel economy using Equations (1) and (2) and then use a GLM to create a predictive fuel economy model [49]. The accuracy of the GLM was compared to that of the ANN model.
The equation relating fuel economy to vehicle physical parameters in Slavin et al. [39] was adjusted to incorporate 11 variables in order to explore variable importance and identify the key drivers influencing FE"; the general relation is shown in Equation (3). Vehicle type and utility (VTU) was re-coded into three dummy variables representing three broad classes: passenger cars, 2-wheelers and 3-wheelers, and light commercial vehicles. Heavy duty vehicles were used as the reference category. Fuel type (FT), transmission (TT), and condition of the vehicle when it was originally purchased (NU) were similarly recoded. In recoding the NU variable, vehicles bought new (NN) were used as the reference category. The dependent variables were then transformed using the natural logarithm.
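The recoding described above can be illustrated with a short sketch. The level labels are hypothetical stand-ins for the survey codes, and the example is limited to the VTU variable and the log transform; it is not the authors' preprocessing script.

```python
import math

# Non-reference levels for vehicle type and utility (VTU);
# heavy duty vehicles are the reference category and map to all zeros.
VTU_LEVELS = ["passenger_car", "two_three_wheeler", "light_commercial"]

def dummy_code(value, levels):
    """One 0/1 indicator per non-reference level."""
    return [1 if value == level else 0 for level in levels]

def log_transform(fe_l_per_100km):
    """Natural-log transform of the dependent variable."""
    return math.log(fe_l_per_100km)

print(dummy_code("passenger_car", VTU_LEVELS))  # [1, 0, 0]
print(dummy_code("heavy_duty", VTU_LEVELS))     # [0, 0, 0]
print(log_transform(22.8))
```

With this coding, each fitted dummy coefficient is read as the difference in log fuel economy relative to heavy duty vehicles.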
While a GLM fits only linear and direct associations between the set of predictor variables and the dependent variables, ANNs are more flexible and can capture non-linear relationships [73]. The final model is chosen by trying a range of different network configurations and comparing their predictive power, so the whole process depends on guarding against over-fitting, which is described in detail in Appendix A.3. This includes a detailed description of the following processes: imputation, splitting to obtain an evaluation dataset, the GLM and ANN models, and cross validation.

Vehicle Class, Type and Attributes
Using the EMEP/EEA classification [68], 16 Kenyan vehicle class segments were developed from the sample data based on vehicle weight, engine size and utility, as shown in Table 2. The distribution of the questionnaire data across these broad vehicle categories is also shown in Table 2. The category with the largest number of questionnaire returns was matatus, followed by bodabodas and then private cars, with 250, 233 and 194 vehicle-specific questionnaire responses, respectively.

Vehicle Characteristics
A portion of the descriptive statistics for the vehicle characteristics (before imputation) is shown in Figure 3. The vehicle characteristics presented are gross vehicle weight (GVW) (kg), engine size (cc) and vehicle age (years), which is determined from the year the vehicle was manufactured. These data are shown for 11 of the 16 segments defined in Table 2, since there were insufficient questionnaire data for the remaining segments; engine size and weight were also missing for some of the vehicle categories.
The highest average vehicle age is for AfritypeM2 (14 seater matatus) at 16.9 ± 0.2 years, and the lowest is for AfritypeLe (three wheeler tuktuks) at 2.2 ± 0.8 years, although AfritypeL3e (two wheeler bodabodas and private motorbikes) are also relatively new with an average age of 2.7 ± 0.4 years. Of the different vehicle classes, AfritypeM3C (33-51 seater matatus) showed the highest variability in age.
Engine size and vehicle weight, together with the utility of the vehicle, are key characteristics in determining vehicle class. Vehicle weight and engine size are fixed at manufacture and are grouped according to the Kenyan classes shown in Table 2. The heaviest vehicles with the biggest engines are AfritypeM3C (33-51 seater matatus), and the lightest vehicles with the smallest engines are AfritypeL23e, the bodabodas and private motorbikes. The highest variability in weight was found for AfritypeN2 (heavy duty trucks) and in engine size for AfritypeM3C (33-51 seater matatus).
Figure 3. Vehicle characteristics from questionnaire data, mean with 95% confidence interval for vehicle age, engine size, and weight.

Vehicle Activity
A portion of the descriptive statistics for vehicle activity is shown in Figure 4. The vehicle activities shown are daily mileage calculated as vehicle kilometres travelled (VKT) per day (km), fuel consumption per vehicle (L/day), and fuel economy (L/100 km), for 11 of the 16 segments. The highest mean VKT (215.7 ± 60.5 km/day) and highest fuel consumption (63.2 ± 9.9 L/day) were both recorded for AfritypeM3C (33-51 seater matatus). The highest mean FE' was found for AfritypeM3A, the 14-26 seater matatus (37.4 ± 5.4 L/100 km). The highest variability among the vehicle classes for fuel consumption and fuel economy was found for AfritypeN2 (heavy duty trucks), while the highest variability in VKT was found for AfritypeM3C (33-51 seater matatus).
The differences in FE' between the vehicle classes, as presented in Figure 4, were tested for statistical significance using Analysis of Variance (ANOVA). The variables compared in the test were the Afritype classification and the default classes from the questionnaires. The differences in FE' were found to be statistically highly significant (p < 0.001, N = 707); the p values resulting from this comparison are presented in Tables A2 and A3.
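To make the ANOVA comparison concrete, the following sketch computes the one-way F statistic from grouped values. It is an illustration only: the groups are hypothetical, not survey data, and the p value is left out because it requires the F distribution (available, for example, through `scipy.stats.f_oneway`).

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples, one per class."""
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total
    # Variation of the class means around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual vehicles around their own class mean
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical FE' values (L/100 km) for three vehicle classes
groups = [[4, 5, 6], [8, 9, 10], [30, 33, 36]]
f_stat, df_b, df_w = one_way_anova_f(groups)
print(round(f_stat, 2), df_b, df_w)
```

A large F value means the spread between class means is large relative to the spread within classes, which is the pattern reported for FE' across the Afritype classes.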

Fuel Economy Model

Imputation
The data set before imputation is presented in Figure 5, which shows the map of missing values. The nine variables shown in columns in Figure 5 correspond to variables from Equation (3) as follows: Age, MIL, YBT, GVW, DPW, CC, TT, FT, NOS. The first three (Age, MIL and YBT) have the most missing values. Before imputation only 36% of the dataset had a value for every variable; this improved to 89% after imputation, with fuel economy not being imputed (which accounted for the remaining 11%).
A plot of the diagnostics for the imputation is presented in Figure 6; the performance of the prediction algorithm of the imputation is compared with that based only on the observed data obtained from the survey. Each dot in Figure 6 represents an observed data point in the dataset and the mean imputed value that would be used in the analysis if this value had been missing. The x-axis orders these points according to their observed value while the y-axis presents the mean imputed value. The 90% confidence intervals around the means are based on 20 'overimputations' [48]. The line in each plot is the line of agreement, i.e., with perfect information all points would lie on this line (equivalence of observation and imputation), and we would expect 90% of dots to show a confidence interval overlapping that line in each panel of the figure. The colours code the fraction of missing values on the other covariates for that specific observed value.
Thus, the results in Figure 6 show that the imputation worked reasonably well for most variables, with engine size (CC) and weight (GVW) being better imputed than days per week (DPW), which tends to be overestimated for the relatively few respondents who use their cars on four days or fewer. It is also worth noting that DPW had more missing values than CC.
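The coverage check described for Figure 6 can be sketched as follows. This is not the imputation package's own routine: the interval here is a simple normal approximation from the repeated draws (an assumption), and the observed values and draws are hypothetical.

```python
import statistics

Z90 = 1.645  # two-sided 90% normal quantile

def covers_observed(observed, draws, z=Z90):
    """True if the 90% interval around the mean overimputed value contains the observed value."""
    mean = statistics.mean(draws)
    sd = statistics.stdev(draws)
    return mean - z * sd <= observed <= mean + z * sd

def coverage(observed_values, draws_per_value):
    """Share of observed points whose interval overlaps the line of agreement."""
    hits = [covers_observed(o, d) for o, d in zip(observed_values, draws_per_value)]
    return sum(hits) / len(hits)

# Hypothetical example: an engine size (cc) that is recovered well, and a
# low days-per-week value that the imputation overestimates.
observed = [1500, 5]
draws = [[1400, 1550, 1600, 1450], [6.5, 6.8, 7.0, 6.7]]
print(coverage(observed, draws))
```

If the imputation model is adequate, the coverage for each variable should be close to 0.90; systematically lower coverage at one end of the range, as for low DPW values, signals bias there.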

ANN Exploratory Phase
A range of different ANN model configurations was explored in the training data set (a random 75% split of the data). The networks were confined to two layers because increasing the number of layers or the number of neurons did not improve the information criteria or mean square error (MSE) values. The top panel of Figure 7 depicts the Akaike information criterion (AIC) and Bayesian information criterion (BIC) values for the tested two-layer architectures, with lower values indicating better fit. As the number of nodes in the first and second layers decreased, the AIC and BIC values decreased. The minimum of both criteria was reached at NN4.1 (four nodes in the first layer and one in the second), indicating that this was the model with the lowest number of parameters while showing the highest likelihood based on the test data. Comparing the MSE values of the ANN and GLM models, the GLM generally performed better.
The ANN models to be tested in the validation step were determined to be NN4.1 (lowest AIC, BIC and MSE in test data), NN4 (testing whether the layer with one node is needed) and NN3.1 (testing whether four nodes are needed). Figure 7 also shows the predictions made based on the GLM and the NN4.1 in the test data (random complementary 25% split of the data set). As the figure shows, both models identified the general distribution of the observed fuel economy data fairly well. This is also mirrored by the correlations between the calculated fuel economy (observed data) and the predicted fuel economy values from the GLM (r = 0.77, p < 0.001), the respective correlation between observed and predicted for the ANN (r = 0.73, p < 0.001) and finally the correlation between the predicted values from both models (r = 0.92, p < 0.001).
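To illustrate how information criteria penalise network size, the sketch below counts the weights and biases of a small feed-forward network and computes AIC and BIC from an MSE under a Gaussian error assumption. The source does not state how its AIC and BIC values were derived, so this is one common formulation, and the number of inputs (15) and the MSE are hypothetical.

```python
import math

def n_parameters(n_inputs, hidden_layers, n_outputs=1):
    """Weights plus biases of a fully connected feed-forward network."""
    sizes = [n_inputs, *hidden_layers, n_outputs]
    return sum((a + 1) * b for a, b in zip(sizes, sizes[1:]))

def aic_bic(mse, n_obs, n_params):
    """AIC and BIC for Gaussian errors, up to an additive constant."""
    base = n_obs * math.log(mse)
    aic = base + 2 * n_params
    bic = base + n_params * math.log(n_obs)
    return aic, bic

# Hypothetical comparison of the three candidate architectures
for name, layers in [("NN4.1", [4, 1]), ("NN4", [4]), ("NN3.1", [3, 1])]:
    k = n_parameters(15, layers)
    aic, bic = aic_bic(mse=0.25, n_obs=500, n_params=k)
    print(name, k, round(aic, 1), round(bic, 1))
```

Because BIC multiplies the parameter count by log(n) rather than 2, it penalises additional nodes more heavily than AIC for any sample larger than about seven observations.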

Cross Validation
The results of the cross validation from the iterative bootstrap of all four models are shown in Figure 8. Panels I-IV of Figure 8 show the difference in AIC and BIC values between the originally best fitting model (NN4.1) and its two closest competitors (NN4, NN3.1). Positive differences in each panel indicate that NN4.1 had a worse fit in a cross-validation run (i.e., larger values than the competitor); negative differences indicate evidence against the competitor model. For both information criteria and both comparison models, the overwhelming majority of differences indicate that the simpler model shows a better fit to the data than NN4.1.
Panels V-VII of Figure 8 show the difference in MSE values between the GLM predictions in training/test data splits and the three network models. Negative differences indicate that the GLM performed better than an ANN (larger MSE for the ANN), and vice versa for positive ones. The GLM consistently performed better than the ANN models, as the difference between the GLM MSE and the ANN MSE was again negative for the overwhelming majority of validation runs (NN4.1 had a worse MSE in 99.0%, NN4 in 99.1% and NN3.1 in 98.3% of cross-validation runs).
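The percentages quoted for the MSE comparison are shares of validation runs with a negative difference. A minimal sketch of that tally, using hypothetical per-run MSE values rather than the study's results, is:

```python
def share_glm_better(mse_glm_runs, mse_ann_runs):
    """Fraction of runs in which MSE(GLM) - MSE(ANN) is negative."""
    diffs = [g - a for g, a in zip(mse_glm_runs, mse_ann_runs)]
    return sum(d < 0 for d in diffs) / len(diffs)

# Hypothetical MSE values from four validation runs
mse_glm = [1.0, 2.0, 3.0, 1.5]
mse_ann = [1.5, 2.5, 2.0, 2.0]
print(share_glm_better(mse_glm, mse_ann))  # 0.75
```

The two lists must come from the same training/test splits, so that each difference compares the models on identical data.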

Interpretation of the GLM
Fitting the GLM to the whole data set results in a significant omnibus test statistic (deviance = 376.42, df = 15, p < 0.001), indicating that the chosen predictors together inform the fuel economy statements given by the respondents. Table 3 presents the estimated coefficients. Engine size (CC) is the only coefficient that is deemed significant at the conventional nominal alpha level of p < 0.05: per standard deviation increase in engine size (i.e., x cc), the fuel consumption of a vehicle increases by 0.48 standard deviations (i.e., y L/100 km). Three variables showed marginally significant relationships with fuel consumption: the weight of the vehicle (GVW), whether the vehicle was bought in Kenya (UK) and whether it was used overseas (UO), the latter two indicating that these cars consumed more fuel than newly bought cars.
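The conversion of a standardised coefficient into raw units can be sketched as follows. The snippet multiplies the coefficient by the change in the predictor (expressed in standard deviations) and by the standard deviation of the outcome. The two standard deviations used below are hypothetical placeholders chosen only to make the arithmetic visible; they are not the sample values from the survey.

```python
def raw_effect(beta_std, sd_x, sd_y, delta_x):
    """Change in the outcome (raw units) for a change delta_x in the predictor (raw units)."""
    return beta_std * (delta_x / sd_x) * sd_y

BETA_CC = 0.48                 # standardised coefficient reported for engine size
SD_CC_HYPOTHETICAL = 1000.0    # assumed SD of engine size in cc (placeholder)
SD_FE_HYPOTHETICAL = 10.0      # assumed SD of fuel consumption in L/100 km (placeholder)

change = raw_effect(BETA_CC, SD_CC_HYPOTHETICAL, SD_FE_HYPOTHETICAL, delta_x=1000.0)
print(f"{change:.2f} L/100 km per 1000 cc (illustrative values only)")
```

With the actual sample standard deviations substituted for the placeholders, the same function would give the raw-unit effect of any engine size increment.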
To test for collinearity, variance inflation factors (VIF) were calculated and found to be between 5 and 10, showing that the predictor variables CC and GVW were highly correlated with the other predictors. To explore the effect of this, each of the two variables was removed from the model in turn. Dropping GVW did not resolve the collinearity (VIF remained between 5 and 10), but without GVW, FE may also depend on AfritypeL2e/3e, fuel type (FT) and whether the vehicle was bought new or old (NN), as p < 0.05 for these predictors (Table A4). Dropping engine size (CC) increased collinearity (VIF > 10); in this model FE may also depend on AfritypeL2e/3e and whether the vehicle was bought new or old (NN; Table A5). These results indicate that there are several groups of vehicle features that are highly correlated and can be used as proxies for each other. This could be explored in future studies to make the choice of features collected in surveys more efficient.
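For readers unfamiliar with the diagnostic, a minimal Python sketch of a VIF calculation is given below. It uses the identity that the VIF of predictor j equals the j-th diagonal element of the inverse of the predictors' correlation matrix, which is equivalent to 1/(1 - R²) from regressing predictor j on the others. The helper names and the small data set are hypothetical and only illustrate the mechanics; this is not the code or data used in the study.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting for a small square matrix."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(columns):
    """Map each predictor name to its VIF (diagonal of the inverse correlation matrix)."""
    names = list(columns)
    corr = [[1.0 if a == b else pearson(columns[a], columns[b]) for b in names]
            for a in names]
    inv = invert(corr)
    return {name: inv[i][i] for i, name in enumerate(names)}

# Hypothetical vehicle records (not survey data): engine size, weight and age.
predictors = {
    "CC": [660, 1000, 1500, 1800, 2400, 3000],
    "GVW": [900, 1200, 1500, 2100, 2600, 3400],
    "Age": [12, 5, 9, 15, 7, 11],
}
for name, value in vif(predictors).items():
    print(name, round(value, 2))
```

A common rule of thumb treats values above 5 as a sign of notable collinearity and values above 10 as severe, which matches the thresholds referred to in the text.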

Discussion
This study has shown that for cities such as Nairobi, with limited or low-quality data and a large informal transport component (tuktuk, matatu, bodaboda, AskforTransport), questionnaire survey data can be reliably used to determine the fuel economy of an urban fleet. An ANOVA comparing the calculated fuel economies among the vehicle categories in Table A1 shows that the mean values for the chosen vehicle categories, even for the informal sector, were statistically significantly different from each other. Thus, the Afritype vehicle categories may be used as the classification for vehicle fleets that have a large informal component with similar profiles.
There was, however, a constraint due to the sample size: disaggregating the total sample into vehicle categories reduced the sample for heavy goods vehicles (HGVs), for example, to N = 10 (see Table 2), affecting the level of confidence in the results for this category. This is because trucks and lorries are kept out of the city centre and replaced with smaller trucks, hence their sample was much smaller than that for passenger vehicles.
A distinct methodological limitation was the collinearity detected amongst the predictor variables, for example between the weight of the vehicle and engine size. Removing these highly correlated variables from the model did not reduce the collinearity. Collinearity is on the one hand a statistical problem, since it reduces the precision with which the regression coefficients of linear models are estimated. On the other hand, it shows that several of these variables could be used as proxies for each other, and high correlations help with the imputation of missing values (although more complete data would be preferable in any case). This could be explored in future studies to make the choice of features collected in surveys more efficient. However, even with these limitations, we can conclude that fuel economy and vehicle activity values developed for formal transport sectors in developed countries do not capture the complexity of the informal sector in developing countries, due to differences in vehicle types and the utility of the vehicles.

Comparison across Countries
The major vehicle-manufacturing economies (Japan, USA, EU and China) have fuel economy policies [6]. Figure 9 compares fleet fuel economy estimates from various studies with the fuel economy values from this study. Kenyan passenger cars have a fuel economy three times poorer than the Japanese, EU and Indian fleets and two times poorer than the South African, Chinese and USA fleets. For Kenyan light duty commercial vehicles, fuel economy was up to three times poorer than the Japanese fleet or targets. The fuel economy of the two-wheelers and three-wheelers of the Kenyan fleet (bodaboda and tuktuk, respectively) was two times poorer than that of the corresponding Indian fleet. The 14-seater matatu was determined to be the equivalent of the Japanese small bus (a vehicle designed to carry 11 or more passengers and with GVW up to 3500 kg) and the South African minibus taxi. In this category the Japanese fleet was two times and the South African fleet 1.7 times more fuel-efficient than the 14-seater matatu.

Figure 9. Fuel economies for different countries from various sources: India [24], Kenya (current study), South Africa [29], China [44], Japan [74], EU [14,75], USA [75,76].
In Kenya, 90% of all light duty vehicles imported and registered between 2010 and 2012 were from Japan and Europe [47]. Japan has very stringent fuel economy standards to meet its 2015 targets [74], yet when the Kenyan fleet is compared to the Japanese in-use vehicle fleet in 2004, overall fleet fuel economy was two to three times worse. The comparison in Figure 9 is made on the assumption that the other studies have similar or smaller confidence intervals. The confidence interval for the Kenyan study (see Figure 4) ranges from 7% to 54%, with an average of 24%.
The passenger car fuel economy for the USA includes light duty trucks [76], while for other countries light duty trucks were a separate category. This may contribute to the seemingly poor fleet fuel economy for passenger cars in the USA, even though the technology and fuels meet the latest equivalent European and Japanese standards.
The light duty commercial fleet in use in Nairobi typically consisted of AskforTransport vans and trucks, an informal van and truck hire service operating within the city and in residential areas. This category had the second highest age, as "retired" older vehicles are not scrapped but repurposed. The fuel economy of this category is better than the USA fuel economy for the same category, but the USA fleet in this category is heavier (it includes trucks up to 3800 kg, whilst the other fleets are below 3500 kg) and has bigger engines [6,76].
Bodabodas and tuktuks are mainly imported from Asia (India, Indonesia, Thailand and China), as they are cheaper than European imports [30,33]. Motorcycles are used as public transport in India and Vietnam, as they are in Kenya, but Kenyan motorcycles have about twice the average daily mileage, 79.7 ± 4.3 km/day [24,77]. In Asian cities motorcycles have a lower daily mileage because they represent a larger share of the urban vehicle fleet: they are often used to avoid congestion and, for instance, represent 90% of the vehicle fleet in Hanoi [77]. In this study (see Figure 3), Kenyan motorcycles were found to have mainly 150 cc, 4-stroke engines, compared to motorcycles in West Africa, which have 50 cc, 2-stroke engines [33]. Given the trend of increasing numbers of motorcycles in SSA [30,33], the average daily mileage for motorcycles may also decrease. The study also highlighted high-intensity vehicle usage for other vehicle types, indicated by the average vehicle mileage (VKT) of passenger cars (61.04 ± 7.18 km/day) and matatus (151.55 ± 10.42 km/day).
South Africa has a strong domestic vehicle manufacturing industry and restricts imports of second-hand cars [78], and is therefore unlike Kenya, where 99% of vehicles are second-hand [47]. Its vehicles perform better than Kenya's, though reliable data on minibus taxis (the equivalent of matatus) are often not available. Kenyan 14-seater matatus are old (16.9 ± 0.2 years) and are originally 9-seater vans converted into 14-seaters; overloading and old age are common in this part of the fleet, which likely accounts for the poorer fuel economy compared to South Africa. The bigger matatus, equivalent to urban buses, are relatively new and have a better fuel economy, comparable to the Chinese fleet. However, with the expected deterioration of vehicle technology [79], further aggravated by poor road conditions, low fuel quality and the lack of inspection and maintenance (I/M) programmes, this advantage in fuel economy may not be maintained.
The age of the vehicle is normally an indicator of the emission control technology and hence of the emissions from the vehicle [24,80]. This may hold true for countries that enforce emission compliance checks when importing vehicles and have regular I/M programmes [19]. Without an enforceable I/M programme, emission control devices on imported vehicles are often removed or malfunction [19]. The average age of four-wheelers in the Kenyan fleet is often high: passenger cars 11.1 ± 0.57 years, matatus 8.80 ± 1.24 years. However, age may not be a good indicator of emission technology on light duty vehicles in Kenya, as a previous study [37] has shown. This is because in Lents et al. [37] the vehicles had the required technology, but the fuel quality (unleaded petrol) may not have met the standards required for emission reduction devices (catalytic converters) to function. Age is also not a good indicator of emission reduction technology on HDVs, as the original equipment manufacturers (OEMs) are not responsible for the final vehicle configuration other than the powertrain, chassis and cab [81]. This is supported by the finding of this study of a significant variance in the age of HDVs (75%), shown in Figure 3: AfritypeM3C and AfritypeN2 differ by 118% and 105%, respectively. In Kenya most HDVs, such as trucks, are imported as engine, chassis and cab and built in the country for various uses: matatus, buses and heavy commercial trucks. However, the sample size for HDVs in this study was limited because HDVs (trucks and lorries) have limited geographical areas of circulation in Nairobi. Thus, the HDV variance should be viewed cautiously until further studies are conducted with a bigger sample size.
Comparing FE values from different parts of the world involves considerable uncertainty. The studies compared in Figure 9 covered both diesel and petrol vehicles of similar capacity, mass and power specifications. However, identical average properties were not possible for some countries (for example the USA) due to different categories for vehicle weight and engine size. Even when vehicles had identical properties to fleets in other parts of the world, their utility, especially in the informal sector, was different. To overcome this challenge, developing country fleets (India, South Africa and Thailand) were sought for comparison, as their fleets included an informal sector and were similar in utility. However, the informal transport sector in SSA is usually poorly organized and the industry is often deregulated, unlike in Asia [24,30,33]. The methods used to measure FE also differed; real-world exhaust measurements were sought as these were deemed to be most accurate [74,76,82,83], but few such studies are undertaken, thus other in-use vehicle studies were also included [24,29,75]. The year in which a study was undertaken may also have contributed to the uncertainty, as it affects the vehicle technology and the fuel quality. To reduce this effect, the comparator studies were limited to the years 2010 to 2015. Furthermore, fuel consumption becomes extremely high under traffic congestion [17,84], which is a severe and worsening reality in Nairobi, as in most developing cities [50,85-88]. Therefore, traffic congestion ought to be factored into FE studies, although often this is not the case [16]. However, even with these limitations, we can conclude that vehicle activity, and thus fuel economy, developed for formal transport sectors does not capture the complexity of the informal sector due to the different vehicle types and utility of the vehicles.

Imputation
Multiple imputation of incomplete multivariate data was successfully applied to the vehicle fleet data. The diagnostics of the imputation in Figure 6 show that around 90% of the confidence intervals for the variables CC, GVW, Age, MIL, DPW, YBT, TT, FT and NOS contain the y = x line, which means that the true observed value falls within this range and that the imputation was therefore effective in predicting the missing values. The result of the imputation is a bigger data complement than if only those observations for which every variable was measured had been included. The imputation for engine size (CC) was better than that for days per week (DPW). The engine size of a vehicle was verifiable through second-hand vehicle websites and is linked to other variables such as GVW, transmission, type of fuel and number of seats. By contrast, the number of times a vehicle is driven per week (DPW) may be strongly linked to variables not sought in the questionnaire, such as type of job, distance from home or work, and fuel price changes.
The map of the missing values in Figure 5 shows that the variable Age has the most missing values (46%). This is because during the interviews, if the driver of the vehicle was not the owner, they often did not have the vehicle logbook; thus the age of the vehicle, when the vehicle was bought, engine size and weight were not verifiable on site. Secondary data from vehicle sales websites were used to verify and supplement this information where possible. A previous traffic survey in Nairobi was not able to directly ascertain the age of the vehicles and relied on odometer readings as a proxy for the age of vehicles [36]. This proxy worked because at the time vehicle imports were restricted to new vehicles, whereas in 2015, 99% of imported vehicles were second-hand [47]. MIL, which is the odometer reading, had the second highest share of missing values (29%). Drivers of bodabodas, tuktuks, matatus and taxis openly admitted to tampering with the odometers. This finding was supported by a previous study, which obtained very low mileage from a multiple regression methodology to determine average mileage and concluded that tampering had occurred [36]. Engine size (CC) and GVW were still verifiable via websites, so these variables had fewer missing values in the original dataset before the imputation.
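The coverage figure quoted above can be understood as the share of observed values that lie inside their imputation confidence intervals. The following Python sketch assumes that the interval bounds have already been exported from the imputation diagnostics; the function name and the numbers are illustrative and are not survey data.

```python
def coverage(observed, lower, upper):
    """Share of observed values that fall inside their [lower, upper] interval."""
    hits = sum(lo <= obs <= hi for obs, lo, hi in zip(observed, lower, upper))
    return hits / len(observed)

# Hypothetical engine sizes (cc) with interval bounds obtained by treating
# each observed value as missing and imputing it repeatedly.
observed_cc = [1000, 1500, 1800, 2000, 2500]
lower_cc = [800, 1300, 1900, 1700, 2100]
upper_cc = [1200, 1700, 2300, 2400, 2900]

share = coverage(observed_cc, lower_cc, upper_cc)
print(f"{share:.0%} of intervals contain the observed value")
```

An interval that contains the observed value corresponds to a confidence bar that crosses the y = x line in the diagnostic plot.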

Fuel Economy Model
In assessing the comparative statistics in Figure 8, the GLM consistently performed better than the ANN models, and engine size was deemed to be the most significant predictor of FE. We chose a cross-validation approach to guard our predictor selection against over-fitting [39,49,89]. The cross-validation procedure supports our analysis with regard to this goal in three ways. First, the information criteria (AIC, BIC) provide a numerical summary that takes into account both the fit to the observed data and the number of parameters (here layers of the ANN). Unduly complex models were therefore penalised and less likely to end up in our final set of potential models (NN4, NN4.1, NN3.1). Secondly, the use of the MSE in a test sample ensures that a model prone to over-fitting the training dataset will produce worse MSEs in this sample and would again be less likely to be selected. Thirdly, running this analysis as a bootstrap (including repeated multiple imputation of missing data, which adds further robustness) allows us to compare the potential for over-fitting as well as adequate fit in one go. Figure 8 shows that the overwhelming majority of the bootstrap runs actually support the fit of simpler neural networks than NN4.1 (NN3.1: AIC in 99.7% and BIC in 100% of runs; NN4: AIC in 62.7% and BIC in 92.2% of runs) and that the MSE consistently supported the GLM (NN4.1 had a worse MSE than the GLM in 99.0%, NN4 in 99.1% and NN3.1 in 98.3% of cross-validation runs). The GLM thus achieved higher predictive accuracy. This finding is contrary to a fuel economy study that compared regression models to ANN, in which the ANN model achieved higher accuracy [39]. This may be because the success of an ANN relies on reliable input and output data to train the algorithm, and bigger datasets improve the precision of ANN predictions, as for instance in Slavin et al. [39] and Alice et al. [49]. Limited and incomplete vehicle fleet data is often a challenge in SSA, so while ANN is a powerful tool for modelling complex relations and systems [39,90,91], with the smaller dataset available here it was not the better predictive model when compared to the GLM.
Engine size was deemed to be the most significant predictor, although three other variables also showed marginally significant relationships with fuel economy: the weight of the vehicle (GVW), whether the vehicle was bought in Kenya (UK) and whether it was used overseas (UO), the latter two indicating that these cars consumed more fuel than newly bought cars. Thus, the study was able to identify that certain vehicle fleet characteristics (especially engine size and vehicle weight) are key to predicting fuel economy changes, providing a focus on those parameters that are vital to obtain when conducting questionnaire surveys in order to derive an accurate estimate of fleet fuel economy.

Conclusions
This paper presents a novel methodology that develops a questionnaire and uses the resulting survey data to develop models that estimate in-use vehicle fleet fuel economy for cities with limited or low-quality data and a large informal transport fleet, such as Nairobi. The vehicle fleet's FE in the NMR was determined to be 2-3 times worse than in Japan, Europe, India and China. For example, for Kenyan passenger vehicles to meet the Japanese fuel economy target of 5.95 L/100 km would require an almost 4-fold improvement in the Kenyan FE. FE models based on survey questionnaire data were presented: first, multiple imputation was successfully used to fill in missing data; then the performance of different ANN models was compared to that of a GLM. The GLM consistently performed better than the ANN models. Engine size was deemed to be the most significant factor in predicting FE.
In cities such as Nairobi that are experiencing a rapid growth in transport emissions, predicting fuel economy changes in response to changes in vehicle characteristics and activity can help inform effective transport policies that rely on the availability of robust data and the application of sound assessment methods. A baseline measure of fuel economy for both the formal and informal vehicle fleet in the NMR has now been established for 2015. This identifies the substantial contribution the informal vehicle fleet is currently making to the air pollution and GHG burden. This is particularly important given the trends in this fleet component, which suggest a continued increase in the size of the informal transport sector with no new regulations. Application of these methods can help identify the rise of informal transport as a particularly polluting component of the transport sector and help target fuel economy improvements in changing vehicle fleets in the future. It also identifies the need to take further action to address informal transport from an air quality management and GHG emission perspective. Furthermore, the vehicle activity data presented here would improve Kenya's NDC formulation for the transport sector. Ultimately, this will aid sustainable road transport policy implementation, which will lead to a reduction in fuel consumption and an improvement in FE, leading to reductions in GHG emissions and improvements in air quality.

Funding: The APC was funded by SEI at Africa Centre through LIRA project and SEI at University of York.

Appendix A.3 Steps for Improving GLM and ANN Model Accuracy
When fitting the GLM and ANN models (see [39] and [49] for further details), the analyses needed to account for two specific problems. First, missing data needed to be dealt with in a manner that is statistically appropriate and that takes sampling variance into account. Second, we needed to guard against over-fitting our FE model based on just a single sample. The following steps (a) to (f) were taken to address these problems:
(a) Multiple imputation of missing data. Multiple imputation of incomplete multivariate data, a well-established methodology for dealing with missing data [92][93][94], was applied to the dataset using the R statistical package AMELIA [48]. Imputation has previously been applied in medical and psychiatric research [93][94][95][96]. Before the main analysis, 20 imputations were run to examine the accuracy of the imputation and to check how close the imputed density distributions and bivariate distributions were to the original values.
(b) Split imputed dataset into estimation and validation data. After imputation, the dataset was randomly split into a training dataset constituting 75% of the imputed dataset, and the remaining 25% was used as a test dataset.
(d) Neural network model: exploratory phase. A neural network model was applied to the imputed dataset using the Levenberg-Marquardt back-propagation algorithm. This was created using the neuralnet package [97] and closely followed existing methodology [49]. The architecture had one or two hidden layers with various configurations, which were determined experimentally. MSE, Bayesian information criterion (BIC) and Akaike information criterion (AIC) values for each of these models were calculated to evaluate model fit (MSE: how close the predicted fuel economy values were to the calculated fuel economy values; AIC/BIC: how parsimonious the model fit was given the number of parameters needed to estimate the model). A selection of the top competing neural network (ANN) models, based on the lowest MSE, AIC and BIC values, was identified for inclusion in the cross-validation step alongside the GLM.
(e) Cross validation. Cross validation was used in this step to measure the predictive performance of the models, to guard against over-fitting of the ANN, and to allow for model selection [89]. Three competing ANNs had been selected in step (d) based on the lowest AIC and BIC values as well as MSEs of comparable size to the GLM. An iterative bootstrap process was then used to estimate the predictive performance of all four models [89]. First, a single imputation of the dataset was done, and the sample was then randomly partitioned into a training set (75%) and a test set used as a validation sample (25%). A GLM was then fitted to the training set and the MSE from its predictions in the test set was saved. In the next step the three selected ANN structures were fitted to this training dataset, saving AIC and BIC values as well as their respective MSEs from their predictions in the test dataset. The cross-validation process was iterated 1000 times, with missing data imputation and randomised partitioning of the train-test dataset in each of the runs. For each iteration a comparative statistical analysis of the MSE, AIC and BIC values was carried out to confirm the best model estimate, thereby producing bootstrap distributions of the model fit criteria.
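The repeated train/test procedure in steps (b) and (e) can be sketched in Python as follows. This is an illustration under strong simplifying assumptions, not the study's R code: the GLM is represented by an ordinary least-squares line with a single predictor, the flexible competitor is a 1-nearest-neighbour predictor used as a stand-in for an over-flexible model (it is not an ANN), the data are synthetic, and the re-imputation of missing data in each run is omitted. All function and variable names are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (a Gaussian GLM with identity link)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x

def fit_nearest(xs, ys):
    """1-nearest-neighbour predictor: a deliberately flexible stand-in, NOT an ANN."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared prediction error of a fitted model on (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def share_line_wins(xs, ys, runs=1000, train_share=0.75, seed=1):
    """Share of random 75/25 splits in which the line has the lower test MSE."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    cut = int(train_share * len(idx))
    wins = 0
    for _ in range(runs):
        rng.shuffle(idx)                       # new random partition in every run
        train, test = idx[:cut], idx[cut:]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        vx, vy = [xs[i] for i in test], [ys[i] for i in test]
        diff = mse(fit_line(tx, ty), vx, vy) - mse(fit_nearest(tx, ty), vx, vy)
        wins += diff < 0                       # negative difference favours the line
    return wins / runs

# Synthetic data (assumption): fuel consumption rising linearly with engine size plus noise.
data_rng = random.Random(0)
engine_litres = [data_rng.uniform(1.0, 3.0) for _ in range(80)]
fuel_use = [5.0 + 6.0 * x + data_rng.gauss(0.0, 2.0) for x in engine_litres]

share_line_better = share_line_wins(engine_litres, fuel_use)
print(f"Line has the lower test MSE in {share_line_better:.1%} of runs")
```

Collecting the sign of the MSE difference over many random partitions gives a bootstrap-style summary of the kind reported for the GLM and the three ANN structures. A fuller version would also re-impute the missing data and record AIC and BIC in each run.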