Influence of User Preferences on the Revealed Utility Factor of Plug-In Hybrid Electric Vehicles

Abstract: Plug-in hybrid electric vehicles (PHEVs) are an effective intermediate vehicle technology option in the long-term transition pathway towards light-duty vehicle electrification. Their net environmental impact is evaluated using the Utility Factor (UF), a performance metric that quantifies the fraction of vehicle miles traveled (VMT) on electricity. There are concerns about the gap between the Environmental Protection Agency (EPA) sticker label UF and the real-world UF, because test cycles cannot fully represent actual driving conditions and because the assumptions made about PHEV driving and charging differ from actual usage patterns. Using multi-year longitudinal data from 153 PHEVs (11–53 miles all-electric range) in California, this paper systematically evaluates how observed driving and charging, energy consumption, and UF differ from sticker label expectations. Principal Components Analysis and regression model results indicated that the UF of short-range PHEVs (less than 20-mile range) was lower than label expectations mainly due to higher annual VMT and high-speed driving. Long-distance travel and high-speed driving were the major reasons for the lower UF of longer-range PHEVs (at least 35-mile range) compared to label values. Charging at both home and away locations, and increasing the frequency of home charging, improve the UF of short-range and longer-range PHEVs, respectively.


Introduction
Climate change, air quality, and public health concerns have necessitated that governments across the world implement policies to promote battery electric (BEVs) and plug-in hybrid electric vehicles (PHEVs), collectively addressed as plug-in electric vehicles (PEVs). In the U.S., the transportation sector is responsible for 30% of total national greenhouse gas (GHG) emissions, and the light duty vehicle (LDV) segment alone contributed close to 60% of total transport GHGs in 2017 [1]. In the state of California, 40% of total GHGs comes from the transportation sector, and the contribution from the LDV segment was close to 70% of transport GHGs [2]. California and many other governments have implemented a suite of technology forcing mandates, performance standards for transportation fuels, GHG emissions, and incentive-based policies to increase the market penetration of PEVs [3][4][5].
Plug-in hybrid electric vehicles are often considered to be a transitional technology with the potential to expedite the shift towards BEVs [6,7]. Plug-in hybrid electric vehicles are equipped with a larger battery pack compared to conventional hybrid vehicles (HEVs) that can be charged using grid electricity, and have an internal combustion engine (ICE). PHEVs are not limited by the range anxiety and higher upfront purchase cost concerns associated with BEVs, and they combine the pure electric driving capabilities of a BEV with the fuel and energy efficiency enhancements due to engine downsizing, low or no engine idling, and regenerative braking capabilities of an HEV. This design and operational flexibility allow them to be driven in Charge Depleting (CD) or Charge Sustaining (CS) mode depending on the source of motive power. Charge depleting (CD) mode can further be categorized into CD-EV and CD-blended (CDB) modes. In the CD-EV mode of operation, the entire motive power is provided by the electric motor by discharging the energy stored in the battery and the engine is never turned on. This type of operation is often called all-electric mode or zero emission (ZE) mode because only electricity is consumed and there are no tail-pipe emissions. Depending on the powertrain configuration, road network topology, speed and acceleration characteristics, and driver behavior, the engine may turn on to partially assist the motor in meeting the total propulsion energy demand in the CD mode. This is called CDB mode of operation because both electricity and gasoline are consumed, and the motive power is provided by the electric motor and the ICE. The CD mode of operation continues until the battery is depleted, after which the PHEV is operated in the CS mode as a regular HEV with the ICE providing the entire propulsion energy demand and only gasoline is consumed. 
Driving in the CD mode could consist entirely of electric VMT (eVMT) or of a combination of eVMT and gasoline VMT (gVMT), whereas the CS mode comprises only gVMT.
A critical aspect of assessing the real-world environmental performance of PHEVs is the eVMT in the CD mode, which has direct implications for fuel economy and exhaust emissions. In this regard, the concept of the Utility Factor (UF) of PHEVs has been developed, which represents the proportion of VMT travelled on electricity. The formal procedures and test conditions under which the UF and the Environmental Protection Agency (EPA) "sticker label fuel economy" are estimated are outlined in Society of Automotive Engineers (SAE) J1711 [8] and SAE J2841 [9]. Standardized dynamometer certification cycles [10,11] are recommended in SAE J1711 to estimate the all-electric range (AER), the per-mile energy consumption (kWh/mile) in the CD-EV mode, and the miles per gallon (MPG) in the CS mode. Strictly speaking, the AER, or charge depleting range, is the total miles traveled by a fully charged PHEV in the CD-EV mode prior to the first engine start event. To ensure parity and representativeness across geographies and socio-demographics with varying travel needs, SAE J2841 weights the per-mile energy consumption against national driving statistics such as the National Household Travel Survey (NHTS) [12] in order to determine, at an aggregate level, how much of a vehicle's driving can be accomplished in CD mode. SAE J2841 explicitly assumes that:
• the PHEV starts its travel day on a fully charged battery;
• the PHEV is fully charged once per day, on days driven, at home at the end of the last trip of the travel day;
• the effects of additional intra-day charging and of the vehicle not being charged at the end of the last trip nullify each other;
• the travel patterns and VMT of PHEVs are identical to the self-reported single-day trip diary information of mainstream ICE users in the 2001 NHTS.
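To make the assumptions above concrete, the following minimal sketch computes a distance-weighted utility factor from a list of daily travel distances, assuming exactly one full charge before each driving day and no other charging. It illustrates the general idea only and is not the J2841 procedure itself; the function name `fleet_utility_factor` and the sample distances are hypothetical.

```python
from typing import Sequence


def fleet_utility_factor(daily_miles: Sequence[float], cd_range_miles: float) -> float:
    """Share of total miles that fall within the charge depleting range.

    Assumes one full charge before each driving day and no other charging.
    """
    total = sum(daily_miles)
    if total <= 0:
        raise ValueError("daily_miles must contain some travel")
    # Each day contributes at most cd_range_miles of electric driving.
    electric = sum(min(d, cd_range_miles) for d in daily_miles)
    return electric / total


# Hypothetical daily distances in miles (not taken from the study data).
example_days = [10.0, 25.0, 40.0, 120.0]
for aer in (11, 20, 53):
    print(aer, round(fleet_utility_factor(example_days, aer), 3))
```

Under these assumptions, days with no charging lower the realized UF and days with more than one charge raise it, relative to the value this sketch returns.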
Due to its simplistic and selective set of assumptions, the J2841 may not adequately reflect how PHEVs are driven and charged in real-world conditions. Using year-long longitudinal data collected via on-board data loggers from 153 PHEVs (11-53 miles AER) in California, this paper systematically examines the disparities between observed PHEV driving, charging behavior and generalized expectations about their usage patterns, and its implications on UF estimates encapsulated in existing PEV policies.
Prior studies that relied on cross-sectional travel survey data like the NHTS broadly focused on understanding the sensitivity of UF to different assumptions about travel patterns and charging behavior. In [13], alternatives to the J2841 UF are proposed using the 2009 NHTS instead of the 2001 NHTS, and mid-day opportunistic charging, typically at the workplace, is also considered. That study reported that the proposed UF is higher than the J2841 UF, but only for PHEVs with AER less than 65 miles. While the J2841 UF is strictly a distance-based metric, Ref. [14] proposes an energy-based UF. Sensitivity of UF to different vehicle attributes such as age, class, annual VMT, and [3,28,29], zero emission vehicle (ZEV) credit allocation under the ZEV mandate [30,31], vehicle emissions and label fuel economy estimates [32,33], and California's Low Carbon Fuel Standards (LCFS) [34] rely on.
The main contributions of this study are the following:
• Comparative assessment of observed PHEV driving and charging against EPA sticker label expectations and the SAE J2841 assumptions.
• Eight dominant factors (four each for driving and charging) that explain the variations in observed PHEV usage patterns are extracted using Principal Components Analysis (PCA).
• Ordinary Least Squares (OLS) regression models are formulated to test the explanatory power of the factors by including them as independent variables, with the difference between observed and expected UF as the dependent variable.
• The relative importance of the extracted factors, in terms of their contributions to the disparities between observed and expected UF, is then quantified.
• Though dimensionality reduction using PCA and regression modeling are commonly used, their specific application to a real-world observational study of PHEVs and UF is a new approach that is carried out in this study.
This paper adds to the body of literature that focuses on improving our understanding of the real-world UF of PHEVs by discerning the influential driving and charging traits that contribute to deviations from the sticker label UF. To the best of our knowledge, compared to existing studies which limit their scope of analysis to either aggregate or daily levels [16,21,22,24,35], this paper focuses on explaining why real-world performance deviates from label expectations by methodically examining disparities at varying time-scales (trip/charging sessions, daily, and annual); incorporates locational aspects of charging infrastructure access and utilization and how they impact the UF; and explores whether the key driving and charging factors that introduce deviations in real-world UF from label values are the same irrespective of the AER. The outcomes of this study offer a realistic assessment of the real-world electrification potential of PHEVs, and challenge and/or validate conventional wisdom on PHEV usage and, subsequently, on their energy consumption and emissions. Understanding the causes, magnitude, and direction of differences between assumptions about PHEV usage and their observed usage will help the broader scientific community in parametric updates, calibration, and validation efforts to strengthen the representativeness, or correct for the lack thereof, in vehicle choice modeling [36], powertrain simulation tools [37], integrated assessment studies [38], charging infrastructure planning [39], and emissions inventory [40]. We expect the paper to help in formulating policies that incentivize PHEVs based on on-road performance and to inform automakers when exploring future vehicle designs.
The assemblage of data analyzed in this paper consists of driving and charging data collected between June 2015-June 2018 from 153 PHEVs in California. Five PHEV models are examined in this study: Toyota Prius (11-mile AER), Ford CMax and Fusion Energi (20-mile AER), Chevrolet Volts (35/38 miles and 53 miles AER). The rest of the paper is organized as follows. Section 2 summarizes the aggregate driving and charging data and describes the quantitative methods used. Our analysis and results are detailed in Section 3. Comparative assessment of observed driving and charging behavior with sticker label expectations, followed by the PCA and OLS model results are presented in Section 3. We discuss our findings in Section 4 and conclude this paper in Section 5.

Data and Methods
The data used in this paper come from the Advanced PEV Driving and Charging Behavior project, a multi-year study to monitor PEV usage in California [41,42]. An online survey was first administered to PEV owners randomly sampled from the California Clean Vehicle Rebate Project [43] and vehicle registration records. A sub-sample of respondents was then selected, and GPS-enabled data loggers were installed in the on-board diagnostics (OBD) port of their vehicles, which were monitored for at least a year. The data loggers report more than two dozen variables related to driving, charging, performance, and comfort. The important vehicle usage parameters relevant to our analysis are: trip and charging session start and end time stamps and locations; trip and charging session start and end state of charge (SOC); charger level, charging duration, and charged energy; and trip distance, duration, and consumption (electricity and gasoline). Since the scope of this paper is real-world performance, our analysis strictly focuses on the data from the loggers, and the respondent's home location is the only survey information included in our analysis. Five PHEV models with sticker label AER varying from 11-53 miles [44] were studied. Table 1 presents the driving and charging data, which consist of approximately 2 million VMT, 200,000 trips, 52,000 charging sessions, and 260 MWh of charging energy collected over the course of 45,000 driving days (driving and charging, or driving only) between June 2015 and June 2018 in California. On average, every vehicle in the dataset was driven 292 days, and among the PHEV types this varied between 268 and 315 days during the data collection period. Of the 52,237 charging sessions, 53% were at Level 1 (L1), rated at 1.4 kW, and the rest were at Level 2 (L2), rated at 3.3 kW [45].
Throughout the rest of the paper, unless otherwise specified, the J2841 UF [9], the AER, and the mode-specific energy consumption (kWh/mile in CD-EV mode or miles per gallon (MPG) in CS mode) found in the EPA fuel economy labelling data [46] are collectively addressed as Expectations. The label UF refers to the EPA sticker label city/highway combined UF. The corresponding values estimated from the data analyzed in this study are addressed as Observed. Table 2 presents the average annualized and daily estimates of key PHEV driving and charging metrics.
Principal Component Analysis (PCA) is a statistical procedure to reduce the dimensionality of a dataset, and it falls under the larger umbrella of Exploratory Factor Analysis (EFA). In this paper, we use PCA to reduce the dimensionality of the observed driving and charging data, which are highly susceptible to the problem of multicollinearity. Since the Utility Factor is the ratio of eVMT to VMT and eVMT is a function of charging behavior, multicollinearity would persist and could severely undermine the interpretation of the statistical significance of driving and/or charging related independent variables (IVs) on the dependent variable (DV). Combining both driving and charging related usage metrics and then performing the PCA may not eradicate the problem of multicollinearity: a correlation that existed in the higher-dimensional space of the original data merely gets transformed and projected onto a new, lower-dimensional space. This could complicate factor definition, the number of factors to retain, the minimum loading criteria, and the choice of rotation method. To address these issues, we performed PCA of driving and charging related variables separately.

Validity and Suitability of the Data for PCA
The EFA involves four major steps. The first step is to check the appropriateness of the dataset for factor analysis. For this, we used the Kaiser-Meyer-Olkin (KMO) Measure of Sampling Adequacy (MSA) [47,48] and Bartlett's Test for Sphericity [49]. The KMO MSA is an index between 0 and 1 that compares the magnitude of the observed correlations with that of the partial correlations; the higher the index, the more suitable the data are for PCA [50,51]. The literature recommends an empirical rule of thumb of at least 0.6 as a minimum for the KMO MSA [52,53]. Bartlett's Test for Sphericity tests the hypothesis that the correlation matrix is an identity matrix, which would imply that the variables are unrelated and not suitable for PCA. A p-value of less than 0.05 in Bartlett's test is required for PCA. The KMO and Bartlett's tests together determine whether the underlying structure of the dataset is suitable before proceeding to perform the PCA. The second step is to decide how many factors to retain. The number of factors to extract and retain is typically determined based on the share of variance explained by each of the factors and a suitable threshold for the cumulative total variance captured by the PCA. A Scree plot with an Eigen value cutoff of greater than one is a widely used method to select the number of factors to retain, and it is the method this paper employed. The resultant component matrix shows the factor loadings, or correlations, between the variables used for PCA (row-wise) and the factors (column-wise). Factor loadings inside the interval of ±0.3 (absolute value below 0.3) are typically omitted [53,54]. The third step is to rotate the component matrix to simplify its structure and facilitate interpretation. There are two major categories of factor rotation, Orthogonal and Oblique [55]. In orthogonal rotation, the factor axes are kept at 90° to one another so that the rotated factors remain uncorrelated, whereas in Oblique rotation, correlation between the extracted factors is permissible. Varimax and Quartimax are the commonly used orthogonal rotation methods.
The Quartimax method minimizes the number of factors needed to explain each variable used in the PCA and the Varimax method of rotation causes each variable to load heavily on one factor [56][57][58]. We used the Varimax method because of the simplicity of interpretation. Since the factors themselves are not correlated, Varimax rotated factors can be used in assessing the explanatory power of the factors in a regression model. The fourth and final step is to suitably name the rotated factors based on the factor loadings.
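The two suitability checks described in the first step can be sketched directly from a correlation matrix. The block below is a minimal standard-library illustration using the standard textbook formulas for Bartlett's statistic and the overall KMO index; it is not the SPSS implementation used in this study, and the helper names are hypothetical. Converting the statistic to a p-value additionally requires a chi-square distribution function (for example `scipy.stats.chi2.sf`), which is omitted here.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def inverse_and_det(a: Matrix) -> Tuple[Matrix, float]:
    """Gauss-Jordan inverse and determinant with partial pivoting."""
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        if pivot != col:
            aug[col], aug[pivot] = aug[pivot], aug[col]
            det = -det  # a row swap flips the sign of the determinant
        p = aug[col][col]
        det *= p
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug], det


def bartlett_sphericity(corr: Matrix, n_obs: int) -> Tuple[float, int]:
    """Chi-square statistic and degrees of freedom for H0: corr is identity."""
    p = len(corr)
    _, det = inverse_and_det(corr)
    stat = -(n_obs - 1 - (2 * p + 5) / 6.0) * math.log(det)
    return stat, p * (p - 1) // 2


def kmo_overall(corr: Matrix) -> float:
    """Overall KMO: squared correlations relative to squared partial correlations."""
    p = len(corr)
    inv, _ = inverse_and_det(corr)
    r2 = q2 = 0.0
    for i in range(p):
        for j in range(p):
            if i != j:
                partial = -inv[i][j] / math.sqrt(inv[i][i] * inv[j][j])
                r2 += corr[i][j] ** 2
                q2 += partial ** 2
    return r2 / (r2 + q2)


# Hypothetical 3-variable correlation matrix (not from the study data).
corr = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
print(bartlett_sphericity(corr, n_obs=153), kmo_overall(corr))
```

A KMO value near 1 and a large Bartlett statistic (small p-value) both point towards data whose correlation structure is worth summarizing with a few factors.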
The aggregate driving and charging data of the 153 PHEVs were first annualized based on the number of days every PHEV was driven. Using PCA, we identified four dominant driving factors and four dominant charging factors, eight in total. In this paper, the KMO MSA was close to 0.8 for both the driving and charging related PCA, which is considered "meritorious" [47,50,51]. The p-value of Bartlett's test was well below the significance level of 0.05 for both the driving and charging related PCA, so the data are suitable for PCA. The extracted factors captured 87% of the total variance in the dataset. Variables used for PCA, extracted factors, and their definitions are detailed in Section 3.2. To test the explanatory power of the extracted factors, we built Ordinary Least Squares (OLS) multi-variate linear regression models with the extracted factors as the independent variables (IVs) and the deviation of observed UF from label UF as the dependent variable (DV). The dataset was divided into two based on the AER: short-range PHEVs (Prius, Ford CMax/Fusion Energi) and long-range PHEVs (Volts). OLS regression models for short-range PHEVs and long-range PHEVs were developed separately. We carried out a priori and post hoc hypothesis tests and validated that the sample size and power are adequate at the chosen significance level (5%) for both models. To supplement the insights gathered from the regression models and to gauge the practical utility of the IVs rather than just their statistical significance, we performed a relative importance analysis of each of the extracted factors by quantifying their main and total effects. The main effect is the contribution by an IV to the total variance by itself, and the total effect is the contribution by an IV to the total variance in combination with other IVs [59][60][61]. We also examined the effect of including interaction terms in the regression models.
The regression model estimates and outcomes of the relative importance analysis are detailed in Section 3.3. The PCA was done using IBM SPSS, and the OLS regression modeling and relative importance analysis were carried out using JMP Pro 15.3.
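The factor-retention rule described in this section (keep components whose Eigen value exceeds one, then report the cumulative share of variance they capture) can be written as a short sketch. The function name `retain_factors` and the Eigen values below are hypothetical and are not the values from Tables 5 or 7; for a PCA on a correlation matrix, the Eigen values sum to the number of variables.

```python
from typing import List, Tuple


def retain_factors(eigenvalues: List[float], threshold: float = 1.0) -> Tuple[int, float]:
    """Number of components above the threshold and their share of total variance."""
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive number")
    kept = [ev for ev in eigenvalues if ev > threshold]
    return len(kept), sum(kept) / total


# Hypothetical Eigen values for a four-variable PCA (illustration only).
n_kept, variance_share = retain_factors([2.0, 1.5, 0.25, 0.25])
print(n_kept, variance_share)
```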

Analysis and Results
In this section, we compare real-world performance of observed PHEVs with sticker label expectations from the perspectives of UF, daily driving distances and style, mode specific energy consumption, and charging behavior. Wherever applicable, we also contrast driving and charging behavior observed among the five PHEV models studied in this paper.

Descriptive Comparisons
3.1.1. Observed UF and Expected UF
Figure 1 depicts the UF distribution of every PHEV observed in this study. The EPA label city/highway combined UF is shown inset in Figure 1 as well. Except for the Volt-35/38, all the other PHEV models performed below EPA expectations, and the deviations were most notable for short-range PHEVs (AER of 20 miles or less) compared to longer-range PHEVs (AER of 35 miles or more). On average, the observed UF was between 60% and 103% of the label UF. Figure 2 shows the ratio of observed UF to label UF and its distribution by PHEV type. The observed UF of 82% (N = 18) of the Prius, 75% (N = 18) of the Fusion, 66% (N = 24) of the Volt-53, 54% (N = 15) of the CMax, and 44% (N = 19) of the Volt-35/38 vehicles was lower than the label UF estimate. Two interesting observations can be gleaned from Figures 1 and 2. First, the range of UF deviations is wider for shorter-range PHEVs (Prius, CMax, and Fusion) than for longer-range PHEVs (Volts). Second, there are a few short-range PHEVs that rarely or never plug in and are operated as regular HEVs. Referring to Table 2, the annual VMT of PHEVs observed in this study is higher than estimates reported in other real-world PHEV usage studies [3,22,23,62]. Except for the Volt-35/38, the UF of PHEVs observed in this study (Figure 1 and Figure 12) was lower than the values reported in [3,22,23,62].
[10,29,63]. The Federal Test Procedure (FTP) is an extension of the UDDS which consists of the UDDS, followed by the first 505 seconds of the UDDS. The US06 is an aggressive, high-speed, high-acceleration highway driving cycle used by the California Air Resources Board (CARB) to determine additional credit allocation under its ZEV mandate [64]. Driving style within the context of this study refers to attributes such as stop frequency per mile, percentage share of driving time, and distance driven at different speeds.
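The driving style attributes named above can be derived from a speed trace. The sketch below is a minimal illustration, assuming a fixed sampling interval and counting a stop whenever the speed drops from a positive value to exactly zero; both assumptions, the function name `driving_style`, and the sample trace are hypothetical and do not describe the processing used in this study.

```python
from typing import Dict, Sequence


def driving_style(speed_mph: Sequence[float], dt_s: float = 1.0,
                  highway_mph: float = 60.0) -> Dict[str, float]:
    """Distance, share of distance at highway speed, and stops per mile."""
    # Distance covered in each sample interval, in miles.
    miles = [v * dt_s / 3600.0 for v in speed_mph]
    total = sum(miles)
    highway = sum(m for v, m in zip(speed_mph, miles) if v >= highway_mph)
    # A stop is counted at each transition from moving to standstill.
    stops = sum(1 for prev, cur in zip(speed_mph, speed_mph[1:])
                if prev > 0 and cur == 0)
    return {
        "miles": total,
        "highway_share": highway / total if total else 0.0,
        "stops_per_mile": stops / total if total else 0.0,
    }


# Hypothetical 1 Hz trace: idle, 2 min at 30 mph, idle, 1 min at 60 mph, idle.
trace = [0.0] * 10 + [30.0] * 120 + [0.0] * 10 + [60.0] * 60 + [0.0] * 5
print(driving_style(trace))
```

The same function can be applied to a certification cycle trace and to an observed trip so that the two are compared on identical definitions.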
Figure 3 shows that the test cycles do not adequately capture how the driving style varies among PHEVs with different AER capabilities. Moreover, the test cycles either underestimate or completely exclude the share of driving at highway speeds (60 mph or more). Figure 4 shows the distribution of stops per mile at a trip level: Figure 4a shows the cumulative distribution and Figure 4b shows the probability density function. For reference, the stops per mile of the HWFET, UDDS, and FTP cycles are also indicated in Figure 4. It is interesting to note that the share of trips made with stop frequency lower than UDDS stop frequency per mile increases with AER, and varied from 60% for the Volt-53, to 75% for the Prius. Figure 5a shows the percentage share of total distance driven at different speed intervals among the observed PHEVs and Figure 5b shows the percentage share of total distance driven at different speed intervals under EPA test cycles. Approximately 40% of total distance was driven at highway speeds.
Since high-speed driving consumes more energy (gasoline and/or electricity) compared to city or stop-and-go driving, the effective AER, fuel economy, and UF realized on-road by a fully charged PHEV could be lower than their respective sticker label estimates. This is highlighted in Table 3 and Figure 6. Table 3 compares the label and observed CS mode fuel economy in miles per gallon (MPG) and CD-EV or ZE mode per-mile electricity consumption (kWh/mile), and Figure 6 shows the distributions of the observed CS mode MPG and CD-EV mode kWh/mile. On average, the observed CS mode fuel economy and ZE mode electricity consumption per mile was lower than the sticker label values for the Prius and Volts (Volt-35/38 and Volt-53). In the case of the CMax and Fusion, the CS mode fuel economy was slightly higher than sticker label values but the CD-EV mode kWh/mile was lower than the label values.
From Table 3, we can infer that the disparities between label and observed ZE mode kWh/mile translate into the effective AER realized on-road being 3%-18% lower than the label AER. In the following sub-sections, we specifically focus on how driving and charging varied among the five PHEV models analyzed in this paper.
Figure 7 shows the percentage share of total VMT categorized by distance and type of day (weekday or weekend). Using the criterion of daily VMT of 50 miles or more to define long-distance travel (LDT) [65], the share of LDT was highest for the Fusion (37%) and lowest for the Volt-53 (21%). Overall, daily travel of 50-100 miles contributed the most (24-27%) to the share of total VMT (weekday and weekend combined) for the Prius, Fusion, and Volt-35/38. In the case of the CMax and Volt-53, daily travel of 5-20 miles contributed the most to the share of total VMT (28%). Referring to Figure 7, it is interesting to note that on weekends, all five PHEV models had similar or comparable shares of travel across all distance bins. If we examine the weekday travel by distance bins, daily travel of 50-100 miles still contributed the most (19-23%) to the share of weekday VMT for the Prius, Fusion, and Volt-35/38. A relatively shorter driving distance of 5-20 miles dominated the share (20%) of weekday travel for the CMax, and in the case of the Volt-53, 20-35 miles of daily travel contributed the most (21%) to weekday VMT. The cumulative effect of travel distance preferences depending on type of day is reflected in Table 4, which summarizes the daily VMT distribution by type of day. The average weekday VMT was higher than the average weekend VMT for all PHEV models except for the CMax and Volt-53, which were driven roughly the same distance on weekdays and weekends (about 43 miles and 39 miles per day, respectively). Fusions had the highest average daily VMT and the Volt-53 had the lowest average daily VMT, irrespective of the type of day.
From Table 4, it can be seen that the AER had little or no impact on the average, median, or standard deviation of the VMT of the Volt-35/38 and Volt-53.
Figure 8 depicts the percentage share of driving days by number of charging sessions on weekdays and weekends. If we compare Figure 8 with the J2841 assumption of one charging session per day, it is very clear that the differences in charging behavior are salient. The J2841 method for UF estimation ignores two situations depicted in Figure 8: (i) days when the PHEV was not charged at all; and (ii) days when the PHEV charged more than once. On only approximately 42-47% of the driving days (weekdays and weekends combined) did the PHEV charge exactly once; on all other days, the observed daily charging frequency did not align with the J2841 assumption of one charging session per day. While conventional wisdom would suggest that shorter-range PHEVs (Prius, CMax, and Fusion) will have a higher proportion of days when they charged more than once due to AER limitations, our analysis shows the opposite. On 44% of driving days, the Volt-35/38 charged more than once per day, whereas the CMax and Fusion charged more than once on 39% of the driving days, and the Prius charged more than once on only 30% of the driving days. Depending upon the AER, on 13%-26% of driving days (weekdays and weekends combined), the PHEV did not charge at all. The J2841 also assumes that the travel day starts with a fully charged battery. The effect of the travel day starting state of charge (SOC) of the battery on the daily VMT and eVMT is illustrated in Figure 9. Empty battery refers to an SOC of 5% or lower, while full battery refers to an SOC of 95% or more.
We can observe from Figure 9 that there are three additional situations that the J2841 does not address or adequately capture: (i) the PHEV being driven as a conventional HEV when the travel day starts with an empty battery; (ii) the possibility that the PHEV might charge away from home; and (iii) the possibility of intra-day charging outside of the overnight, parked-at-home time window, typically mid-day at a workplace or any other non-home location. Figure 9 reveals that, on average, PHEVs are driven farther when the travel day starts with an empty battery compared to travel days starting with a fully charged battery. Except for the Volt-53, all other PHEVs are driven on average more than 50 miles when starting their travel day on an empty battery. We can also see that the average eVMT of the Volt-35/38 and Volt-53 are nearly the same on days when travel starts with a fully charged battery.
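The daily metrics used in this subsection (the LDT share of VMT, the split of driving days by number of charging sessions, and the empty/full classification of the starting SOC) follow directly from their definitions. The sketch below is illustrative only: the function names, the "partial" label for an SOC between the two thresholds, and all sample values are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def ldt_share(daily_miles: Sequence[float], threshold: float = 50.0) -> float:
    """Share of total VMT driven on days of `threshold` miles or more."""
    total = sum(daily_miles)
    if total <= 0:
        raise ValueError("daily_miles must contain some travel")
    return sum(d for d in daily_miles if d >= threshold) / total


def charging_day_shares(sessions_per_day: Sequence[int]) -> Dict[str, float]:
    """Share of driving days with zero, exactly one, or several charging sessions."""
    if not sessions_per_day:
        raise ValueError("need at least one driving day")
    counts = Counter("none" if n == 0 else "once" if n == 1 else "multiple"
                     for n in sessions_per_day)
    return {k: counts.get(k, 0) / len(sessions_per_day)
            for k in ("none", "once", "multiple")}


def start_soc_class(soc_percent: float) -> str:
    """Empty is an SOC of 5% or lower, full is 95% or more, else partial."""
    if soc_percent <= 5.0:
        return "empty"
    if soc_percent >= 95.0:
        return "full"
    return "partial"


# Hypothetical values (not from the study data).
print(ldt_share([10.0, 40.0, 50.0, 100.0]))
print(charging_day_shares([0, 1, 1, 2, 3]))
print(start_soc_class(97.0))
```

Only the "once" share, combined with a "full" starting SOC, corresponds to the days on which the J2841 charging assumptions hold.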

Principal Components Analysis (PCA) of Driving and Charging Behavior
The goal of PCA is to deduce the most important driving and charging traits that significantly impact the UF and thereby its deviation from the label UF. Since UF is the ratio of eVMT to VMT and eVMT is intricately linked to charging behavior, we perform PCA on driving and charging separately. In order to adequately represent important VMT indicators such as annual mileage, driving style (highway or stop-and-go city dominant), and long-distance travel needs, the following variables were used:
i. Annual VMT
ii. Share of annual VMT at 55 mph or faster (%)
iii. Long-distance travel (LDT) 100 miles or more share of annual VMT (%)
iv. Daily VMT 50 miles or less share of annual VMT (%)
v. Average number of stops per mile
Table 5 summarizes the PC loadings, Eigen values, and the cumulative percentage of variance captured. The criteria to evaluate the suitability of the data structure for PCA are indicated in Table 5. A KMO MSA index of 0.8 is considered "meritorious" [47,50,51], and a KMO MSA of 0.711 has been reported to yield reliable factors [66]. The KMO MSA was 0.799 and the p-value of Bartlett's test was well below the significance level of 0.05, so the data are suitable for PCA. Using the Scree test for Eigen values greater than 1 [67], four factors were retained which capture 88% of total variance, and each factor captures a roughly similar proportion of variance. Table 6 summarizes the Varimax rotated factor loadings. The KMO MSA of the individual driving related variables was at least 0.74. For notational convenience, the four factors extracted are named with the suffix .Drv in Table 6. Based on the relative magnitude and direction of loading, the underlying factors can be described as follows. The loading of annual VMT is highest on PC1.Drv, which represents high usage intensity. The loading of long-distance travel 100 miles or more share of annual VMT is highest on PC2.Drv, whereas the loading of daily VMT 50 miles or less share of annual VMT is significant, but negative, on PC2.Drv. PC2.Drv thus represents driving behavior characterized by strong preferences for long-distance travel. The variable with the highest loading on PC3.Drv is the average stops per mile. Annual VMT and the share of VMT at high speeds load negatively on PC3.Drv, but not significantly; PC3.Drv thus captures a conservative driving style. Since the share of VMT at 55 mph or more loads heavily on PC4.Drv, this factor is concerned with the energy intensity of driving and the inclination for high-speed driving.
Charger accessibility and utilization are key indicators of charging behavior and subsequently eVMT.
In order to uncover these, the following variables were selected for the PCA of charging behavior:
Home: Charging duration (minutes)
viii. Number of days vehicle charged both at home and away locations
Here, home refers to locations that are less than a mile from home and away refers to all non-home locations. Charged energy, number of sessions, and duration include both L1 and L2 charging. Table 7 summarizes the PC loadings, Eigen values, and the cumulative percentage of variance captured by the PCA of charging behavior. As with the PCA of driving behavior, the suitability of the data for PCA of charging behavior was validated. The Scree test with an Eigen value criterion of one was used, and four factors were retained which capture 87% of total variance. Table 8 summarizes the Varimax rotated factor loadings and the influential charging traits extracted by the PCA. From Table 8, we can see that all away charging related variables load positively and significantly on PC1.Chg. Likewise, the loadings of all home charging related variables are positive and significant on PC2.Chg. Though the loadings of the home charging related variables on PC3.Chg are comparable to their loadings on PC2.Chg, there is an important distinction between them. The loading of home charging duration is positive, significant, and numerically greater on PC2.Chg compared to PC3.Chg. In contrast, the loading of the number of home charging sessions is positive, significant, and numerically greater on PC3.Chg compared to PC2.Chg. A higher loading of charging duration indicates longer charging sessions and thereby implies deep charge cycles. Likewise, a higher loading of the number of charging sessions indicates a higher frequency of charging. The number of days the PHEV charged at both home and away locations is strongly correlated with PC4.Chg, and the loading is significant. This is indicative of enhanced charger accessibility both at home and away. The number of away charging sessions, and to an extent the number of home charging sessions, is also positively associated with PC4.Chg, though the absolute loading is slightly below the threshold of 0.3.
Based on these observations, we describe the factors in terms of charger accessibility, charger utilization measured as charging frequency, and charger utilization measured as charging duration. PC1.Chg describes frequent and deep charge cycles at away locations. PC2.Chg describes less frequent, deep charge cycles at home; PC3.Chg describes frequent, shallow charge cycles at home; and PC4.Chg is indicative of balanced utilization of chargers at home and away locations.
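The retention and interpretation rules used for the charging PCA (eigenvalue criterion of one, cumulative variance captured, and a loading threshold of 0.3) can be illustrated with a minimal sketch. The eigenvalues, variable names, and loadings below are hypothetical placeholders chosen for illustration; they are not the values reported in Tables 7 and 8, and the function names are our own.

```python
def retain_components(eigenvalues, threshold=1.0):
    """Return (number of components kept, share of total variance they capture)
    under the rule 'keep components whose eigenvalue exceeds the threshold'."""
    total = sum(eigenvalues)
    kept = [ev for ev in eigenvalues if ev > threshold]
    return len(kept), sum(kept) / total


def salient_variables(loadings, cutoff=0.3):
    """Map each component to the variables whose absolute loading meets the cutoff."""
    result = {}
    for variable, row in loadings.items():
        for component, value in row.items():
            if abs(value) >= cutoff:
                result.setdefault(component, []).append(variable)
    return result


# Hypothetical eigenvalues for eight standardized charging variables (they sum to 8).
eigenvalues = [2.9, 1.8, 1.2, 1.06, 0.5, 0.3, 0.14, 0.1]
n_kept, share = retain_components(eigenvalues)

# Hypothetical rotated loadings; only a few variables are shown.
loadings = {
    "away_sessions": {"PC1.Chg": 0.62, "PC4.Chg": 0.28},
    "home_duration": {"PC2.Chg": 0.71, "PC3.Chg": 0.35},
    "home_sessions": {"PC2.Chg": 0.40, "PC3.Chg": 0.68},
    "days_home_and_away": {"PC4.Chg": 0.85},
}
salient = salient_variables(loadings)
print(n_kept, round(share, 2))
print(salient)
```

In this toy input, the away-session loading of 0.28 on PC4.Chg falls just under the cutoff and is therefore not listed, mirroring the borderline case described in the text.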

OLS Regression Model Results
The eight retained PCs are the IVs, and the DV is the difference between the observed UF and the EPA label city/highway combined UF (ΔUF = observed UF − label UF). The purpose of the regression models is to understand how well the extracted PCs explain the difference between observed and label UF, to identify which PCs contribute the most to ΔUF, and to see how these contributions vary between short-range and longer-range PHEVs. Using the aggregate annualized dataset, we created separate OLS regression models for short-range PHEVs (Prius, CMax, and Fusion) and longer-range PHEVs (Volt-35/38 and Volt-53). Statistical tests using G*Power 3 [68] were performed to verify and validate the following:
• A priori: for a given significance level, effect size, and power, compute the number of samples required.
• Post hoc: compute the power achieved for the given sample size, significance level, and effect size.
For both regression models, the a priori and post hoc test results confirmed that our sample size was adequate to detect a large effect size based on Cohen's d, and the achieved statistical power was more than 95% [69][70][71]. These test results are summarized in Table A1 in Appendix A. To ensure consistency and parity across all the hypothesis and statistical significance tests, a significance level of 5% was chosen.
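The model structure described above (ΔUF regressed on PC scores by ordinary least squares) can be sketched in a few lines. This is an illustrative sketch only: it uses two PCs instead of eight, the scores and UF values are made-up numbers, and the helper names `solve_linear_system` and `fit_ols` are our own rather than anything used in the study.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(pc_scores, delta_uf):
    """Fit delta_uf = b0 + sum_k b_k * PC_k via the normal equations.
    Returns (coefficients with intercept first, R^2)."""
    design = [[1.0] + list(row) for row in pc_scores]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, delta_uf)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in design]
    mean_y = sum(delta_uf) / len(delta_uf)
    ss_res = sum((y - f) ** 2 for y, f in zip(delta_uf, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in delta_uf)
    return beta, 1.0 - ss_res / ss_tot


# Hypothetical data for five vehicles: label UF, observed UF, and two PC scores.
label_uf = [0.29, 0.29, 0.29, 0.29, 0.29]
observed_uf = [0.295, 0.21, 0.215, 0.25, 0.23]
pc_scores = [(-1.0, 0.5), (0.0, -1.0), (1.0, 0.5), (0.5, 1.0), (-0.5, -1.0)]

# Dependent variable: observed UF minus label UF.
delta_uf = [obs - lab for obs, lab in zip(observed_uf, label_uf)]
beta, r2 = fit_ols(pc_scores, delta_uf)
print([round(b, 3) for b in beta], round(r2, 3))
```

A negative coefficient on a driving PC in such a model means that higher scores on that component push the observed UF further below the label value, which is how the signs in the coefficient tables are read.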
Tables 9 and 10 summarize the regression model coefficients (β) and summary of fit for the short-range and longer-range PHEVs, respectively. Referring to Table 9 for the short-range PHEVs, all factors except long-distance travel and the conservative driving style were statistically significant. At the 5% significance level, all charging-related PCs except the one that describes less frequent, deep cycle home charging have a statistically significant and positive impact on the UF of short-range PHEVs. All four charging-related PCs have statistically significant and positive impacts on the UF of longer-range PHEVs. Except for the PC that describes conservative driving, all driving-related PCs had a statistically significant and negative impact on the UF of longer-range PHEVs. Thus, for longer-range PHEVs, all factors except conservative driving were statistically significant at 5%. Long-distance travel had a statistically significant impact on the ΔUF of LRPHEVs, but not on that of short-range PHEVs. When we compare the model fit, the R² of the LRPHEV regression model is lower than that of the SRPHEV regression model, even though the LRPHEV model (Table 10) had a slightly higher number of observations and a higher number of statistically significant factors than the SRPHEV regression model (Table 9). This could potentially be due to larger variations in LRPHEV usage patterns compared to SRPHEVs. We ascertain the practical utility of the insights gathered from the regression models by carrying out relative importance analysis. Relative importance analysis quantifies the contribution of an IV to the total predictable variance by itself (main effect) and in combination with other IVs (total effect), without making any assumptions about its statistical significance [61].
This enables comparing the relative contribution of the eight PCs and how it varies between short-range and longer-range PHEVs. For each IV, Monte Carlo samples are drawn from its set of observed values using Latin hypercube sampling, and the process is iterated until the standard errors of the main and total effects fall below a certain threshold [72]. Table 11 summarizes the main and total effects of the IVs across the three models. The top three predictors based on the magnitude of their main effect are also highlighted in Table 11.
Table 11. Relative contribution of IVs to total predictable variance.
The magnitude of the main and total effects from Table 11, when examined in conjunction with the direction (positive or negative) of the coefficient estimates in Tables 9 and 10, provides a better picture of how dominant driving and charging traits impact the UF of short-range and longer-range PHEVs. In the case of short-range PHEVs, high usage intensity, high energy intensity, and balanced home and away utilization capture close to 65% of the total predictable variance (Table 11). For the longer-range PHEVs, the main effect of long-distance travel accounts for 35% of the total predictable variance, followed by the main effects of frequent, shallow cycle home charging and high usage intensity, respectively. While the UF of short-range PHEVs increases with charger access at home and away locations, for the longer-range PHEVs, encouraging more home-based charging had a positive effect on UF. For the longer-range PHEVs, increasing the frequency of charging at home has a much bigger positive effect on UF than deep charging cycles or longer charging duration. This is attributable to their AER capabilities coupled with lower annual mileage and daily driving distances (Table 2).
It can be inferred from Table 11 that relatively aggressive driving with higher energy intensity has a much bigger effect on the UF of short-range PHEVs than on that of the longer-range PHEVs.
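To give intuition for what a share of "total predictable variance" means, the following sketch uses a simplification rather than the Monte Carlo and Latin hypercube procedure described above. It assumes a main-effects-only linear model with uncorrelated predictors, in which case each predictor contributes β² times its variance to the predicted variance, and main and total effects coincide. The coefficients and scores are hypothetical, and `variance_shares` is our own helper name.

```python
from statistics import pvariance


def variance_shares(coefficients, pc_scores):
    """For a linear model with uncorrelated predictors, return each predictor's
    share of the predictable variance: b_k^2 * Var(PC_k), normalized to sum to 1."""
    contributions = {
        name: beta ** 2 * pvariance(pc_scores[name])
        for name, beta in coefficients.items()
    }
    total = sum(contributions.values())
    return {name: value / total for name, value in contributions.items()}


# Hypothetical coefficients (sign indicates direction of the effect on delta UF).
coefficients = {"PC1.Drv": -0.06, "PC2.Drv": -0.03, "PC4.Chg": 0.04}

# Hypothetical, mutually uncorrelated scores with zero mean and unit variance.
pc_scores = {
    "PC1.Drv": [1.0, 1.0, -1.0, -1.0],
    "PC2.Drv": [1.0, -1.0, 1.0, -1.0],
    "PC4.Chg": [1.0, -1.0, -1.0, 1.0],
}

shares = variance_shares(coefficients, pc_scores)
top_predictor = max(shares, key=shares.get)
print({name: round(s, 3) for name, s in shares.items()}, top_predictor)
```

Note that the share depends on the squared coefficient, so it carries no sign; the direction of each effect still has to be read from the coefficient estimates, as the text does with Tables 9 and 10.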

Examining the Interaction Effects
We investigated whether including interaction terms in the regression models improves their explanatory power. To avoid confounding and conflating effects, which would hinder the interpretation of the independent variables, we specifically focused on the statistically insignificant driving and charging PCs in Tables 9 and 10. This would clearly indicate the statistical significance of the interaction term and its effect on the overall model fit, which could not have been captured in the main effects only model (Tables 9 and 10). The variables we considered for interaction effects are long-distance travel (PC2.Drv), conservative driving (PC3.Drv), and home-less frequent and deep cycle charging (PC2.Chg). Since the driving and charging PCs themselves are orthogonal due to the nature of component extraction using the Varimax method, and to avoid overfitting, only one interaction term (between a driving-related PC and a charging-related PC) was considered at a time, and an individual regression model was developed for each interaction term. In addition, we also interacted the top two predictors that contributed most to the predictable variance from Table 11.
We developed four additional regression models with main and interaction effects. For short-range PHEVs, the following interaction terms were considered: (i) PC2.Chg * PC3.Drv; (ii) PC2.Chg * PC2.Drv; and (iii) PC4.Chg * PC1.Drv. For the longer-range PHEVs, since only one term was statistically insignificant (PC3.Drv, Table 10), we used the top two predictors from Table 11, PC3.Chg * PC2.Drv, as the interaction term. The parameter estimates and model fits of the four regression models are summarized in Appendix A, Table A3. None of the interaction terms were statistically significant at 5%, and there were no noticeable improvements in the model fit. For these two reasons, the relative importance of the interaction terms was not analyzed further.
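One common way to judge whether an added interaction term gives a "noticeable improvement" in fit is to compare adjusted R², which penalizes the extra parameter. The sketch below assumes adjusted R² as the fit measure, which the text does not specify here, and uses hypothetical sample size and R² values rather than those in Table A3.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical fits: eight main effects versus eight main effects plus one interaction.
n_obs = 70
adj_main = adjusted_r2(r2=0.600, n_obs=n_obs, n_predictors=8)
adj_with_interaction = adjusted_r2(r2=0.602, n_obs=n_obs, n_predictors=9)

print(round(adj_main, 4), round(adj_with_interaction, 4))
```

With these illustrative numbers, the small gain in raw R² does not offset the cost of the extra term, so the adjusted fit declines slightly.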

Discussion
This paper analyzed year-long driving and charging behavior of 153 PHEVs in California and compared it against EPA test cycles and SAE J2841 assumptions. We expanded upon these observations by examining usage pattern differences among the five PHEV models included in this paper. Our findings are summarized below.
Observed PHEVs are driven more aggressively and accomplish a higher share of travel in nonurban driving conditions (45 mph or faster or less than 3 stops per mile) compared to standardized dynamometer test cycles. The percentage share of time and distance traveled at highway speeds (60 mph or faster) is noticeably under-represented or excluded in test cycles. Approximately 80% of VMT in the UDDS cycle is at 45 mph or slower, whereas the overall average in our dataset was only 40%. Short-range PHEVs (Prius, CMax and Fusion) are driven 4% to 7% more at 60 mph or faster compared to longer-range PHEVs (Volts). The above disparities clearly manifested in the form of increased energy consumption in the CD-EV mode, which reduces the effective AER realized on-road. Using PCA, we characterized driving behavior based on 4 factors: vehicle usage intensity, aggressive driving style at highway speeds which increases the energy intensity, preference for long-distance travel, and conservative driving style.
Analysis of charging behavior revealed marked departures from the single, overnight, full-charge assumption of J2841. On average, observed PHEVs (except the Volt-53) charged more than once per day, and the driving distance on days when the PHEV was not charged was greater than on days when it was charged at least once. Results indicated that short-range PHEVs have a higher share of driving days on which they are not charged at all. The possibility of charging away from home, charging more than once per day, and using the PHEV like a regular HEV are the other notable distinctions between this study and the generalized J2841 assumptions. The differences in charging behavior outlined above are due to charger accessibility by location (home, away, or both) and charger utilization, which can be defined by the frequency or the duration of charging. These were characterized using four influential factors extracted by the PCA. Regression models and relative importance analysis indicated that for short-range PHEVs (Prius, CMax, and Fusion), higher annual VMT and the share of travel at highway speeds contributed the most to the observed UF being lower than label-rated estimates, whereas enhanced charging infrastructure at home and away increases the UF. In the case of the Volts, long-distance travel days (50 miles or more) and the share of travel at highway speeds were the primary reasons the observed UF fell below its label-rated estimate, and increasing the frequency of charging at home increases the UF.
Driving-related differences could be due to a combination of road infrastructure, early adopter preferences, and vehicle technology attributes such as age, AER, maximum electric speed, and drivetrain design. California has the third highest rural interstate and the highest urban interstate highway system length [73], which was partly reflected in the relatively larger share of highway-speed driving observed in this study compared to the test cycles used in performance evaluation. California also scores low in proximity to major roadways [74] and ranks among the top three states by average VMT in urban and suburban census tract groups [75]. The cumulative effect of these California-specific features was clearly revealed in the annual VMT and the share of long-distance travel (50 miles or more) of the PHEVs observed in this study. The sub-sample of drivers in this dataset consists of PEV early adopters who purchased or leased a new PHEV and are generally more educated, wealthier, and more likely to own a home than the mainstream ICE users whose driving patterns in the NHTS underlie J2841 [76,77]. The rebound effect in the classical sense, by which improvements in the fuel economy of newer vehicles increase travel demand [78], could also have played a part in the higher usage intensity of all the PHEV models compared to the sticker label annual mileage of 15,000 miles, except the Volt-53, which seemed to have faced a slight backfire effect [79].
Apart from differences among the PHEV models in annual VMT, driving style (aggressive or conservative), and the magnitude of long-distance travel, the distribution of UF (Figure 1) indicates that heterogeneity in charging preferences exists across different PHEV models as well as within the same model. The fact that some PHEVs, irrespective of AER, electrified less than 20% of their rated label UF demonstrates that motivations for charging or not charging are far more complex in reality than the simplistic notion of one full charging session per day at home. Our study illustrates that charger accessibility and utilization have varying levels of influence on the UF depending on the AER. In the case of short-range PHEVs (20 miles or less; Prius, CMax, and Fusion), since their AER is less than their average daily VMT (about 46 miles), there was not enough incentive, in the form of eVMT gained, to charge more often to compensate for their higher travel demand. The lower UF of observed PHEVs compared to their rated label estimates could also be due to self-selection bias among PHEV buyers who are less concerned about eVMT because their decision to purchase the PHEV was motivated by other reasons, such as rebates, clean air vehicle decals, or preferential parking spaces. It is also evident that there are diminishing marginal returns in UF and eVMT with an increase in the AER, a case in point being the UF of the Volt-53, which was similar to that of the Volt-35/38 in spite of the Volt-35/38 driving 2000 miles more than the Volt-53 annually.
The performance of PHEVs depends on the intertwined relationships between driving behavior, charging behavior, vehicle technology attributes, and user preferences. The dual propulsion energy source (electric motor and conventional ICE) enables automakers to offer a wide range of design options to potential buyers depending on the degree of emphasis on one driving mode over the other, which is influenced by policy goals. In the infant stages of the PHEV market, fuel economy and energy efficiency of the ICE were prioritized over all-electric operation due to cost and charging infrastructure considerations. To maximize the GHG reduction potential of PHEVs, policies that encourage longer-range PHEVs with greater emphasis on the all-electric mode are needed. While this has a direct impact on the policy signals sent to automakers, and subsequently on the model offerings available to potential PHEV buyers, it is critical to consider aspects outside the domain of vehicle technology, such as charging infrastructure expansion and heterogeneous user preferences. Though this study does not advocate moving away from or replacing the J2841 UF as the metric to quantify the environmental impact of PHEVs, there is room for improving the accuracy of UF estimates by incorporating additional scenarios that are more representative of real-world driving and charging behavior.
Generalizing this paper's insights to the broader PHEV market, or even within California, is not feasible due to sample size limitations, which are common, unavoidable, and intrinsic to real-world observational studies. Moreover, today's PHEV users are early adopters whose socio-demographic and economic indicators differ from those of the general population of mainstream ICE users [43]. In essence, the results presented in this paper should be interpreted within the developmental phases of the PHEV market.

Conclusions and Future Research Directions
This study systematically analyzed the driving and charging patterns of 153 PHEVs operating in California. The purpose of this study was to investigate why the observed performance of PHEVs deviated from expectations and which influential factors contributed to these disparities. We first compared observed and expected PHEV usage patterns. We also compared the usage patterns of the five PHEV models (Prius, CMax/Fusion Energi, first and second generation Chevrolet Volts) analyzed in this study at time-scales varying from trip-level to annual estimates. We utilized principal components analysis to reduce the dimensionality of the dataset while capturing at least 87% of the variance in the dataset using just four driving-related and four charging-related factors. The explanatory power and the statistical significance of the extracted factors were evaluated using multivariate regression models. We quantified the relative contribution of each extracted factor toward the difference between the observed Utility Factor and label expected values. We also investigated whether there are any statistically significant interaction terms that further improve the regression model fit and offer additional insights. Results indicated that higher annual mileage and higher energy intensity were the top two aspects that lowered the observed UF of short-range PHEVs (Prius, CMax/Fusion) when compared to label expectations. Enhanced charging infrastructure access and balanced utilization at home and away increased the observed UF of short-range PHEVs. In the case of longer-range PHEVs (Volts), their propensity toward more long-distance travel (50 or more miles/day), followed by their annual mileage, contributed the most to lowering their UF from label values.
Due to their bigger battery capacity, increasing the frequency of shallow charging sessions had a bigger positive effect on UF than charging for a longer duration but less frequently. Regression models indicated that the effects of long-distance travel and deep cycle charging at home were statistically significant for longer-range PHEVs, but not for short-range PHEVs. Analyses also indicated the absence of any statistically significant interaction terms. The distribution of UF (Figures 1 and 2) indicated that even among PHEVs with the same AER, there was diversity in usage patterns. Daily driving distances and style (Figures 2-5), number of charging sessions on days driven (Figure 8), and charging location distance from home (Figure 10) demonstrated the linkage between travel and charging behavior and AER.
Plug-in electric vehicles (BEVs and PHEVs) are essential to reduce transport sector GHG emissions and energy consumption. Plug-in hybrid electric vehicles are considered to be an intermediate and enabling technology option that can catalyze large-scale adoption of PEVs. The environmental benefits of BEVs are unambiguous due to their zero tailpipe emissions; however, the same cannot be said of PHEVs. Operational and fuel-use flexibility helps PHEVs overcome the range anxiety issues associated with BEVs, but the same flexibility complicates the task of evaluating their net environmental impact. To address this, the concept of UF has been developed and widely utilized in the policy domain and in techno-economic assessments. There is a growing body of evidence suggesting that a gap exists between the official EPA sticker label and real-world UF, which warrants a deeper examination to improve our understanding of the electrification potential of PHEVs. This paper addressed this research need by scrutinizing real-world PHEV usage patterns and discerning salient facets of driving and charging that deviated from the assumptions embodied in sticker label energy consumption and UF estimates.
Superimposing a set of preconceived notions about driving and charging behavior has direct ramifications on how PHEVs are evaluated in command and control policies like the ZEV mandate, regulations governing their on-road performance, and policies that encourage their usage through economic incentives. Developments in battery technologies, diversification of PHEV model offerings, expansion of charging infrastructure, and a favorable policy environment will increase the market share of PHEVs. As the PHEV market evolves and grows, observing PHEV usage through studies such as the one presented in this paper will become increasingly valuable. Recognizing real-world scenarios that diverge from assumptions will better inform future policies.
Data collection is still ongoing, and future work will include newer PHEV models such as the Chrysler Pacifica Hybrid (33-mile AER) and Toyota Prius Prime (25-mile AER). Future research will incorporate spatio-temporal aspects to identify trip-level variables that affect the decision to charge or not charge, identify missed charging opportunities at public charging locations, and assess their implications for the net environmental impact (driving and charging) and electrification potential of PHEVs.

Funding: The data collection was funded by California Air Resources Board, Contract #12-319
Conflicts of Interest: The authors declare no conflict of interest.

Appendix A
This section summarizes the results of the power analysis for the regression models and the rationale behind excluding interaction terms from the OLS regression models, as well as their relative contribution to the overall model effects. Table A1 presents the results of the a priori test, which determines the number of samples (number of PHEVs) required given the significance level (α = 5%), power (1 − β), number of predictors (eight: the four driving-related and four charging-related PCs), and effect size. The probabilities of Type I and Type II errors are α and β, respectively. From Tables A1 and A2, we can see that there is a sufficient number of short-range and long-range PHEVs in the dataset to detect a "true" effect of a predictor. Table A2 is a post hoc test, which calculates the power achieved given the significance level, sample size, and effect size. From Table A2, we can see that the achieved power of the OLS regression models is more than sufficient. Table A3 presents the regression model estimates and the model fit summaries with interaction effects. When we compare the main effects only model results (Tables 9 and 10) with the results of the models with main and interaction effects in Table A3, the estimates changed slightly. However, their statistical significance remained almost identical to that in the main effects only model, even after the inclusion of interaction effects, across all four models in Table A3. The only change was observed when interacting PC2.Chg * PC2.Drv (Home-less frequent and deep cycle * Long-distance travel) for short-range PHEVs, where the predictor PC2.Chg is statistically significant, whereas in the main effects only model in Table 9 it was not statistically significant at 5%. Since the interaction terms were not statistically significant, no additional analyses were performed to assess their relative importance, and only the main effects models were considered in our analysis.
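For readers unfamiliar with how the a priori and post hoc quantities relate, the following is a sketch of the standard power calculation for the overall F test of a multiple regression model. It assumes the effect size is expressed as Cohen's f² (the tables may use a different effect size definition), with N the number of PHEVs and u = 8 predictors; these symbols are introduced here for illustration.

```latex
f^2 = \frac{R^2}{1 - R^2}, \qquad
\lambda = f^2 N, \qquad
u = 8, \quad v = N - u - 1,
\\[4pt]
\text{power} = 1 - \beta
  = P\!\left( F'_{u,\,v}(\lambda) > F_{1-\alpha;\,u,\,v} \right)
```

Here F'_{u,v}(λ) denotes a noncentral F variable with noncentrality parameter λ, and F_{1−α; u, v} is the critical value of the central F distribution. A post hoc test evaluates the right-hand side for the observed N, while an a priori test finds the smallest N for which the power reaches the target value.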