Understanding the Correlation of Demographic Features with BEV Uptake at the Local Level in the United States

Abstract: Battery Electric Vehicles (BEVs) have seen substantial growth in the recent past, and this trend is expected to continue. This growth has been far from uniform geographically, with large differences in BEV uptake between countries, states, and cities. This non-uniform growth can be attributed to the demographic and non-demographic factors that characterize a geographical location. In this paper, the demographic factors that affect BEV uptake at the Zone Improvement Plan (ZIP) code level are studied extensively across several states in the United States to understand BEV readiness at its most granular form. Demographic statistics at the ZIP code level more accurately describe the local population than national-, state-, or city-level demographics. This study compiled and preprocessed 242 demographic features to study the impact on BEV uptake in 7155 ZIP codes in 11 states. These demographic features are categorized based on the type of information they convey. The initial demographic features are subjected to feature engineering using various formed hypotheses to extract the optimal level of information. The hypotheses are tested and a total of 82 statistically significant features are selected. This study used correlation analysis to validate the feature engineering and understand the degree of correlation of these features to BEV uptake, both within individual states and at the national level. Results from this study indicate that higher BEV adoption in a state results in a stronger correlation between demographic factors and BEV uptake. Features related to the number of individuals in a ZIP code with an annual income greater than USD 75 thousand are strongly correlated with BEV uptake, followed by the number of owner-occupied housing units, individuals driving alone, and working from home. Features containing compounded information from distinct categories are often better correlated than features containing information from a single category.
In-depth knowledge of local BEV uptake is important for applications related to the accommodation of BEVs, and understanding what causes differences in local uptake can allow for both the prediction of future growth and the stimulation of it.


Introduction
Battery Electric Vehicles (BEVs) have gained popularity in the last few years, with the advancement of battery technology, the expansion of charging infrastructure, and rising concerns over greenhouse gas emissions from conventional vehicles. Although there has been considerable growth in BEV uptake, the rate of this growth is not consistent across countries, or even within individual US states. Several factors affect BEV uptake, including the availability of charging infrastructure, public incentives, sociodemographic factors, psychological factors, and environmental awareness.
Charging infrastructure planning requires a proper understanding of BEV uptake at the local level for the optimal selection of sites. Incentives have been partially effective in promoting BEV technology. Many states have incentives in place to help offset the high initial cost of BEVs. The federal government tax credit is up to USD 7500 [1] [...] binomial and ordinary least-squares regression approaches. The dependent variables are log-transformed, and their collinearity is not explained in this study. Education and income are determined to have the most positive impact on EV uptake. In reference [23], the authors developed an ordinary least-squares regression model to study the demographic factors behind EV and photovoltaic uptake. The study covered 1670 ZIP codes; median income is determined to have a positive influence on EV uptake, and larger households are determined to influence EV uptake negatively. In reference [24], the authors developed a multiple logistic regression model to assess EV penetration rate with demographic factors. For this study, 58 California counties are used for training the model and nine Delaware Valley counties are used for validation. The authors identified income, education, and the car-sharing status of the household to be the most important in influencing EV adoption.
Understanding the demographic factors at a granular level is important, and the findings can be aggregated to characterize a larger area, such as a city or state. State-level studies [25] provide a coarse presentation about the importance of the factors affecting EV adoption but cannot address the regional variability of the same factors. In reference [26], the authors studied the demographic factors with BEV adoption at an even more granular level than a ZIP code, with 80-120 dwellings in one instance. This level of detail is difficult to replicate at a larger scale due to both limited data resources and privacy and security concerns. In reference [27], the authors studied the capabilities of off-street parking with key demographic information to study which areas have the greater probability to transition to BEVs.
BEV uptake is ultimately affected by a combination of many disparate factors, both demographic and non-demographic. In reference [28], the authors have categorized the non-demographic factors into technical, contextual, cost-related, behavioral, and social determinants. They are studied along with socio-demographic factors and BEV-specific experiences to understand BEV acceptance. In reference [29], socio-psychological factors are studied to understand the BEV uptake in two cities in China. Socio-psychological factors include technical knowledge about BEVs and policies in effect, neighbor effects, and environmental awareness. In reference [30], a total cost of ownership (TCO) model is developed, where fuel prices are forecasted with the battery pricing of BEVs to study BEV adoption. The price of a BEV is shown to have a greater impact on BEV adoption. Electricity tariffs are also studied in the TCO model, which is dependent on vehicle usage. In references [31][32][33], the authors have developed a charging station network with the simultaneous objectives of reducing range anxiety, minimizing deployment cost, and maintaining the quality of service.
In this paper, a comprehensive study is done to understand the demographic features at the ZIP code level and their correlation with BEV uptake across 11 states, collectively and individually. While this study isolates quantifiable demographic factors, the comparison of results across states or ZIP codes with disparate non-demographic factors, such as EV policies or fuel prices, can yield information about the net impact of non-demographic factors as well. This study addresses a knowledge gap in comprehensive quantitative studies of demographic factors and BEV uptake at a granular level and at a large scale. BEV uptake has fallen short of expectations and been far from uniform geographically, with large differences in BEV uptake between countries, states, and cities. The geographical differences may be partially explained by the socio-demographic factors that characterize the region. Therefore, there is a need for an extensive study of the socio-demographic factors to understand their effects on BEV uptake within a state and across different states. Performing this analysis at the ZIP code level is particularly important, as characteristics at the state or national level do not accurately reflect the characteristics in a smaller geographical region. The preliminary results of this study have been published in reference [34]. Figure 1 shows the workflow process for the proposed framework in this study. Once collected, the ZIP-code-level data are compiled and preprocessed. The demographic features are grouped based on specific categories and interactions. These groups are then subjected to feature engineering based on formed hypotheses, and thresholds are set. Hypothesis testing is done for all formed hypotheses. Correlation analysis is performed next on the demographic features with the BEV uptake.
The novel contributions of this paper are as follows:
1. Extensive study of 242 socio-demographic factors;
2. Examining 7155 ZIP codes across 11 states;
3. Developing a research framework to transform the granular demographic data into features more relevant to BEV uptake;
4. Quantifying the relationship between the demographic features and BEV uptake at different geographic locations.
Only BEV uptake is studied, as opposed to plug-in hybrids, because BEVs pose greater challenges of range anxiety [35] and travel time, which can be consequential in understanding the demographic factors. The results from this study are relevant to several applications, including policymaking, charging infrastructure planning, and charging demand analysis. In addition, the framework of this study can help to better understand the uptake of other vehicular technology, such as autonomous vehicles [36].
The paper is organized as follows: Section 2 discusses the description of the demographic features and BEV uptake, Section 3 discusses the engineering of the features and the correlation techniques used, and Section 4 discusses the results and their significance, followed by conclusions and future research opportunities.

Demographic Feature Analysis and BEV Uptake
BEV uptake is sparsely distributed across ZIP codes in all the 11 states studied. Uptake is highly skewed between states as well, with California leading the US. To understand the underlying factors responsible for this inconsistent distribution, both among the states and the ZIP codes within them, the socio-economic factors characterizing each region are quantified and analyzed.
Demographic features are used to characterize individual ZIP codes. These features are collected from an open resource of census data for the year 2019 [37]. From all the features in this dataset, all features falling under broad categories with a potential for affecting BEV uptake are initially considered. BEV uptake in a ZIP code is the target variable in this analysis. This is quantified as the number of BEVs registered with home addresses in a particular ZIP code at a specific time. BEV registration is selected over sales metrics, as it better captures the number of vehicles on the road and more accurately reflects the area of residence for each driver. In this study, records of BEV registrations for 11 states at the ZIP code level are used from available open resources [38,39]. The time frame for active BEV registrations is selected from February 2019 to February 2020. The data timeframe is chosen in part to exclude the impact of COVID-19, the effects of which can be examined by following a similar framework in future longitudinal studies.
The above data is collected for each ZIP code across 11 states. A total of 242 demographic characteristics potentially relevant to BEV uptake are initially selected for this study. Figure 2 shows a heatmap of BEV uptake in the 7155 ZIP codes considered in the 11 states in the US. ZIP codes are colored as a gradient from green (0 BEV) to red (5 or more BEVs).
Table 1 shows a summary of the states considered, with the number of ZIP codes in each state and the total BEVs. Additionally shown is the coarse distribution of BEVs within each state across ZIP codes. It is observed that the highest percentage of ZIP codes with "0" BEV registrations is in Wisconsin, with "1-99" BEV registrations in New Jersey, and with ">100" BEV registrations in California.
Once the data is collected, it is preprocessed to maintain data consistency. The demographic data is processed based on the following considerations:

• ZIP codes with a population of zero are removed;
• Any ZIP codes with "#N/A" or "-" values are removed. However, before eliminating the ZIP code, it is investigated whether the discrepant values can be retrieved from other information in that ZIP code. As an example, if owner-occupied housing units has a "#N/A" value, it can be retrieved by subtracting renter-occupied housing units from total occupied housing units, if that information is available;
• When features are reported as a percentage of the total population in the ZIP code, they are converted to an absolute number;
• Median income in the ZIP codes is reported in a few cases as "25,000−" or "250,000+". In both cases, the boundary value is taken as the actual value, i.e., 25,000 and 250,000.
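The preprocessing considerations above can be illustrated with a minimal sketch. It is not the implementation used in the study: the record layout, the field names (`population`, `occupied_units`, `owner_units`, `renter_units`, `pct_drive_alone`, `median_income`), and the sample ZIP codes are hypothetical and chosen only for illustration.

```python
MISSING = {"#N/A", "-"}  # placeholder strings named in the text


def parse_median_income(raw):
    """Treat open-ended codes such as '25,000-' or '250,000+' as the boundary value."""
    return int(raw.replace(",", "").rstrip("+-\u2212"))


def preprocess(rows):
    """Apply the four considerations to a list of raw ZIP-code records (dicts of strings)."""
    kept = []
    for raw in rows:
        row = dict(raw)
        # Recover owner-occupied units as total occupied minus renter-occupied, if possible.
        if (row["owner_units"] in MISSING
                and row["occupied_units"] not in MISSING
                and row["renter_units"] not in MISSING):
            row["owner_units"] = str(int(row["occupied_units"]) - int(row["renter_units"]))
        # Drop the ZIP code if any value is still missing after the recovery attempt.
        if any(value in MISSING for value in row.values()):
            continue
        population = int(row["population"])
        if population == 0:
            continue  # ZIP codes with zero population are removed
        kept.append({
            "zip": row["zip"],
            "population": population,
            "owner_units": int(row["owner_units"]),
            # Percentages of the population are converted to absolute counts.
            "drive_alone": round(population * float(row["pct_drive_alone"]) / 100),
            "median_income": parse_median_income(row["median_income"]),
        })
    return kept


# Hypothetical records, for illustration only.
sample_rows = [
    {"zip": "00001", "population": "1000", "occupied_units": "400", "owner_units": "#N/A",
     "renter_units": "150", "pct_drive_alone": "60", "median_income": "250,000+"},
    {"zip": "00002", "population": "0", "occupied_units": "10", "owner_units": "5",
     "renter_units": "5", "pct_drive_alone": "50", "median_income": "40,000"},
    {"zip": "00003", "population": "500", "occupied_units": "-", "owner_units": "#N/A",
     "renter_units": "80", "pct_drive_alone": "55", "median_income": "60,000"},
]
cleaned = preprocess(sample_rows)
```

In this sketch, the first record is kept because its missing value can be recovered, the second is dropped for zero population, and the third is dropped because the missing value cannot be recovered.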
Understanding the demographic features: From all the demographic features available, 242 features that are hypothesized to impact BEV uptake are selected to characterize a location (ZIP code) for the selected state. These demographic features are organized into three classes based on whether the data is reported as individuals, housing units, or US dollars. The demographic features are further classified into six broad categories based on the type of information they convey about the ZIP code:
1. Category 1-Population: Number of residents in the ZIP code. Typically helps us to understand BEV penetration with respect to the population of that place;
2. Category 2-Vehicle Information: Number of vehicles owned by individuals or households;
3. Category 3-Traveling Characteristics: Characterizes the traveling nature of the residents of the place, including means of transportation and average daily commute time;
4. Category 4-Migration of the Residents: Growth of the ZIP code in terms of residents moving out of the area or coming in;
5. Category 5-Economy: Financial information of the ZIP code;
6. Category 6-Living Arrangements: Owner-occupied and multi-dwelling units help to understand the type of housing units in which the residents reside.
Categories 1-4 are expressed in terms of the number of individuals; Category 5 includes information on both the number of individuals and income (USD); and Category 6 includes information on the number of housing units. One of the primary objectives of this study is to understand how each category of features affects BEV uptake and how some of the categories interact with each other to affect BEV uptake. For many of the demographic features in this study, the six categories overlap each other. The demographic features are then grouped based on their categories for further analysis. A total of 15 groups are formalized for this study. Table 2 shows the 242 demographic features studied and the group to which they belong. The table shows the categories of data within each group, the type of information, and the number of features fitting this description. An example from each group is provided for clarity.
The interaction of the categories is important to study along with the individual categories to better understand the complex factors contributing to BEV uptake. Many features provide information about the number of individuals or households that meet multiple simultaneous criteria, the intersection of which may affect BEV uptake more than either factor individually. In addition, many of the features contain excessively granular brackets of data that may not individually correlate well with BEV uptake. However, new features can be engineered from this information that better explain the BEV uptake in the ZIP code.

Feature Engineering and Selection
Feature engineering is commonly used by machine learning researchers to transform raw data to better understand the underlying problem at hand. Here, feature engineering helps to structure the raw data in such a manner to yield more meaningful information and provide a better understanding of BEV uptake.
While not all the original features may be well correlated with BEV uptake, useful information may still be extracted from the features. Formed hypotheses are used to engineer new features from available information. Features can then be selected to study their correlation with BEV uptake at the ZIP code level. In this paper, the initial 242 demographic features are hereafter referred to as Detailed Features. The Detailed Features are subjected to feature engineering, and a final set of 82 features that yields meaningful correlation results is then selected for the study. This list of 82 features is hereafter referred to as the Reduced Features. Detailed Features and Reduced Features follow the same structural framework, where the groups and categories remain the same and only the number of features in each group differs.
To determine the Reduced Features list, hypotheses are formulated against given thresholds to engineer features, which are tested using t-tests. To perform the t-tests, the demographic features are normalized in terms of BEV uptake. If a formed hypothesis holds true, the threshold is selected, and the number of features can be reduced based on the threshold. Otherwise, the threshold is dismissed, as it yields no statistically meaningful results. Figure 3 shows the flowchart of the process of generating the Reduced Features from the Detailed Features through hypothesis testing and threshold selection.
Each category is examined to form the hypotheses at certain thresholds. The formed hypotheses and their respective thresholds are as follows:

Hypothesis 1. Average daily commute time.
In the Detailed Features, the data for the average daily commute time are provided at 5-10 min intervals. The number of people within each small bracket of commute time may not correlate well with BEV uptake at the ZIP code level. Instead, it is hypothesized that individuals with a commute shorter than some threshold time may have a different likelihood of driving a BEV than individuals with a longer commute, potentially owing to range anxiety [4]. Accounting for the data available for each ZIP code, the price and range of the popular BEV models sold, and considering driving behavior and weather constraints [6], a threshold time of 60 min is hypothesized and tested.

Hypothesis 2. Commuting characteristics.
For the Detailed Features, commuting characteristics are reported as the number of individuals with a given means of transportation to work. The available data specifies how many individuals commute to work using a car and driving alone, in a 2/3/4 or more person carpool, or by public transportation, bicycle, or other means. In the granulated form, the correlation significance for a single feature in this context can be minimal when studying BEV uptake. The hypothesis is made that the number of people in a carpool, or the type of alternate transportation, are not relevant to BEV uptake. The engineered features are thus grouped based on the number of people driving alone, carpooling, or using other means of transportation.
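As an illustration of this kind of aggregation, the following sketch collapses detailed commuting counts for one ZIP code into the three engineered groups. The column names and the counts are hypothetical; only the grouping into driving alone, carpooling, and other means follows the text.

```python
# Hypothetical detailed column names mapped to the three engineered groups.
COMMUTE_GROUPS = {
    "drive_alone": ["car_alone"],
    "carpool": ["carpool_2", "carpool_3", "carpool_4_plus"],
    "other_means": ["public_transport", "bicycle", "walked", "taxicab", "worked_from_home"],
}


def reduce_commute_features(detailed):
    """Sum the detailed counts of one ZIP code into the engineered groups."""
    return {
        group: sum(detailed.get(column, 0) for column in columns)
        for group, columns in COMMUTE_GROUPS.items()
    }


# Hypothetical counts of individuals for a single ZIP code.
detailed_counts = {
    "car_alone": 700, "carpool_2": 50, "carpool_3": 20, "carpool_4_plus": 10,
    "public_transport": 100, "bicycle": 15, "walked": 25, "taxicab": 5,
    "worked_from_home": 75,
}
reduced_counts = reduce_commute_features(detailed_counts)
```

The same pattern would apply to the other hypotheses, for example collapsing vehicle counts into zero, one, or more than one vehicle, with a different mapping.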

Hypothesis 3. Number of vehicles.
In the Detailed Features group, the number of vehicles is reported for a household as well as for an individual in terms of 0/1/2/3/4/5, or more. Correlation between each exact number of vehicles present can be of low significance to the BEV uptake. For a better understanding of the number of vehicles present and its relationship with BEV uptake, a hypothesis is made that an individual or household's likelihood to buy a BEV may depend on whether they own zero, one, or more than one vehicle.

Hypothesis 4. Types of housing-unit structures.
For the Detailed Features, types of housing-unit structures are reported in increments of occupants, such as 1/2/(3-4)/(5-9), and up to 50 or more. In a broader sense, types of housing-unit structures help us to understand whether an individual lives in a single-dwelling unit or a multi-dwelling unit. This level of granularity in the case of the multi-dwelling units is hypothesized to not be significant with respect to BEV uptake. For the engineered features, this data is simplified to the number of individuals living in single-dwelling units and the number of individuals living in multi-dwelling units.

Hypothesis 5. Income level.
For the Detailed Features, the income of an individual is reported in USD from no income to USD 150 k and above in non-uniform intervals. It is intuitive that the higher the income of an individual or household, the more likely they are to purchase a BEV. However, small income brackets will not individually correlate well with any target variable. It is hypothesized that the likelihood to buy an EV is affected by whether an individual has some threshold of disposable income. Comparing the price and ranges of popular sold models in the US with a federal discount and maintenance annually [40], it is observed that a BEV costs approximately USD 23 k on average. From national data and surveys, it is recommended that the price of an individual's car should be 30% of their annual income [41]. With this information, and the available income brackets in the ZIP code data, income features in the Reduced Features are calculated based on the numbers of individuals making more or less than USD 75 k annually.
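One possible reading of how these numbers lead to the USD 75 k threshold is sketched below. The average BEV cost and the 30% share are taken from the text; the way they are combined and the list of bracket edges are assumptions made for illustration, since the exact census brackets are not listed here.

```python
AVERAGE_BEV_COST_USD = 23_000  # approximate average cost stated in the text
CAR_PRICE_SHARE = 0.30         # recommended car price as a share of annual income

# Hypothetical, non-uniform income bracket edges (USD); illustrative only.
BRACKET_EDGES_USD = [10_000, 15_000, 25_000, 35_000, 50_000, 65_000,
                     75_000, 100_000, 150_000]

# Annual income at which the average BEV cost equals 30% of income.
implied_income = AVERAGE_BEV_COST_USD / CAR_PRICE_SHARE

# Snap to the closest bracket edge so the threshold matches the reported data.
threshold = min(BRACKET_EDGES_USD, key=lambda edge: abs(edge - implied_income))
```

Under these assumptions the implied income is roughly USD 76,667, and the closest bracket edge is USD 75,000.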
The formed hypotheses discussed need to be statistically tested to establish the validity of the thresholds set. The t-test is used for testing all the formed hypotheses. The steps to engineer and select the 82 features from the 242 features are shown in Algorithm 1.

Correlation Study
The Reduced Features represent the most statistically significant version of the information available in the original dataset but must be further studied to determine if there is a meaningful correlation with BEV uptake in each of the ZIP codes. Correlation studies are performed on individual states and for the 11 states, collectively. Spearman's coefficient is used for the correlation study, as the data is non-Gaussian. To test for a Gaussian distribution, D'Agostino's test [42] is used. To illustrate the non-Gaussian nature of the demographic features, the histogram plot for an example feature is shown in Figure 4, showing the population that owns more than one vehicle [43]. The skewness-kurtosis test statistic has a value of 2238 and the p-value is less than α = 0.05. This suggests that the data distribution is not normal. While only one example is shown, all demographic features in the Detailed and Reduced Features sets are tested and exhibit similar non-Gaussian distributions.
Spearman's correlation coefficients can range from "−1" to "1", with "−1" indicating perfect negative correlation, "0" indicating no correlation, and "1" indicating perfect positive correlation between a demographic feature and BEV uptake. For subsequent qualitative comparisons of the strength of each Spearman's coefficient, it is posited that "weak correlation" corresponds to coefficients below 0.6, "fair correlation" to coefficients between 0.6 and 0.8, and "strong correlation" to coefficients above 0.8 [44].
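The Spearman coefficient and the qualitative labels can be sketched as follows. This is a standard-library illustration, not the code used in the study: the coefficient is computed as the Pearson correlation of average ranks, the labels are applied to the magnitude of the coefficient (an assumption, since the text states the thresholds without addressing negative values), and the sample values are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def spearman(x, y):
    """Spearman's coefficient: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def strength(rho):
    """Qualitative label using the 0.6 and 0.8 thresholds stated in the text."""
    magnitude = abs(rho)  # assumption: labels apply to the magnitude
    if magnitude > 0.8:
        return "strong"
    if magnitude >= 0.6:
        return "fair"
    return "weak"


# Hypothetical per-ZIP-code values: a demographic feature and BEV registrations.
feature = [120, 300, 80, 950, 40, 610]
bev_uptake = [3, 10, 1, 25, 0, 30]
rho = spearman(feature, bev_uptake)
label = strength(rho)
```

Because the coefficient depends only on ranks, it is unaffected by the heavy skew of the underlying counts, which is the motivation given above for preferring it over a Gaussian-based measure.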

Results and Discussion
In this section, partial results from the study are shown, including feature engineering, hypothesis testing, feature selection, and correlation analysis. The correlation study not only helps to understand how features are correlated with BEV uptake, but the study also helps to validate the engineering of the features. In this paper, due to space limitations, not all the results are shown. To demonstrate the conducted work, several examples are given in detail, followed by a summary of the correlation results across both states and feature categories.
Features containing information about the income of the population in a ZIP code (Group 5) are used to demonstrate the proposed research framework. Hypotheses formed for this group are discussed and results of hypothesis testing are shown. The correlation results for this group are shown for both Detailed Features and Reduced Features, for individual states and for all the states collectively. Next, features containing information about the means of transportation of the population in the ZIP code (part of Group 3) are discussed. To illustrate the feature interaction and its importance, Group 8 is discussed next, which contains information about the combination of income and means of transportation. From the complete set of 82 Reduced Features, the 10 best correlated features are presented for all 11 states, individually and collectively. Finally, the best feature from each group is shown to illustrate the extent of correlation of these groups with BEV uptake.
Table 3. The formulated hypothesis for income level is that the relevant threshold is USD 75,000. Based on the set threshold, data is calculated based on the population having an income less than USD 75,000 and the population having an income greater than USD 75,000 in a ZIP code. For hypothesis testing, first, the values are transformed into average population per BEV. Null and alternate hypotheses are formed.
Null Hypothesis: The average population per BEV with an income greater than USD 75,000 is the same as the average population per BEV with an income less than USD 75,000.
Alternate Hypothesis: The average population per BEV with an income greater than USD 75,000 is different than the average population per BEV with an income less than USD 75,000.
For hypothesis testing, the t-test is used. The critical t-value is determined to be 1.96 for degrees of freedom greater than 100 [45]. The t-value is calculated for all 11 states collectively and for each state individually. Results are given in Table 4. As the calculated t-value is greater than the critical t-value, the null hypothesis is rejected. This means that the average population per BEV differs between the income brackets. With the null hypothesis rejected, the correlation of the income features to BEV uptake is investigated, and the results are shown in Table 5. The table includes Detailed Features and Reduced Features for all of the 11 states and for each state individually. The Spearman correlation coefficients are studied to analyze whether and by what degree the engineered features present meaningful results.
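The test described above can be sketched with the standard library. The text does not state which variant of the t-test is used, so the unequal-variance (Welch) statistic below is an assumption, as is the exclusion of ZIP codes with zero BEVs when normalizing. The data are synthetic and serve only to show the mechanics; they are not results from this study.

```python
import math
from statistics import mean, variance

T_CRITICAL = 1.96  # critical value stated in the text for more than 100 degrees of freedom


def population_per_bev(population, bevs):
    """Normalize a population count by BEV uptake, skipping ZIP codes with no BEVs."""
    return [p / b for p, b in zip(population, bevs) if b > 0]


def welch_t(sample_a, sample_b):
    """Two-sample t statistic without assuming equal variances."""
    standard_error = math.sqrt(
        variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    )
    return (mean(sample_a) - mean(sample_b)) / standard_error


# Synthetic population-per-BEV values for 120 hypothetical ZIP codes per bracket.
above_threshold = [40 + (i % 7) for i in range(120)]   # income greater than USD 75,000
below_threshold = [200 + (i % 11) for i in range(120)]  # income less than USD 75,000

t_calculated = welch_t(above_threshold, below_threshold)
reject_null = abs(t_calculated) > T_CRITICAL
```

With a two-sided alternate hypothesis, the comparison uses the magnitude of the calculated t-value, so the sign only indicates which bracket has the larger mean.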
From Table 5, it is evident that the income brackets USD 1 to USD 9,999 and USD 35,000 to USD 49,999 have a similar degree of correlation, whereas the brackets between them differ. However, the population with an income greater than USD 75,000 has a greater correlation with BEV uptake than the population with an income less than USD 75,000. This holds for all 11 states collectively and for each state individually. In Colorado, Oregon, and Washington, although the correlation coefficients differ, both brackets are strongly correlated with BEV uptake.

Characterization of ZIP Codes in Terms of Means of Transportation
Group 3 contains information about the means of transportation of the individuals in the ZIP code. Table 6 shows the Detailed Features and Reduced Features for Group 3. For this group, the commuting characteristic hypothesis is used to engineer features. Reduced Features are based on the population who drive a car alone, carpool, or use other means to travel to work. Other means of travel include public transportation, bicycles, walking, using taxicabs, or working from home.
From Table 6, it is observed that the Detailed Features individually exhibit varying degrees of correlation, and that the population working from home has a strong correlation with BEV uptake. Among the Reduced Features for this group, the populations who drive a car alone, carpool, or use other means each have a moderate correlation with BEV uptake, with small differences in the degree of correlation.

Characterization of ZIP Codes in Terms of Income and Means of Transportation
To study how the features interact and if the interaction of these features has an impact on their correlation with BEV uptake, Group 8 is studied. Group 8 contains information about the intersection of features in Groups 3 and 5. Specifically, these features report the number of individuals with both a given commuting behavior and income level. The thresholds established for income and commuting are used to engineer the features for this group. An example of such an engineered feature is the population in a ZIP code who drives a car alone and has an income greater than USD 75,000. Table 7 shows the correlation results for Group 8 with BEV uptake for all the states and each state individually.
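The comparison between single-criterion and composite features can be illustrated with a small Spearman correlation routine. This is a minimal sketch, not the study's code: the field names (`drove_alone`, `income_over_75k`, `drove_alone_and_over_75k`, `bev_count`) and all values are hypothetical, and the composite count is treated as a separately tabulated quantity rather than something derived from the two single-criterion counts.

```python
import math
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average rank of the tied run
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    norm = math.sqrt(
        sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)
    )
    return cov / norm


# Toy ZIP-level counts (hypothetical field names and values).
zip_codes = [
    {"drove_alone": 5200, "income_over_75k": 1800, "drove_alone_and_over_75k": 1500, "bev_count": 42},
    {"drove_alone": 3100, "income_over_75k": 2400, "drove_alone_and_over_75k": 900, "bev_count": 30},
    {"drove_alone": 7400, "income_over_75k": 1200, "drove_alone_and_over_75k": 700, "bev_count": 18},
    {"drove_alone": 2600, "income_over_75k": 600, "drove_alone_and_over_75k": 300, "bev_count": 5},
    {"drove_alone": 6100, "income_over_75k": 3900, "drove_alone_and_over_75k": 2800, "bev_count": 77},
]

bev = [z["bev_count"] for z in zip_codes]
for feature in ("drove_alone", "income_over_75k", "drove_alone_and_over_75k"):
    rho = spearman([z[feature] for z in zip_codes], bev)
    print(f"{feature}: rho = {rho:.2f}")
```

Because the Spearman coefficient depends only on ranks, it is unaffected by any monotonic rescaling of a feature, which suits count data whose magnitudes differ widely between ZIP codes.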
From Table 7, it is seen that for all the states collectively, the number of individuals having an income greater than USD 75,000 and driving alone or traveling using other means has a greater degree of correlation than the rest of the features. Most of the states exhibit the same trend, with New Jersey displaying the starkest example. In Colorado, Minnesota, and Wisconsin, the correlations of all six features are closer to one another.
Features describing the number of individuals that meet multiple criteria often exhibit a stronger correlation with BEV uptake than the single-criterion features they correspond to, but this is not always the case. Figure 5 is a bar graph showing a sample of such interactions across different features and states. It is observed that features containing composite criteria can correlate very differently from features containing only part of the same information. In California, the population driving a car alone and the population with an income greater than USD 75,000 each have a moderate to strong correlation with BEV uptake. However, the number of individuals meeting both criteria has a very strong correlation. Conversely, in Vermont, the population who carpools and the population with an income greater than USD 75,000 each have a moderate correlation with BEV uptake, but the feature describing the intersection of these criteria is weakly correlated. Lastly, in Michigan, there is not a large difference in correlation between the population traveling by other means, the population with an income greater than USD 75,000, and the population meeting both criteria.

The 10 Best Correlated Features with BEV Uptake
An extensive study is performed of the correlation with BEV uptake of the Reduced Features, 82 demographic features in total across all 15 groups, for each of the 11 states individually and collectively. The 10 best correlated features are shown in Table 8, based on the results from the collective data of the 11 states. The green boxes indicate strong correlation, the red boxes indicate moderate correlation, and the white boxes indicate weak correlation, as defined in Section 3.
The 10 best demographic features include three features from Group 13 (information on living arrangements and economy), two features from Group 8 (income and means of transportation), and one feature each from Groups 3 (means of transportation), 5 (economy), 7 (means of transportation and vehicles), 10 (living arrangement and means of transportation), and 11 (migration and economy). A total of 7 out of 10 features have information that relates to the population with an income greater than USD 75,000. It is noted that despite correlating well across the 7155 ZIP codes, two of the features have a weak correlation within individual states. In most cases, however, the top 10 features are similar for the unified model and in each state individually, though the degree of correlation with the BEV uptake varies. Table 9 shows the summary of the top 10 features and which states have those features in their top 10 list as well. Additionally, the average ranking of the features is shown to demonstrate the relative variability of these features among the states.
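The kind of summary described for Table 9 can be sketched as follows, assuming the correlation coefficients have already been computed for the unified model and for each state. The function names, feature names, state labels, and coefficients are hypothetical placeholders, and ranking by the absolute value of the coefficient is an assumption rather than something the text states.

```python
from statistics import mean


def rank_features(corr_by_feature):
    """Map feature -> rank, where rank 1 has the largest |coefficient|."""
    ordered = sorted(
        corr_by_feature, key=lambda f: abs(corr_by_feature[f]), reverse=True
    )
    return {feature: rank for rank, feature in enumerate(ordered, start=1)}


def top_k(corr_by_feature, k):
    """The k best correlated features, best first."""
    return list(rank_features(corr_by_feature))[:k]


def summarize(unified, by_state, k):
    """For each top-k unified feature, list the states that also place it
    in their own top k, and its average rank across the state models."""
    state_ranks = {state: rank_features(c) for state, c in by_state.items()}
    summary = {}
    for feature in top_k(unified, k):
        summary[feature] = {
            "states": [s for s, r in state_ranks.items() if r[feature] <= k],
            "average_rank": mean(r[feature] for r in state_ranks.values()),
        }
    return summary


# Hypothetical coefficients, not results from the study.
unified = {"feature_a": 0.72, "feature_b": 0.65, "feature_c": 0.41, "feature_d": 0.30}
by_state = {
    "State A": {"feature_a": 0.80, "feature_b": 0.55, "feature_c": 0.60, "feature_d": 0.20},
    "State B": {"feature_a": 0.35, "feature_b": 0.50, "feature_c": 0.25, "feature_d": 0.45},
}

for feature, info in summarize(unified, by_state, k=2).items():
    print(feature, info["states"], info["average_rank"])
```

A low average rank combined with a long list of states indicates a feature that performs consistently, whereas a feature that ranks well only in the unified model will show a high average rank across states.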

The Best Correlated Feature of Each Group
The best feature from each group is shown in Table 10. Analyzing the best feature from each group provides a summary of how well this type of information correlates with BEV uptake in a ZIP code. The "best" feature is determined based on the unified model of all states, but its performance is shown for each state individually as well. It is noted that for almost all the states, the selected feature for each group is also the best in the individual state model; however, the value of their correlation can differ significantly. Figure 6 shows a final qualitative summary of the correlation study for the Reduced Features set. For the unified model of 11 states, there are five features with a strong correlation with BEV uptake. At the individual state level, Vermont and Wisconsin do not have any features with strong correlation. In contrast, half of the studied features are strongly correlated with BEV uptake in Colorado, Washington, and Oregon. A total of 22 of the 82 Reduced Features in the unified model are weakly correlated with BEV uptake. In the state models, most features in New Jersey, Vermont, and Wisconsin are weakly correlated. Finally, in most of the states and in the unified model, more than half of the selected features are moderately correlated with BEV uptake in a ZIP code.
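The per-model tally of strong, moderate, and weak features can be sketched as below. The cut-offs that separate the three classes are defined in Section 3 and are not repeated in this section, so the values used here (0.4 and 0.7) are placeholders only. The function names, model labels, and coefficients are likewise hypothetical, and classifying on the absolute value of the coefficient is an assumption.

```python
from collections import Counter


def classify(rho, moderate_from, strong_from):
    """Label a coefficient by its magnitude using the supplied cut-offs."""
    magnitude = abs(rho)
    if magnitude >= strong_from:
        return "strong"
    if magnitude >= moderate_from:
        return "moderate"
    return "weak"


def strength_counts(corr_by_feature, moderate_from, strong_from):
    """Count how many features fall into each strength class."""
    counts = Counter(
        classify(rho, moderate_from, strong_from)
        for rho in corr_by_feature.values()
    )
    return {label: counts.get(label, 0) for label in ("strong", "moderate", "weak")}


# Placeholder cut-offs: NOT the definitions given in Section 3.
MODERATE_FROM, STRONG_FROM = 0.4, 0.7

# Hypothetical coefficients, not results from the study.
models = {
    "Unified": {"feature_a": 0.72, "feature_b": 0.55, "feature_c": 0.41, "feature_d": 0.18},
    "State A": {"feature_a": 0.35, "feature_b": 0.48, "feature_c": 0.22, "feature_d": 0.10},
}

for name, coefficients in models.items():
    print(name, strength_counts(coefficients, MODERATE_FROM, STRONG_FROM))
```

Keeping the cut-offs as explicit arguments makes it straightforward to substitute the actual Section 3 definitions.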

Discussion of Results
Many of the demographic factors studied exhibit a moderate to strong correlation with BEV uptake. While most demographic features are themselves correlated with the population in a ZIP code, features that quantify more specific subsets of the population often correlate more strongly with BEV uptake. Many of the above results confirm common intuitions, such as the fact that seven of the top 10 features quantify subsets of the population with an income greater than USD 75,000. As in any correlation analysis, however, it is important to note that these relationships cannot be assumed to be causal: the population fitting the description of well-correlated demographic features is not necessarily the only population purchasing BEVs. The demographic factors are simply aggregate descriptors of the ZIP code as a whole.
Importantly, the strength of correlation for the various demographic factors is not static across states. In New Jersey, the majority of demographic features are weakly correlated with BEV uptake, whereas in Oregon the majority of demographic features are strongly correlated. It is observed that when the average number of BEVs per ZIP code in a state is higher, there is a stronger correlation between demographic factors and BEV uptake, as evidenced in Colorado, Oregon, and Washington. California, with the highest average number of BEVs per ZIP code, performs very similarly to the full 11-state model. Conversely, lower BEV adoption in a state is associated with weaker correlation between demographic factors and BEV uptake: Vermont and Wisconsin do not have any strongly correlated demographic factors, and their overall BEV adoption is very low.
In addition, certain important non-demographic factors, including charging infrastructure, EV incentives, and fuel and electricity prices, can vary significantly between states. These factors not only affect BEV uptake directly but can also change the relationship between demographic factors and BEV uptake. However, the relative consistency in performance of the best correlated features, together with the large number of moderately to strongly correlated features in the 11-state model, shows that quantitative knowledge of the financial characteristics, commuting features, living arrangements, and migration of residents in a ZIP code can explain much of the variance in BEV adoption.

Conclusions
In this paper, an extensive study is conducted of the correlation of demographic factors with BEV uptake, at a granular level and at a large scale. A total of 242 demographic features are collected at the ZIP code level and preprocessed to maintain data consistency. The features are categorized based on the type of information they provide, including population, vehicle information, traveling characteristics, economy, housing, and migration information. New features are then engineered using hypotheses that set the thresholds, and these hypotheses are validated using t-tests.
Of the demographic features studied, the number of individuals in a ZIP code with an income greater than USD 75,000 has the strongest correlation with BEV uptake overall. Among other features of interest, the number of owner-occupied housing units has a greater correlation with BEV uptake than the number of rented housing units. With respect to means of transportation, both the number of individuals who drive to work alone and the number who use other means of transportation are well correlated with BEV uptake. The number of individuals working from home appears to contribute most of the correlation for other means of transportation.
These factors, as well as others listed in Tables 8-10, represent the demographic descriptors of a ZIP code that are most correlated with local BEV uptake, rather than descriptors of individual BEV purchasers. Understanding such aggregate factors at the local level is thus important for the effective prediction and accommodation of accelerating BEV uptake across socio-demographically disparate areas.
For future work, a regression model will be developed to analyze these demographic features and better understand BEV uptake at the ZIP code level. While correlation analysis helps to assess each feature's univariate relationship to the dependent variable, regression can address the co-dependency and multicollinearity of the independent variables and their effect on BEV uptake in a ZIP code.