Household Energy Expenditures in North Carolina: A Geographically Weighted Regression Approach

U.S. household (HH) energy consumption is responsible for approximately 20% of annual global GHG emissions. Identifying the key factors influencing HH energy consumption is a major goal for policy makers seeking energy sustainability. Although various explanatory factors have been examined, the empirical evidence is inconclusive. Most studies are either aspatial in nature or neglect spatial non-stationarity in the data. Our study examines the spatial variation of the key factors associated with HH energy expenditures at the census tract level by applying geographically weighted regression (GWR) to the 14 metropolitan statistical areas (MSAs) in North Carolina (NC). A range of explanatory variables, including socioeconomic and demographic characteristics of households, local urban form, housing characteristics, and temperature, is analyzed. The GWR model for HH transportation expenditures performs better than the GWR model for utility expenditures, and for both utility and transportation the GWR model has slightly better predictive power than the traditional ordinary least squares (OLS) model. HH median income, median age of householders, urban compactness, and distance from the primary city center explain the spatial variability of HH transportation expenditures in the study area. HH median income, median age of householders, and percent of one-unit detached housing are identified as the main factors influencing HH utility expenditures in the GWR model. The analysis also shows that the relationship between HH energy expenditures and the associated factors varies across space, suggesting the need for location-specific evaluation and suitable guidelines to reduce energy consumption.

The influence of factors such as urban form, socioeconomics, demographics, climate, and housing on energy consumption has been widely studied [11][12][13][14][15][16], yet the empirical evidence is inconclusive [9,17]. The disagreements may stem from limitations of the datasets, differing methodologies, variation among countries, the scale of analysis, and the lack of analysis of locational context [18,19]. Most past energy consumption studies are aspatial in nature and neglect the locational information associated with their study areas. The importance of understanding the spatial dimension of energy consumption is undeniable [20,21], since local contexts and practices are inherently linked to energy consumption and sustainability. More differentiated local knowledge (e.g., land-use regulations, policies, and practices) that plays a particular role in energy consumption is essential for community-scale urban design [22]. Research addressing spatial aspects has so far been conducted mostly at macro scales such as the country, region, state, or city [1,23,24]. Microscale (e.g., census tract) energy consumption analysis remains underexplored [17]. Analytical results drawn from geographically aggregated data are sensitive to scale issues such as the size of zones (metropolitan statistical area (MSA) versus census tract), which affect the validity and reliability of findings [25]. To minimize this issue, individual-level data are recommended. In the absence of such data, however, the smallest available geographically aggregated data are highly preferable [26], especially for policy makers [27].
The conventional ordinary least squares (OLS) method used in prior studies cannot capture the spatial variation of energy consumption across different areas of cities or the location-specific impacts of the driving factors [28]. Tobler's first law of geography, "everything is related to everything else, but near things are more related than distant things" [29] (p. 236), indicates possible spatial non-stationarity in geographical data [30]. Therefore, energy consumption in different communities of a city should not be considered independent of one another, owing to spatial autocorrelation [28]. Additionally, the impact of driving factors on energy consumption may vary spatially [28]. Geographically weighted regression (GWR) has proven highly effective in capturing such spatial dimensions; however, its potential for energy consumption research has not been realized [31,32].
This study thus seeks to add to the growing body of literature identifying the key driving factors of HH energy consumption by applying GWR to the census tracts of the 14 MSAs in North Carolina (NC). Our study differs from past studies in at least three ways. First, we use census tract data, the smallest aggregated level of data, to examine the relationship between HH energy consumption and various explanatory factors including socioeconomics and demographics, housing characteristics, urban form, and temperature. Utility (electricity and gas) and transportation expenditures are used as measures of HH energy consumption in our analysis. To our knowledge, no prior study has examined these interactions with geographic data at this fine a scale. Second, we apply GWR to capture the spatial variation in HH energy expenditures and to identify the spatial factors in the energy consumption of different areas of cities. Third, the study area, NC, has not previously been examined in the context of energy expenditures, yet it warrants attention [33]. The results of this study will give planners and policy makers at the community level new understanding of the role of human and spatial context in energy use. The paper is organized as follows. Section 2 presents the factors influencing household energy consumption and the theoretical framework. The modeling methodology is described in Section 3. The results are given in Section 4, followed by a discussion in Section 5. We present conclusions and suggestions for future research in Section 6.

Theoretical Framework
Household (HH) energy consumption has been related to a number of factors including HH characteristics, housing characteristics, urban form, and climate (Figure 1). These factors are reviewed in this section.

Socioeconomic Characteristics and Demographics
Occupant characteristics have been identified as one of the main factors affecting domestic energy use [11]. Socioeconomic and demographic characteristics of households such as income, education, size, and age determine their social class and lifestyle [34]. Income is the single largest contributor to HH energy consumption in the U.S. [3]. Higher income groups usually have higher total energy consumption than lower income households [35]. Transport and non-transport HH energy use is positively associated with high income in different countries [36][37][38][39]. Education has a more complicated influence on households' energy use behavior. On the one hand, more highly educated households tend to adopt more energy-efficient technologies and hence have lower total energy expenditures [34,40] as well as lower transportation-related energy consumption [38]. On the other hand, they may own energy-intensive items or commute longer distances to work [36,41], which increases their total energy consumption [36,42]. However, Estiri [9] found that socioeconomic status (higher income, education, and better employment status) directly decreases total energy expenditures of U.S. households. The complex relationship between income, education, and energy use behavior may be the result of individuals' awareness of environmental issues [43].
HH size and age are other significant factors influencing HH energy consumption [11]. A larger number of HH members is associated with higher energy consumption for appliances, larger vehicles, and longer trips [27,41,44,45]. The age of the householder is an indication of life stage [27]. Consumption rises with age because the elderly spend more time at home, have a higher income and a higher share of home ownership, and demand higher comfort [43,46]. Older age is also associated with longer non-commuting trips [41,47].

Urban Form and Housing Characteristics
There is increasing awareness in the existing literature of the relationship between energy use and urban form/built environment, yet the empirical evidence is inconclusive [17]. Findings range from cities with higher-density populations consuming lower levels of energy [48] to higher-density populations consuming higher levels of energy [49,50]. Dai et al. [51], however, argue that there are fundamental differences between urban and rural HH expenditure patterns: households living in urban settings with high density, mixed land uses, and higher connectivity spend less on energy than households living in settings that lack such characteristics. Holden and Norland [52] model residential energy use for heating and travel using HH survey data from eight areas in Oslo, Norway, and suggest that compact urban forms can improve energy efficiency and help achieve sustainability. Ewing and Rong [16] indicate that an average U.S. HH in a compact county consumes 20% less energy than an average HH in a county with low-density leapfrog development. This is mostly due to the higher likelihood of living in multifamily housing with less floor area in a compact county [16]. In these structures, many surfaces such as walls, floors, and ceilings are shared, which may limit heat loss or gain.
The significant impact of housing characteristics on energy use has also been identified in the literature. Energy consumption increases not only with the size of the house [37] but also with the type of housing, particularly whether units are attached or detached [34]. Living in detached housing results in higher energy consumption due to larger space and more exposed walls [9,16][53][54][55]. Similarly, an increase in the number of rooms in a building results in higher energy use in the building [34,39]. The age of the building has been identified as another important factor influencing energy consumption. In recent decades much attention has been directed at improving the energy efficiency of the housing stock through technologies and policies [9,37,45,56]. As a result, newer homes are built with more energy-efficient features [57]. However, research shows that income is the single largest contributor to HH energy consumption [3] and that America's more expensive houses are located in newly built low-density suburban areas [58]. With more disposable income, these homeowners may be less restrained in their energy consumption and long-distance commuting.
The opposite seems to be true for older neighborhoods, which enjoy the benefit of well-developed infrastructure and are conducive to alternative mode choices such as public transit, walking, and biking [58]. These neighborhoods have fewer vehicle miles traveled (VMT) [41,59] and lower automobile ownership [60], resulting in less energy consumption [61][62][63]. Though urban form affects travel choices and hence energy use, socioeconomic factors are probably more important than built environment characteristics [64]. Despite significant progress in understanding the relationship between urban form and energy expenditures, further analysis is needed at the microscale to reach more robust conclusions [17].

Temperature
The impact of climate change and temperature on residential energy consumption has been well documented [1,27,65,66]. The relationship between temperature and energy use is non-linear [66,67]: electricity and gas demand increase at both low and high temperatures and decrease at intermediate temperatures [67]. However, Zhou et al. [68] indicate that socioeconomic variations in population are of equal or greater importance than climate for U.S. state-level building energy demand. The mixed results might be due to the use of various climate models, different choices of locations, and different spatial and temporal resolutions [68].

Summary
While the importance of understanding the spatial dimension of energy consumption has been identified [20,21], none of the studies reviewed in Section 2 have examined locational variability of primary drivers of HH energy consumption. Understanding energy consumption of households is a complex issue that requires more comprehensive research at micro-scale with more sophisticated methodology that can capture such spatial variations [18,49,69]. A knowledge gap exists in the use of spatial methods to investigate household energy consumption. Therefore, this study utilizes a geographically weighted regression (GWR) approach to fill this gap.

Study Area
This study uses census tracts of the 14 metropolitan statistical areas (MSAs) in NC (Figure 2). The 2010 MSA boundary is used because the sprawl index, which serves as an urban form variable in our analysis, is based on the 2010 census tract boundary. These MSAs include Asheville, Burlington, Charlotte-Gastonia-Rock Hill, Durham-Chapel Hill, Fayetteville, Greensboro-High Point, Goldsboro, Greenville, Hickory-Lenoir-Morganton, Jacksonville, Raleigh-Cary, Rocky Mount, Wilmington, and Winston-Salem. The study area has 1681 tracts, but only 1394 are used for analysis after removing tracts with missing values. Transportation and residential are the largest energy-consuming sectors (28.1% and 27.4% of all energy use, respectively) in NC [70]. The state's energy supply is based mainly on petroleum, coal, nuclear power, and natural gas and depends on imports from other states [71]. NC is one of the fastest growing states [72] and ranked 13th highest in carbon dioxide emissions in the U.S. in 2010 [33]. A mix of dense and sparse landscapes makes NC well suited to the study of energy issues. Additionally, the researchers' familiarity with the state provides good working knowledge of the study area.

Data Collection
A comprehensive consumer survey database from SimplyAnalytics [73], a web-driven database with supporting data visualization, allows us to examine households' energy expenditures at the census tract level, the finest geographic scale at which energy expenditure data were available when this research was conducted. SimplyAnalytics provides various kinds of data from top-rated data sources, including Easy Analytic Software Inc. (EASI), Applied Geographic Solutions (AGS), Mediamark Research (MRI), Dun & Bradstreet (D&B), Nielsen, and Simmons Research [73]. The 2015 average HH energy expenditure data, along with census tract shapefiles for the 14 MSAs of NC, are collected from this database. Average HH expenditures are separated into two major groups, utility and transportation, both given in dollar values. In our analysis, utility expenditures include households' average expenditures on both electricity and natural gas. Transportation expenditures include vehicle purchases, vehicle leases and rentals, gasoline and oil, and other expenses (e.g., maintenance and repair). Expenditure data for census tracts were also added to ArcGIS 10.4 to conduct hot spot analysis. The four factors and their main explanatory variables are presented in Table 1. The socioeconomic and demographic variables, including HH median size, householder median age, HH median income, percent of households with less than a high school education, and percent of households with a bachelor's degree, are also obtained from SimplyAnalytics. Two housing variables collected from SimplyAnalytics are the median year housing was built and the percent of occupied housing that is one-unit detached. The percent of occupied housing units with four or more bedrooms is collected from the 2015 American Community Survey (ACS) 5-year estimates [74]. To evaluate the impact of the built environment on households' energy expenditures, the sprawl index at the census tract level was collected from the National Cancer Institute [75].
The composite metric, or sprawl index, is developed by combining four multi-dimensional factors (density, land-use mix, centering, and street accessibility) to capture distinct dimensions of spatial urban form [75]. A set of geographic characteristics (e.g., population density, employment density) is used to calculate each factor. Overall scores are transformed to have a mean of 100 and a standard deviation of 25 to create the sprawl index. The higher the number, the more compact the census tract. The sprawl index has proven useful for evaluating travel impacts such as per-capita transportation energy consumption, quality of public transit, and energy efficiency [76]. Therefore, this index is selected for its reliability and validity. The distance from each census tract to the center of the primary city of its MSA is used as another variable for assessing the impact of urban form on energy use. The length of the shortest path between each census tract and the city center (represented by city hall) within each MSA was measured along the road network using the service area tool of ArcGIS Network Analyst. The road network was developed using 2017 data from the North Carolina Department of Transportation (NCDOT) and the South Carolina Department of Transportation (SCDOT) [77,78], because both organizations make road network data publicly available only for the current year. In our local experience, the road networks of NC and SC did not change much from 2015 to 2017, so the network distance from each census tract to the city center should be consistent with our 2015 analysis. Additionally, the use of 2017 road network data should not have much influence on the results, since we did not focus on specific locations (e.g., homes, facilities).
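The rescaling step described above (a mean of 100 and a standard deviation of 25) can be illustrated with a minimal sketch. It assumes a simple linear standardization of a raw composite score; the function name, the use of the population standard deviation, and the sample scores are hypothetical and are not taken from the index developers' procedure [75].

```python
from statistics import mean, pstdev

def rescale_to_index(raw_scores, target_mean=100.0, target_sd=25.0):
    """Linearly rescale raw composite scores to a chosen mean and SD."""
    mu = mean(raw_scores)
    sigma = pstdev(raw_scores)  # population SD: an assumption, not stated in the source
    if sigma == 0:
        raise ValueError("scores have zero variance and cannot be rescaled")
    return [target_mean + target_sd * (s - mu) / sigma for s in raw_scores]

# Hypothetical raw composite scores for five tracts (not real data)
raw = [-1.2, -0.4, 0.0, 0.5, 1.1]
index = rescale_to_index(raw)
```

Under this convention a tract with a value above 100 is more compact than the average tract, and a tract below 100 is more sprawling.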
Age of housing is also used as a proxy measure for urban form [17,58,79], as older homes in urban areas are more likely to be in the older core areas of cities (near the city center), with higher-density mixed land uses, sidewalks, and interconnected street networks [80]. Finally, temperature data for the warmest and coldest months of 2015 for all counties of NC and three counties of SC are collected from the Land-Based Station Data of NOAA [81]. ArcGIS 10.4 is used for kriging interpolation to estimate temperature values across NC. The average temperature of each census tract is calculated as the mean of the raster cells within the tract using zonal statistics.
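The zonal statistics step can be sketched as follows. This is an illustrative stand-in for the ArcGIS tool, not its implementation: it assumes the interpolated temperature surface and the tract identifiers are available as two grids of the same shape, with `None` marking NoData cells. The grids and tract labels are hypothetical.

```python
from collections import defaultdict

def zonal_mean(values, zones):
    """Mean of raster cell values per zone; both grids share the same shape."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            if value is None or zone is None:  # skip NoData cells
                continue
            sums[zone] += value
            counts[zone] += 1
    return {zone: sums[zone] / counts[zone] for zone in sums}

# Hypothetical 2 x 3 temperature raster and matching tract-ID raster
temps = [[30.0, 31.0, 32.0],
         [30.5, None, 33.0]]
tracts = [["A", "A", "B"],
          ["A", "B", "B"]]
tract_means = zonal_mean(temps, tracts)
```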

Methodology: ANOVA, Hot Spot Analysis, and GWR
Both spatial and non-spatial analyses are conducted to investigate HH energy expenditures. The methodological procedure is shown in Figure 3. Analysis of variance (ANOVA) is performed using SPSS version 24 to identify whether there is a statistically significant difference in HH energy expenditures among the 14 MSAs. The Levene statistic shows that the assumption of homogeneity of variances is violated for both transportation and utility expenditures; therefore, Welch ANOVA is used. The Games-Howell multiple comparison test is also performed to identify which specific groups differ in energy expenditures. The expenditure data for utility and transportation are then added to ArcGIS 10.4 to conduct hot spot analysis. We first perform Incremental Spatial Autocorrelation to identify the appropriate distance threshold values (34,068 m for utility expenditure and 49,294 m for transportation expenditure) for the hot spot analysis [82]. The Hot Spot Analysis (Getis-Ord Gi* statistic) tool is used to calculate the z-score and p-value for each census tract to show where high or low energy expenditure values cluster. The larger the positive z-score, the more intense the clustering of high values (hot spot), and the smaller the negative z-score, the more intense the clustering of low values (cold spot).
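To make the z-score interpretation concrete, the following minimal sketch computes the Getis-Ord Gi* statistic with binary fixed-distance weights, where each tract counts as its own neighbor. It is an illustration under stated assumptions, not the ArcGIS implementation: tracts are reduced to hypothetical point coordinates, the expenditure values are invented, and the threshold plays the role of the distance band described above.

```python
import math

def getis_ord_gi_star(coords, values, threshold):
    """Gi* z-score per location using binary weights within a distance threshold."""
    n = len(values)
    x_bar = sum(values) / n
    s = math.sqrt(max(sum(v * v for v in values) / n - x_bar ** 2, 0.0))
    z_scores = []
    for xi, yi in coords:
        # Weight 1 for every location within the threshold, including the tract itself
        w = [1.0 if math.hypot(xi - xj, yi - yj) <= threshold else 0.0
             for xj, yj in coords]
        sum_w = sum(w)
        sum_w2 = sum(wj * wj for wj in w)
        numerator = sum(wj * v for wj, v in zip(w, values)) - x_bar * sum_w
        denom = s * math.sqrt(max(n * sum_w2 - sum_w ** 2, 0.0) / (n - 1))
        z_scores.append(numerator / denom if denom > 0 else 0.0)
    return z_scores

# Hypothetical tracts along a line: high expenditures at one end, low elsewhere
coords = [(float(i), 0.0) for i in range(10)]
values = [10.0, 10.0, 10.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
z = getis_ord_gi_star(coords, values, threshold=1.5)
```

In this toy layout the tracts surrounded by high values receive positive z-scores and the tracts surrounded by low values receive negative ones, which mirrors the hot spot and cold spot reading given above.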
We used geographically weighted regression (GWR) because the OLS method assumes that spatial data are stationary and that parameter estimates are constant across space [82]. This assumption is hard to meet in many circumstances [31]. Brunsdon et al. [31] developed the GWR model to account for spatial variation in parameter estimates. GWR extends the OLS model by allowing the coefficients to vary across locations [31]. We first fitted an OLS regression model to determine the global effects of our explanatory variables on energy expenditures. The OLS model also serves as a baseline against which to assess the performance of the GWR model [31]. Adjusted R-squared (R²), the corrected Akaike Information Criterion (AICc), the Joint F and Joint Wald statistics, the Jarque-Bera (JB) statistic, the Koenker (BP) statistic, the Variance Inflation Factor (VIF), and the global Moran's Index (Moran's I) are computed in ArcGIS 10.4 to obtain a properly specified OLS model [82].
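The idea of letting coefficients vary by location can be sketched as a weighted least squares fit repeated at every tract, with weights that decay with distance. The sketch below is a simplified illustration, not the GWR4 or ArcGIS implementation: it uses a single explanatory variable, an adaptive Gaussian kernel whose bandwidth is the distance to the k-th nearest neighbor, and invented data. It also assumes no two locations coincide, so the bandwidth is never zero.

```python
import math

def local_fit(coords, x, y, i, k):
    """Fit y = a + b*x at location i using adaptive Gaussian distance weights."""
    xi, yi = coords[i]
    dists = [math.hypot(xi - xj, yi - yj) for xj, yj in coords]
    # Adaptive bandwidth: distance to the k-th nearest neighbor (index 0 is i itself)
    h = sorted(dists)[k]
    w = [math.exp(-0.5 * (d / h) ** 2) for d in dists]
    sw = sum(w)
    swx = sum(wj * xj for wj, xj in zip(w, x))
    swy = sum(wj * yj for wj, yj in zip(w, y))
    swxx = sum(wj * xj * xj for wj, xj in zip(w, x))
    swxy = sum(wj * xj * yj for wj, xj, yj in zip(w, x, y))
    # Closed-form weighted least squares for one predictor plus an intercept
    b = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    a = (swy - b * swx) / sw
    return a, b

def gwr(coords, x, y, k):
    """Return one (intercept, slope) pair per location."""
    return [local_fit(coords, x, y, i, k) for i in range(len(y))]

# Hypothetical data: the true slope is 1 in the west and 3 in the east
coords = [(float(i), 0.0) for i in range(20)]
x = [float((7 * i) % 5 + 1) for i in range(20)]
true_slope = [1.0 if i < 10 else 3.0 for i in range(20)]
y = [s * xv for s, xv in zip(true_slope, x)]
coefs = gwr(coords, x, y, k=5)
```

A global OLS fit to these data would return a single slope between 1 and 3, whereas the local fits recover a lower slope in the west than in the east, which is the kind of spatial non-stationarity GWR is meant to expose.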
Using GWR4 software, we first test the geographical variability of our explanatory variables with the GWR calibrated F-test and the corrected Akaike Information Criterion (AICc), where a positive difference in AICc (DIFF of Criterion) suggests no spatial variability in terms of model selection criteria [83]. In that case, the variable is better treated as global rather than local in the model [83]. Two of the seven explanatory variables in the transportation model (percent of households with less than a high school education and percent of households with a bachelor's degree) and three of the twelve variables in the utility model (sprawl, percent of households with less than a high school education, and mean January temperature) showed no spatial variability. Hence, a semiparametric GWR is performed, applying an adaptive Gaussian kernel and the golden section search function to select the optimal bandwidth size based on the AICc value [82]. The semiparametric model combines geographically local terms and global fixed terms to improve its performance [82]. The adaptive kernel function is selected here because census tracts vary in size [79]. The Gaussian model is selected to ensure there is sufficient local information to calibrate a local regression model [84]. AICc is a useful diagnostic for comparing the goodness-of-fit of OLS and GWR models [85]. The overall fit and the possible improvement of the GWR model over the OLS model are examined with an analysis of variance (ANOVA) in GWR4 software [86]. Finally, the spatial autocorrelation of the residuals of both models is tested and compared using global Moran's I in ArcGIS 10.4 [82,87].
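The bandwidth selection step can be illustrated with a generic golden section search, which repeatedly narrows an interval around the minimum of a one-dimensional criterion. This is a sketch of the search technique only, not the GWR4 routine: the criterion function below is a hypothetical stand-in with a known minimum, whereas in practice each evaluation would require calibrating a GWR model at the candidate bandwidth and computing its AICc.

```python
import math

def golden_section_min(f, lo, hi, tol=1e-6):
    """Locate the minimizer of a unimodal function f on the interval [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2  # about 0.618
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc < fd:
            # Minimum lies in [lo, d]; reuse c as the new upper interior point
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = f(c)
        else:
            # Minimum lies in [c, hi]; reuse d as the new lower interior point
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = f(d)
    return (lo + hi) / 2

def toy_criterion(bandwidth):
    """Hypothetical stand-in for AICc as a function of bandwidth (minimum at 37)."""
    return (bandwidth - 37.0) ** 2 + 5.0

best_bandwidth = golden_section_min(toy_criterion, 10.0, 200.0)
```

Because only one new criterion evaluation is needed per iteration, the search keeps the number of model calibrations small, which matters when each evaluation is a full GWR fit.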

Average Household Energy Expenditures in the MSAs
Average HH energy expenditures in NC are nearly $2000 per year for utility (electricity and natural gas) and nearly $9000 per year for transportation (excluding public transportation), which is similar to average U.S. HH spending [88]. Transportation accounts for almost 80% of HH energy expenditures in all MSAs in NC (Figure 4).

Hot and Cold Spots
The hot spot analysis shows the clusters of census tracts with high and low energy expenditures (Figure 5). Significant hot spots for HH transportation expenditures are found in the Raleigh and Durham-Chapel Hill MSAs (Figure 5a). Significant cold spots for HH transportation expenditures are observed in the Greensboro-High Point MSA, the eastern parts of Alamance County in the Burlington MSA, and the Fayetteville, Goldsboro, and Greenville MSAs. Significant hot spots for utility expenditures are found in Wake and Franklin counties in the Raleigh MSA, areas close to downtown Durham in the Durham-Chapel Hill MSA, and Brunswick County in the Wilmington MSA (Figure 5b). Significant cold spots for utility expenditures are located in the Fayetteville, Jacksonville, and Asheville MSAs.

OLS Model
The model estimates for transportation and utility are presented in Tables 2 and 3. The models explain approximately 41% (Adj. R² = 0.41) of the variation in transportation expenditures (Table 2) and 12% (Adj. R² = 0.12) of the variation in utility expenditures (Table 3). The adjusted R² is a recalibration of R² that prevents the artificial increase of R² when more variables are added to a model [90]. The Joint F-statistic and Joint Wald statistic are used to assess model significance [90]. The Joint F-statistic is only used when the Koenker (BP) test is not significant [90]. Here, the Koenker (BP) statistic is significant, so the Joint Wald statistic is used, and it shows overall model significance. In addition, the statistically significant Koenker (BP) test highlights the presence of heteroscedasticity and/or non-stationarity, making the transportation and utility models good candidates for GWR analysis [82]. The statistically significant Jarque-Bera statistic indicates that the residuals are not normally distributed [82]. This is additionally confirmed by Moran's I test.

The transportation OLS model includes all the socioeconomic-demographic and urban form variables listed in Table 1. All of the selected variables except HH median size are significant at the 0.01 level (Table 2). HH median income is identified as the most significant variable contributing to the model. With every $100 increase in HH median income in a census tract, there is a $3 increase in HH transportation expenditures. Transportation expenditures also increase with householder median age and college education. The percent of households with less than high school education in a census tract is associated with lower transportation expenditures. The relationship between both urban form variables and transportation expenditures is statistically significant, suggesting that households living in more compact areas and closer to the center of the primary city spend less on transportation (Table 2).

While all the variables identified in Table 1 are entered in the utility model, the low adjusted R² (0.14) (Table 3) shows that none of the independent variables is a strong predictor of utility expenditures. HH median income, HH median age, and percentage of occupied housing with one unit detached are the only significant explanatory variables in the OLS model. Any increase in these variables is associated with higher HH utility expenditures. The VIF values of the explanatory variables for both the transportation and utility models are less than the critical value of 7.5, indicating that there are no multicollinearity problems among the predictors [82,90].
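The multicollinearity check above can be sketched as follows. This is an illustrative NumPy version, not the ArcGIS routine used in the study: each predictor is regressed on the remaining predictors, and its variance inflation factor is 1 / (1 - R²) of that auxiliary regression. The function name and the synthetic predictors are assumptions; the 7.5 cut-off is the critical value quoted in the text.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF of each column of X, from auxiliary regressions on the other columns."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Regress predictor j on an intercept plus all other predictors
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs

# Synthetic predictors (hypothetical): c is nearly a copy of a, b is independent
rng = np.random.default_rng(1)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.1 * rng.normal(size=500)
vifs = variance_inflation_factors(np.column_stack([a, b, c]))
flagged = vifs > 7.5  # critical value quoted in the text
```

In this synthetic case the two nearly collinear columns are flagged while the independent one is not. A global VIF below the threshold does not rule out local collinearity among GWR coefficients.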

OLS Model vs. Semiparametric GWR Model
The significant Moran's I test (MI = 0.005; p = 0.04) for transportation expenditures highlights the existence of spatial autocorrelation in the residuals (Table 2). Although Moran's I test is not significant for utility expenditures (Table 3), we perform the GWR analysis for both transportation and utility to examine the spatial variability of the explanatory variables. The GWR results are shown in Table 4. A positive DIFF of Criterion in the GWR results, mainly one greater than or equal to two, suggests that the local variable is better treated as global. The GWR-calibrated F-test and the AICc values in GWR4 confirm that four variables in the transportation model (sprawl, distance from city center, HH median income, householder median age) and three variables in the utility model (percent of housing with one unit detached, HH median income, householder median age) exhibit statistically significant geographic variability, represented by a negative DIFF of Criterion and t-values greater than two (p < 0.05) (Table 5). Although the signs of the regression coefficients are the same in the OLS and semiparametric GWR models, the adjusted R² of the GWR model is one percent better than that of the OLS model for both transportation and utility (Table 4). The AICc value of the semiparametric GWR decreased by 5.34 and 6.21 in the transportation and utility models, respectively (Table 4). The simulated residuals of the GWR models are also smaller than those of the global models for transportation and utility. Accordingly, Moran's I test of the GWR model shows a smaller value for transportation (MI = 0.002; p = 0.3), suggesting a more random pattern in the returned residuals [82] (Table 4). The mean values of the local regression coefficients of the explanatory variables for both the transportation and utility models have the same signs as those observed in the OLS models (Table 6).
ANOVA tests for transportation (F = 2.67) (Table 7) and utility (F = 2.2) (Table 8) reveal that the predictions improve when the semiparametric GWR model is applied.
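The residual diagnostic used in this comparison, global Moran's I, can be written out in a few lines. The sketch below is a minimal NumPy illustration rather than the ArcGIS 10.4 tool: the function name and the toy chain of six locations are assumptions, and the significance test (the p-values reported above) is not reproduced.

```python
import numpy as np

def morans_i(values, W):
    """Global Moran's I: (n / S0) * (z' W z) / (z' z), with z the mean-centred values."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    W = np.asarray(W, dtype=float)
    return (z.size / W.sum()) * (z @ W @ z) / (z @ z)

# Toy example (hypothetical): six locations in a row, neighbours share an edge
n = 6
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0

clustered = morans_i([1, 1, 1, 5, 5, 5], W)    # similar values sit together
alternating = morans_i([1, 5, 1, 5, 1, 5], W)  # neighbours always differ
print(clustered, alternating)
```

Values near zero, such as those reported for the GWR residuals, indicate little spatial structure left in the residuals, whereas clearly positive values indicate clustering of similar residuals.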

GWR Local Estimates
GWR modeling allows for visualizing the spatial patterns of the explanatory variables and the goodness of fit [84]. The local estimated coefficients and the local R² values are mapped and shown in Figure 6. The t-values are mapped only when a variable's coefficients are not significant (t < 2) in some parts of the study area. The local goodness of fit shows spatial differentiation, with values from 0.37 to 0.49 for transportation and 0.12 to 0.17 for utility (Figure 6). Higher values are observed in the east part of the study area for both models, reflecting that GWR can successfully characterize spatial non-stationarity [82].

Sprawl local estimates have a higher significant influence on transportation expenditures in the west part of the study area, particularly in the Winston-Salem, Hickory-Lenoir-Morganton, and northern part of the Charlotte MSAs, suggesting that with each unit increase in the sprawl score (higher compactness), transportation expenditures decrease by larger amounts ($19.92-$21.97) than in the rest of the study area (Figure 7a). Distance from city center has the highest effect on transportation expenditures in the Hickory-Lenoir-Morganton, Charlotte-Gastonia-Rock Hill, Asheville, Fayetteville, and Wilmington MSAs. The lowest influence of this variable is observed in the Winston-Salem, Greensboro-High Point, and west of Raleigh MSAs (Figure 7b).

Although there is not a large difference among the local estimates of HH median income, an east-west pattern can be seen for the positive impact of HH median income on both transportation and utility expenditures (Figure 8). With an increase in householder median age, transportation expenditures increase by a larger amount ($40.73-$43.54) in the Winston-Salem, Greensboro-High Point, Burlington, and east of Charlotte-Gastonia-Rock Hill MSAs (Figure 9). Householder median age has a lower positive influence on utility expenditures than on transportation expenditures. However, its increase is associated with households spending more on utility in the east part of the study area (Figure 10a). While the lowest impacts of householder median age on utility are observed in the Charlotte-Gastonia-Rock Hill, Hickory-Lenoir-Morganton, and Asheville MSAs, the results are not statistically significant (Figure 10b). The impact of the percentage of housing with one unit detached on utility expenditures also varies spatially, with the highest influence observed in the Winston-Salem, Hickory-Lenoir-Morganton, Charlotte-Gastonia-Rock Hill, and Asheville MSAs. With every percent increase in housing with one unit detached in these MSAs, average HH utility expenditures increase by $324.90 to $347.41 (Figure 11).

Figure 11. Local coefficients of percent of occupied housing with one unit detached for utility model.

Discussion
Various factors have been examined in the literature to understand energy consumption using different regression analyses [44,91]. We used the geographically weighted regression (GWR) method to study spatial heterogeneity in the energy expenditures of households in NC. In global models, an incomplete dataset with missing information might result in spatial heterogeneity [92]. Since complete datasets are difficult to obtain in many studies, applying local models that include spatial information can significantly improve the prediction results [90]. However, the problem of multicollinearity should be taken into consideration when interpreting the results. Although the explanatory variables are not collinear in this study, the absence of this problem in the global model does not guarantee that the coefficients are independent in the GWR model [93].
Socioeconomic and demographic characteristics of households, urban form, housing characteristics, and local temperature are used to understand the complexity of energy consumption. As expected, the OLS models show that higher income households spend more on both transportation and utilities, as richer households tend to purchase larger houses, more goods and services, luxurious vehicles, comfortable indoor environments, and recreational activities, all of which result in higher energy consumption [69]. Higher education is associated with higher transportation expenditures, which might be due to higher income, higher car ownership, and longer commutes [36,41]. Sprawl and distance from the primary city center are significant factors influencing transportation expenditures in the OLS model. This is in line with the literature suggesting that compact development reduces transportation consumption through transportation choice (public vs. private), automobile dependency, and vehicle miles traveled (VMT) [48,52,94-96]. Contrary to expectations, the urban form variables were not significant predictors of HH utility expenditures in NC. Although none of the urban form factors is significant, the negative sign of sprawl indicates lower utility expenditures for households living in compact areas. Housing age is also not a statistically significant factor related to utility expenditures. However, the positive correlation between housing age and energy expenditures, showing that new houses consume more energy, is in contrast with previous findings [34,97]. These results might be due to the larger size of new houses, which are usually located in the less compact census tracts outside the city center.
Furthermore, the percent of detached housing has a positive impact on HH utility expenditures, as has been found in past studies [9,53-55]. In addition to a detached home losing or gaining more heat through its more exposed walls, higher income households are also more likely to live in detached houses with large spaces and more rooms, all of which may explain their higher utility consumption [9]. A higher percentage of the detached housing in our study area is located outside the city center and in areas that are representative of less compact urban forms. However, the measurement of urban form, in our case the sprawl index, remains a challenging issue for studying utility consumption. Attributes such as accessibility to green spaces, shading effects from trees, and water bodies that may affect utility consumption [14,17] are not considered in this index. Additionally, the benefits of density for utility or transportation expenditures can be superseded by other factors such as the lifestyle choices of individuals [50,95,98].
The GWR models in this research perform slightly better than the OLS models and reveal local variations in the relationships between our explanatory variables and HH energy expenditures. HH median income, householder median age, sprawl index, and distance from the primary city center explain the spatial variability of HH transportation expenditures. HH median income, householder median age, and percent of housing with one unit detached are the main factors in the GWR model of utility expenditures. While the OLS results show a global trend in the impact of the explanatory variables on energy expenditures, the GWR model indicates the spatial variation of that influence. For example, the OLS model shows that a census tract would have higher transportation expenditures if it is either far from the city center or in a more sprawling area. The GWR results show in which MSAs and census tracts the influence of sprawl and distance from city center is higher. Identifying these local variations can be an effective way of suggesting to urban planners where to allocate future land uses and changes in urban form to minimize HH transportation expenditures. Urban planners and policy makers can also target the housing stock to reduce residential energy consumption through a transition from detached housing to attached or multifamily housing [16]. The reduction in energy consumption would be higher in the areas where GWR shows a stronger relationship between housing type and HH utility expenditures, illustrated by larger local coefficients. The GWR model can also offer an opportunity to see whether globally insignificant parameters show locally significant influence [84], though this was not the case in our analysis.
Both the OLS and GWR models for HH transportation expenditures have a better fit than the utility models, suggesting that additional data are required for future analysis. The influence of non-economic factors such as information, attention, individual attitudes, social norms, and lifestyle on HH energy expenditures has been identified in different countries [10,99-101]. These variables are not included in our models because these types of data are not publicly available and need to be collected. The homogeneous nature of NC might be another reason that the GWR models did not provide strong results. A major advantage of the semiparametric GWR model is that it does not assume spatial non-stationarity for all variables and allows both global and local variables to be included in the model. However, GWR results cannot easily be transferred to other places. This is a disadvantage of GWR compared to the OLS model, which can be applied in similar physical settings [82]. Our aim here, however, was to capture the spatial variation of energy consumption in NC and the location-specific impacts of contributing factors. The GWR model, therefore, can be a useful tool for urban planners and decision makers to develop local plans and policies to reduce and manage HH energy consumption. Socioeconomic-demographic, housing, and urban form characteristics do not generally change in a short period of time and are easily available through different data sources in the U.S., thus providing adequate information for local decision makers to implement the GWR approach in other areas. Though the results may vary in different locations, GWR modeling has proved to be an effective method, in past studies and in our research, for prioritizing suitable strategies. The GWR model can be improved and extended into an optimization-modeling framework [84] that would help to solve spatial-social problems related to energy consumption. The outputs of the GWR model could serve as inputs to the optimization model [84]. For instance, an optimization model could be developed to allocate future land use and buildings in a way that minimizes energy consumption.
From a policy perspective, the general assumptions that certain policies are applicable in all parts of a country or a state need to be revisited. The results of this study show that the effects of socioeconomics and demographics, urban form, and housing characteristics on household energy consumption are different across the region. This suggests the importance of developing location-specific guidelines for decision makers in order to evaluate the consumption patterns in local areas and prioritize the suitable strategies for different areas. In addition, much focus in the U.S. has been targeted on the housing stock for improving energy efficiency of buildings through technologies. The results of this study, however, show the significant impact of socioeconomic and demographic factors and housing type on electricity and gas consumption that need to be addressed in the future policies. Local policies also need to address more compact developments in their approach to reduce transport-related energy consumption.

Conclusions
A spatial analytical approach was developed to study HH energy consumption in 14 MSAs of NC using geographically weighted regression (GWR) modeling at the census tract level. The estimates of the OLS and GWR models were presented to investigate the global and local effects of various explanatory factors on HH transportation and utility expenditures. The findings reveal the spatial variation of the relationship between energy expenditures and the influencing factors. To the best of our knowledge, this research is the first to apply GWR modeling to HH energy expenditures at the census tract level in the U.S. Although the signs of the regression coefficients are the same in the OLS and semiparametric GWR models, the explanatory power of the GWR model increased slightly, as shown by a higher adjusted R², a reduction in AICc, and smaller residual values.
The main contribution of this research is evaluating the influence of a range of factors on HH energy expenditures at the census tract level using a spatial approach. Microscale analysis is crucial for designing interventions aimed at changing future land use plans for cities across the world. Global energy demand has raised many concerns across the world, particularly for the U.S., which has the highest demand among countries [1]. A complete assessment of households' energy consumption and its relationship to spatial urban structure needs to be provided to help engineers and planners in their approach toward sustainability. The major limitation of this research is the difficulty of collecting individual-level data, which might be one of the reasons our models did not have strong explanatory power, particularly for utility expenditures. Similarly, the sprawl index does not include additional attributes of urban form (e.g., green area and tree shading). Our housing characteristics data also have limitations, such as missing information on housing shape, the materials used, and the window-to-wall ratio. There is no complete set of data that represents such housing characteristics at the census tract level. The impact of individuals' attitudes and lifestyle choices on energy consumption should be evaluated through detailed HH surveys. Although obtaining such data for a state such as NC is practically impossible, it should be noted that technology or planning alone might not encourage people to decrease their consumption. Another limitation is the homogeneous nature of NC. Applying our GWR model to other states might provide stronger and more reliable results. Developing policies to address these issues, improving the goodness of fit of the GWR model, and extending the model to optimization frameworks remain areas for future research.
Author Contributions: S.S. originally came up with the idea and designed the conceptual framework of the research with N.P. and H.K.; N.P. primarily collected and analyzed the data and wrote the original draft; H.K. contributed to the data acquisition and analysis; S.S. provided advice on the results and revised the manuscript.