Downscaling Building Energy Consumption Carbon Emissions by Machine Learning

Abstract: The rapid rate of urbanization is causing increasing annual urban energy usage, drastic energy shortages, and pollution. Building operational energy consumption carbon emissions (BECCE) account for a substantial proportion of greenhouse gas emissions, crucially influencing global warming and the sustainability of urban socioeconomic development. As a foundation of building energy conservation, determination of refined statistics of BECCE is attracting increasing attention. However, reliable and accurate representation of BECCE remains lacking. This study proposed an innovative downscaling method to generate a gridded BECCE intensity benchmark dataset with 1 km² spatial resolution. First, we calculated BECCE at the provincial level by applying energy balance tables. Second, on the basis of building climate demarcation, partial least squares regression models were used to establish the BECCE behavior equations for three climate regions. Third, Cubist regression models were built to downscale prefecture-level BECCE to 1 km² resolution; these models captured well the complex relationships between BECCE and multisource covariates (i.e., gross domestic product, population, ground surface temperature, heating degree days, and cooling degree days). The downscaled product was verified using anthropogenic heat flux mapping at the same resolution. In comparison with other published pixel-based datasets of building energy usage, the gridded BECCE intensity map produced in this study showed good agreement and high spatial heterogeneity. This new BECCE intensity dataset could serve as a fundamental database for studies on building energy conservation and carbon emission forecasting, and could support decision makers in developing strategies for realizing the CO₂ emission peak and carbon neutralization.


Introduction
With rapid development of urbanization globally, greenhouse gases associated with the urban metabolism continue to be emitted in huge quantities, substantially influencing global warming and sustainable urban socioeconomic development [1][2][3]. In 2018, building whole lifecycle energy usage emitted 9.7 Gt CO₂, accounting for 39% of total energy-related CO₂ emissions globally [4]. It is predicted that the current rapid rate of development will continue to drive energy consumption and carbon emissions even higher in the future, adding even greater urgency to the need to address climate change [5]. In this context, some countries have proposed national greenhouse gas emission reduction targets. For China, the targets aim to reach the CO₂ emission peak by 2030 and achieve carbon neutralization by 2060, with the ultimate objective of achieving net-zero CO₂ emissions.
Given the research status described above, an approach for high-spatial-resolution BECCE estimation over large areas is still lacking. The coarse data obtained by the aforementioned methods lack spatial heterogeneity, detail, and accuracy over large areas, which hinders understanding of the relationship between BECCE and the urban environment and restricts interdisciplinary studies that use integrated environmental and social datasets in raster or gridded formats. Thus, development of an efficient approach for estimating BECCE at the pixel level that can be easily integrated with other spatial data has become a research interest of vital importance.
To address these gaps, this study proposes a new machine learning-based downscaling approach to improve refined BECCE estimation. Flexible partial least squares (PLS) regression models and Cubist regression models were constructed to disaggregate statistical BECCE at different administrative levels into higher-resolution spatial units by capturing the relationships between varied ancillary geospatial layers and BECCE. To demonstrate this approach, we generated a gridded BECCE intensity benchmark dataset at 1 km² spatial resolution over China for 2015. To the best of our knowledge, this study represents the first attempt to use a machine learning-based downscaling method that incorporates socioeconomic and geospatial data to estimate pixel-level BECCE. This new BECCE intensity dataset could serve as a fundamental database for studies on building energy conservation and carbon emission forecasting, and could support decision makers in developing strategies for realizing the CO₂ emission peak and carbon neutralization.

Methodology
Accurate and reliable representation of fine-scale BECCE generally remains lacking; however, such information is in high demand for studies over large areas, especially in rapidly developing countries such as China. One feasible and practical approach to resolving this problem is to fuse downscaled data with fine-scale ancillary information, which is based on the idea of establishing relationships between a coarse-scale target variable and fine-scale auxiliary covariates [29][30][31]. Therefore, this study adopted a downscaling approach comprising four steps: (1) calculate provincial-level BECCE from different sources, (2) select predictors for BECCE mapping, (3) establish PLS regression models at the provincial level, and (4) build Cubist regression models at the prefecture level (Figure 1).

Calculating Provincial-Level BECCE from Different Sources
We first calculated provincial-level BECCE based on the IPCC Guidelines for National Greenhouse Gas Inventories. In mainland China, BECCE derive primarily from primary energy, heating power, and electric power [32]. In this study, we considered five energy types: coal (including raw coal, cleaned coal, other washed coal, briquettes, coke, and coke oven gas), petroleum products (including gasoline, diesel oil, and LPG), natural gas, heat, and electricity [33]. The composition of provincial-level BECCE can be formulated as follows:
$$CF_{EN} = CF_{COAL} + CF_{PP} + CF_{NG} + CF_{HEAT} + CF_{EL} \qquad (1)$$
where $CF_{EN}$ is the provincial BECCE, $CF_{COAL}$, $CF_{PP}$, and $CF_{NG}$ are the CO₂ emissions from provincial use of coal, petroleum products, and natural gas, respectively, and $CF_{HEAT}$ and $CF_{EL}$ represent the CO₂ emissions from provincial consumption of heat and electricity, respectively.
To facilitate calculation, all types of consumed energy were converted to standard coal equivalents. The total CO₂ emissions from a certain energy type can be expressed as follows:
$$C_{ei} = \sum_{j=1}^{n} W_{ej} \times G_{ej} \times I_{ej} \qquad (2)$$
where $C_{ei}$ is the total provincial CO₂ emissions from energy type i (i.e., coal, petroleum products, natural gas, heat, or electricity), $W_{ej}$ is the provincial final consumption of energy type j based on the China Energy Statistical Yearbook, $G_{ej}$ is the conversion factor of energy type j consumption, which is the ratio of the final consumption of energy type j to standard coal consumption, $I_{ej}$ is the CO₂ emission factor of energy type j, which is the amount of CO₂ emissions per unit of standard coal consumption, and n is the total number of energy types (j = 1, 2, ..., n) considered in this study.
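The accounting described above can be illustrated with a minimal sketch. It assumes that the emissions of each fuel are the product of final consumption, the standard-coal conversion factor, and the CO₂ emission factor, and that the category totals are then added as in Equation (1). The class name, function names, and every number below are hypothetical and are not yearbook or IPCC values.

```python
from dataclasses import dataclass


@dataclass
class EnergyUse:
    """Final consumption of one fuel in one province (all values hypothetical)."""
    category: str        # coal, petroleum, natural_gas, heat, or electricity
    amount: float        # W: final consumption in the fuel's physical unit
    to_sce: float        # G: standard coal equivalent per physical unit
    co2_per_sce: float   # I: tons of CO2 per ton of standard coal equivalent


def emissions_by_category(records):
    """Sum W * G * I over the fuels that belong to each energy category."""
    totals = {}
    for r in records:
        emitted = r.amount * r.to_sce * r.co2_per_sce
        totals[r.category] = totals.get(r.category, 0.0) + emitted
    return totals


def provincial_becce(records):
    """Add the category totals to obtain the provincial BECCE."""
    return sum(emissions_by_category(records).values())


# Illustrative inputs only.
demo = [
    EnergyUse("coal", 100.0, 0.7, 2.5),
    EnergyUse("coal", 20.0, 0.9, 2.5),
    EnergyUse("natural_gas", 10.0, 1.2, 1.6),
    EnergyUse("electricity", 50.0, 0.3, 2.0),
]
by_cat = emissions_by_category(demo)
total = provincial_becce(demo)
```

Grouping by category before summing keeps the intermediate totals (coal, petroleum products, natural gas, heat, electricity) available for an energy-structure breakdown.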

Selection of Predictors for BECCE Mapping
In implementation of a downscaling method, to obtain a model with maximum predictability and effective transferability, it is necessary to screen variables guided by prior knowledge to evaluate their relevance and importance. Our previous study using factor analysis showed that building construction characteristics, socioeconomic environment, macroclimate conditions, and microclimate conditions have the greatest impact on BECCE in China [34,35]. Therefore, in this study, a suite of typical covariates was selected to represent these four factors in the PLS and Cubist regression models. Gridded population (POP) information, which was considered as the energy consumption membership to determine the CO₂ emissions of a building in terms of lighting, cooking, heating, cooling, and ventilation, was used as the indicator of building characteristics [36,37]. Gross domestic product (GDP) was used to represent the socioeconomic environment [38,39]. Outdoor macroclimate conditions that substantially influence building energy usage can be represented by heating degree days (HDD18) and cooling degree days (CDD26), which were considered as indicators of the building energy consumption required for heating and cooling to maintain a comfortable indoor temperature [40]. The microclimate is generally defined as the local climate within 1 km² of a building, which was represented by meteorological factors such as air humidity, solar radiation, and wind speed in some previous studies on building energy consumption [41,42]. In this study, we used the 0 cm ground surface temperature (GST) as the indicator of the microclimate environment.

Building Provincial-Level PLS Regression Models
PLS regression is a multivariate statistical analysis method first proposed by Wold and Albano in 1983 [43]. It builds a linear regression model via data dimension reduction, information synthesis, and screening technology to extract new comprehensive components with optimal interpretation of the system [44]. The general underlying models of multivariate PLS regression can be expressed as follows:
$$X = TP^{\mathsf{T}} + E \qquad (3)$$
$$Y = UQ^{\mathsf{T}} + F \qquad (4)$$
where X is an n × m matrix of predictors, Y is an n × p matrix of responses, T and U are n × l matrices that are projections of X and Y, respectively, P and Q are m × l and p × l orthogonal loading matrices, respectively, and E and F are matrices of the error terms, which are assumed to be independent and identically distributed random normal variables.
Here, X and Y are decomposed to maximize the covariance between T and U. The PLS regression algorithm integrates the advantages of canonical correlation analysis, principal component analysis, and multiple linear regression analysis, making it applicable for a matrix of predictors that has more variables than observations and when multicollinearity exists among the independent variables [45]. In this study, provinces in China were initially divided into three groups on the basis of their regional characteristics (shown in the map of China in Figure 2, marked with different colors). Then, provincial-level PLS regression models were built for each group to capture the different associations between the various source variables and the target BECCE profile. The basis of the provincial grouping was China's building climate demarcation (GB50176-93) (Ministry of Housing and Urban-Rural Development, 1993) [46], in which the energy-saving designs of buildings have notable differences. The groups comprised region I (northern part, most provinces in the Severe Cold Zone or Cold Zone), region II (central part, most provinces in the Hot Summer and Cold Winter Zone), and region III (southern part, most provinces in the Hot Summer and Warm Winter Zone or Mild Zone). Taking these differences into consideration, for the covariates representing macroclimate conditions, HDD18 was selected as the indicator for region I, CDD26 was selected as the indicator for region III, and both HDD18 and CDD26 were selected as indicators for region II. We aggregated all the 1 km² values of the predictors to the provincial level, and calculated the GDP per square kilometer, POP per square kilometer, average GST, average HDD18, and average CDD26, which were paired with the calculated provincial-level BECCE intensity values (i.e., total BECCE of the province divided by provincial area).
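To make the role of collinearity concrete, the following is a minimal sketch of a PLS fit under strong simplifying assumptions: a single response (PLS1) and a single latent component, so no deflation step is needed. It is not the model configuration used in the study; the function names and the data are invented for illustration.

```python
import math


def center(values):
    """Return a mean-centered copy of a list together with its mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values], mean


def fit_pls1_one_component(X, y):
    """Fit a one-component PLS1 model; X is a list of rows, y a list of responses."""
    n, m = len(X), len(X[0])
    cols, x_means = [], []
    for j in range(m):
        c, mu = center([row[j] for row in X])
        cols.append(c)
        x_means.append(mu)
    yc, y_mean = center(y)
    # Weight vector w is proportional to X^T y, the direction of maximal covariance.
    w = [sum(cols[j][i] * yc[i] for i in range(n)) for j in range(m)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores t = X w; regressing y on t gives the scalar loading q.
    t = [sum(cols[j][i] * w[j] for j in range(m)) for i in range(n)]
    q = sum(yc[i] * t[i] for i in range(n)) / sum(v * v for v in t)
    coef = [wj * q for wj in w]
    intercept = y_mean - sum(coef[j] * x_means[j] for j in range(m))
    return coef, intercept


def predict(coef, intercept, row):
    """Apply the fitted linear equation to one row of predictors."""
    return intercept + sum(c * x for c, x in zip(coef, row))


# Invented data: the second predictor is exactly twice the first (perfect collinearity).
X_demo = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
y_demo = [4.0, 7.0, 10.0, 13.0]
coef, intercept = fit_pls1_one_component(X_demo, y_demo)
estimate = predict(coef, intercept, [5.0, 10.0])
```

Ordinary least squares has no unique solution for these two perfectly collinear predictors, whereas the PLS component still yields a single well-defined equation. A practical application would likely extract several components and choose their number by cross-validation.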
These PLS regression models representing the provincial-level relationships between BECCE intensity and related covariates were then applied to the corresponding covariates at the prefecture level to obtain primary estimates of prefectural-level BECCE intensity. The following step was to use the derived BECCE intensity estimations as the weighting layer for a standard dasymetric mapping approach to disaggregate the provincial-level BECCE to the final prefectural-level values [47,48].
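The dasymetric step can be sketched as a proportional allocation. The sketch below assumes that the weight of each prefecture is its model-estimated intensity multiplied by its area, and that negative model estimates are clipped to zero; both choices are assumptions made for illustration, and the function name and numbers are hypothetical.

```python
def dasymetric_disaggregate(parent_total, est_intensity, areas):
    """Split parent_total among child units in proportion to intensity * area."""
    weights = [max(i, 0.0) * a for i, a in zip(est_intensity, areas)]
    weight_sum = sum(weights)
    if weight_sum == 0:
        raise ValueError("all weights are zero; cannot allocate the total")
    return [parent_total * w / weight_sum for w in weights]


# Hypothetical province with three prefectures.
province_total = 1000.0                    # provincial BECCE, tons of CO2
pls_intensity = [3.0, 1.0, 0.5]            # model-estimated intensity, tons per km^2
prefecture_area = [100.0, 300.0, 400.0]    # km^2
prefecture_becce = dasymetric_disaggregate(
    province_total, pls_intensity, prefecture_area
)
```

Because the weights are normalized by their sum, the prefectural values always add up to the provincial total, so the regression output only controls how the total is shared among units.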

Building Cubist Regression Models
Cubist regression, which was proposed based on the ideas of Quinlan [49,50], is a rule-based method applicable to multivariate linear regression. It operates by constructing intermediate linear models defined by sets of rules at each step of a model tree, then building the most suitable prediction based on the linear regression model at the terminal node of the tree, which is "smoothed" by taking the predictions from the preceding models into consideration; subsequently, the model tree is reduced to a set of rules via pruning and/or combination for simplification [51]. The general model of Cubist regression formed by two linear models can be expressed as follows:
$$\zeta_{par} = a \times \zeta_{(c)} + (1 - a) \times \zeta_{(p)} \qquad (5)$$
where $\zeta_{(c)}$ is the prediction from the current model, $\zeta_{(p)}$ is the prediction from the linear model in the previous node of the tree, and a is the coefficient that weights the two predictions to give the smoothed prediction $\zeta_{par}$. As a machine learning algorithm, the main advantages of the Cubist method are that multiple training committee models can be added to improve the predictive accuracy and that it can deal with nonlinear and complex multivariate relationships [52]. It can also be used to rank the relative importance of variables, which helps with model interpretation. Recently, the Cubist model has been proven to be a viable regression method and has been used successfully in various fields [53][54][55]. Thus, this study used Cubist regression models to generate gridded BECCE intensity estimates that were subsequently used for dasymetric disaggregation of prefectural-level BECCE data into grid cells with 1 km² spatial resolution, following the same principles as used in the downscaling approach of PLS regression fitting. This process has been applied successfully to Cubist-based dasymetric anthropogenic heat emission mapping by Chen et al. [28]. The Cubist models were implemented using the Cubist package in the R environment.
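The rule-based prediction and the smoothing step can be sketched in a few lines. The sketch assumes that a sample covered by several rules receives the average of their linear-model outputs, and that smoothing is a weighted blend of the current and previous-node predictions with a weight a. The rules, coefficients, and the value of a are invented; this is not the R Cubist implementation used in the study.

```python
def smooth(current_pred, parent_pred, a):
    """Blend the current node's prediction with the previous node's prediction."""
    return a * current_pred + (1.0 - a) * parent_pred


def predict_with_rules(rules, x):
    """Average the linear-model outputs of every rule whose condition covers x."""
    outputs = [model(x) for condition, model in rules if condition(x)]
    if not outputs:
        raise ValueError("no rule covers this sample")
    return sum(outputs) / len(outputs)


# Two hypothetical rules on covariates named as in the text (coefficients invented).
rules = [
    (lambda x: x["POP"] > 1000,
     lambda x: 0.5 + 0.002 * x["POP"] + 0.001 * x["GDP"]),
    (lambda x: x["POP"] <= 1000,
     lambda x: 0.1 + 0.0005 * x["POP"]),
]
pixel = {"POP": 2000.0, "GDP": 500.0}
rule_pred = predict_with_rules(rules, pixel)
blended = smooth(rule_pred, parent_pred=4.0, a=0.75)
```

Each rule pairs a condition with its own linear model, which is how a piecewise-linear rule set can follow nonlinear relationships between the covariates and BECCE intensity.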

Data and Preprocessing
For clear demonstration of the proposed methodology, this study adopted mainland China as the study area (Figure 2). China is a country in East Asia, officially divided into 34 provincial administrative regions. In this study, Hong Kong, Macao, and Taiwan were excluded because their political and economic status differs from that of mainland China. Moreover, Tibet was not included because it does not have statistical data on energy consumption (these non-study areas are shown in the map of China in Figure 2, labeled "No data"). The BECCE data of 30 provinces used in the downscaling approach were collected from China's Energy Statistical Yearbook for 2016 (http://www.stats.gov.cn/, accessed on 21 June 2017). The total final energy consumption needed for the BECCE calculation incorporated the items of wholesale and retail trade, hotels and restaurants, others, and residential consumption (urban and rural), extracted from the regional energy balance tables of this yearbook.
We used a series of socioeconomic, remote sensing, and geospatial datasets related to BECCE to build a set of covariates for fitting the PLS and Cubist regression models. Information regarding each covariate was extracted according to the corresponding grid at 1 km² resolution using ArcGIS 10.7. The projection coordinates of all these grids were unified, and gaps along their boundaries were filled out to China's borders through Kriging.
A number of ancillary variables were also included in the analysis. Gridded GDP and POP distribution maps for 2015 were acquired from the GDPGrid_China dataset and the PopulationGrid_China dataset for mainland China at 1 km² spatial resolution, respectively, published by the Data Center for Resources and Environmental Sciences of the Chinese Academy of Sciences (http://www.resdc.cn/, accessed on 11 December 2017). These two datasets were generated by building spatial correlation models with a map of land use and land cover change derived from Landsat TM imagery and other ancillary information. The resultant socioeconomic datasets have reasonably high accuracy at the county level. The meteorological data used included GST, HDD18, and CDD26. Heating degree day and cooling degree day indices are frequently used indicators of the heating and cooling energy requirements of buildings. Daily HDD18 and CDD26 can be calculated using the basic formulas expressed in Equations (6) and (7), respectively [56]:
$$HDD18 = \max(18 - T,\ 0) \qquad (6)$$
$$CDD26 = \max(T - 26,\ 0) \qquad (7)$$
where T is the average air temperature of a day. Daily HDD18 and CDD26 can be accumulated over a year. Raw meteorological data for 2015 were derived from daily GST and air temperature datasets provided by the China Meteorological Administration (http://data.cma.cn/, accessed on 4 August 2017). After processing to obtain the average GST of the entire year and the yearly accumulated HDD18 and CDD26 values of each meteorological station, the gridded GST map at 1 km² resolution was constructed using ANUSPLIN software, with consideration of a digital elevation model, and the HDD18 and CDD26 maps were constructed through Kriging using ArcGIS.
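The degree-day accumulation for a single station can be sketched as follows. The sketch assumes the common definition in which a day contributes to HDD18 only when its mean temperature is below 18 °C and to CDD26 only when it is above 26 °C; the function names and the temperature series are hypothetical.

```python
def daily_hdd18(t_mean):
    """Heating degree days for one day, base temperature 18 degrees C."""
    return max(18.0 - t_mean, 0.0)


def daily_cdd26(t_mean):
    """Cooling degree days for one day, base temperature 26 degrees C."""
    return max(t_mean - 26.0, 0.0)


def accumulated_degree_days(daily_means):
    """Accumulate daily HDD18 and CDD26 over a series of daily mean temperatures."""
    hdd = sum(daily_hdd18(t) for t in daily_means)
    cdd = sum(daily_cdd26(t) for t in daily_means)
    return hdd, cdd


# Six invented daily mean air temperatures (degrees C) standing in for a full year.
temps = [-5.0, 10.0, 18.0, 22.0, 26.0, 30.5]
hdd, cdd = accumulated_degree_days(temps)
```

Days with a mean temperature between the two base temperatures contribute to neither index, which is why both indices can be small in a mild climate.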

Spatial Distribution of the Five Covariates
The five covariates (i.e., GDP, POP, GST, HDD18, and CDD26) selected for the downscaling showed high spatial heterogeneity among cities, and exhibited notable differences in the three climate regions (Figure 3). The average value of GST in regions I, II, and III was 8.98, 16.31, and 20.81 °C, respectively. The average value of heating degree days in region I was more than 2 and 5 times higher than that in regions II and III, respectively. The average value of cooling degree days in region III was nearly 3 and 2 times higher than that in regions I and II, respectively.

Energy Usage and Carbon Emissions at the Provincial Level from Energy-Balance Calculation
In 2015, the total amount of BECCE in China was 2060 Mt CO₂, with an average value of 6.87 × 10⁷ tons of CO₂ at the provincial level. Among the 30 provinces, Shandong, Hebei, and Heilongjiang ranked as the top three with regard to BECCE, with values of 1.55 × 10⁸, 1.49 × 10⁸, and 1.48 × 10⁸ tons of CO₂, respectively. The lowest value was 7.16 × 10⁶ tons of CO₂ in Hainan Province (Figure 4). In terms of energy structure, the BECCE from electric power, heating power, and primary energy accounted for 43.48%, 27.68%, and 28.84% (raw coal accounting for 16.08%), respectively, of the total energy CO₂ emissions. Emissions of CO₂ from electricity represented the largest proportion of BECCE, highlighting the need for ongoing work regarding the green low-carbon transformation in China (Figure 5a). The BECCE of residential buildings was 1.45 times higher than that of public buildings (Figure 5b).

Table 1 lists the PLS regression models constructed for the three groups, which have adjusted R² values of 0.963, 0.961, and 0.804 for regions I, II, and III, respectively, indicating that the basic models had sufficient explanatory power. On the basis of the regression analysis, we determined that all the relationships between BECCE intensity and the covariates in each region of China were positive. For region II, the importance of the effect of the covariates on BECCE in decreasing order was POP > GDP > HDD > CDD > GST. The order was similar for region III, except that HDD was not considered. High values of POP and GDP, e.g., as in central and southern China, directly reflect regional prosperity, which has a substantial impact on promoting BECCE. This finding is consistent with the result of Shi et al. [57], who found that GDP and population were important positive driving forces on BECCE at the prefecture level in China. For region I, HDD ranked as the covariate with the most important effect on BECCE, followed in decreasing order by GDP, POP, and GST. The macroclimate was the major factor regarding BECCE in the Severe Cold Zone and Cold Zone owing to the considerable building energy consumption required for heating to maintain a comfortable indoor temperature in winter. Therefore, HDD had the greatest impact on BECCE in region I, with a coefficient more than 2 and 8 times higher than that of GDP and POP, respectively. We also found that the controlling effect of the microclimate, represented by GST, made the smallest contribution to BECCE in all three regions. Nevertheless, according to our previous study, the microclimate around buildings should not be ignored [42]. This is because it can be improved reasonably easily for an investment level lower than that required for renovation of building characteristics or enhancement of socioeconomic conditions or the macroclimate.
The estimated prefectural-level BECCE is shown in Figure 6. The average value of prefectural-level BECCE was 5.11 × 10⁶ tons of CO₂, and most cities had an intermediate value of BECCE (i.e., 1-8 × 10⁶ tons of CO₂). The highest value of BECCE (4.96 × 10⁷ tons of CO₂) was in Harbin, followed in decreasing order by Shenyang, Dalian, and Shijiazhuang. The value of BECCE of Kunyu (Xinjiang Province) was the lowest, i.e., 2.64 × 10⁴ tons of CO₂. Generally, cities with high values of BECCE are provincial capitals with a prosperous economy and large population, and this is especially the case in northern China because of the need for heating in winter and the large volume of industry.

Pixel-Based Distribution of BECCE Intensity
Through joint use of multisource socioeconomic and environmental data in the Cubist regression downscaling approach, a gridded map of BECCE intensity for mainland China in 2015 was created with a spatial resolution of 1 km² (Figure 7). The "Hu Huanyong Line", a geographic demarcation line proposed by the geographer Hu Huanyong in 1935 that stretches from Heihe in northeastern China to Tengchong in southwestern China and shows the distinct spatial characteristics of the Chinese population, also marks a substantial difference in the BECCE intensity distribution across mainland China. The eastern side of the Hu Huanyong Line accounted for 88.81% of the BECCE of the entire study area, indicating the highly heterogeneous levels of urbanization and environmental pollution in China. Moreover, the pixel values of BECCE intensity varied considerably from 0.001 to 193.105 kg CO₂·m⁻², with nearly 80% below the average value of 0.25 kg CO₂·m⁻², showing the high level of heterogeneity. The pixels with high values of BECCE intensity were located mainly in major urban centers. Furthermore, high BECCE intensity was concentrated in urban areas of economically dynamic and prosperous regions such as Beijing, Tianjin, and Shanghai, with average BECCE intensity in the range of 16.49-57.12 kg CO₂·m⁻² across downtown areas. In most cases, provincial capitals had higher BECCE intensity than peripheral cities. However, built-up areas in some medium- or small-sized cities such as Shihezi (Xinjiang Province) had the highest average city value of BECCE intensity, up to 7.44 kg CO₂·m⁻², probably attributable to extreme climate conditions in small spatial areas. For estimation of fine-scale BECCE intensity, GDP and POP, representing the socioeconomic conditions and energy consumption membership, respectively, had the greatest influence (Figure 8). In comparison with the results of the PLS regression models, GST was shown to have a greater impact on BECCE intensity in region I at 1 km² spatial resolution. This is consistent with the finding that the importance of the driving factors of BECCE differs among China's spatial classifications [58], and further indicates that the effect of GST in the microenvironment should not be ignored in relation to building energy conservation.

Comparative Assessment of Accuracy
A per-pixel evaluation of the BECCE intensity maps was impossible owing to the lack of reference building energy consumption and BECCE data at the grid level (such information is scarce even at the prefecture level). Therefore, for a comparative assessment of accuracy, it is practical and reasonable to find other pixel-based data that are directly related to building energy usage. In this study, we compared the predicted BECCE intensity at the grid cell level with gridded anthropogenic heat flux (AHF b ) that spatially represents the anthropogenic heat emission from buildings. Anthropogenic heat emission, which is the emission of anthropogenic waste heat into the urban land-atmosphere system driven by the energy consumption associated with human activities, plays an important role in urban climate and environment studies. Previous related studies have shown that anthropogenic heat emissions arise from industrial processing, heating and cooling of buildings, vehicle exhausts, and human metabolism [59,60]. Although there is high demand for reliable and accurate representation of anthropogenic heat emission across China, it is difficult to estimate at the regional scale because of limited data. Recently, Chen et al. [28] published a gridded AHF b benchmark dataset at 1 km 2 spatial resolution for China in 2010 using the same principle of the Cubist-based downscaling approach with fusion off points-of-interest data of buildings and multisource remote sensing data. Based on the above, Pearson correlation analysis was conducted to explore the relationship between the estimated BECCE intensity and AHF b , which were processed to the same pixel position, normalized, and finally extracted using a 1 km 2 fishnet. Table 2 shows the results of the Pearson correlation analysis of the predicted BECCE intensity and the AHF b for the entire study area, region I, region II, and region III. 
For the entire study area, BECCE intensity was correlated positively with AHF b (R = 0.620, p = 0.01), and the strength of the correlation in region I was higher than that in regions II and III.
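The comparison described above can be sketched in a few lines. The following is a minimal illustration rather than the procedure used in this study: the sample values are hypothetical, the function names are introduced only for this example, and min-max scaling is assumed because the normalization method is not specified in the text.

```python
import math


def min_max_normalize(values):
    """Rescale values to [0, 1] (assumed normalization; not stated in the study)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant series")
    return [(v - lo) / (hi - lo) for v in values]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values sampled at the same five fishnet cells.
becce = [2.1, 5.4, 16.5, 40.2, 57.1]  # BECCE intensity
ahf_b = [0.3, 1.1, 2.0, 6.5, 7.9]     # building anthropogenic heat flux

r = pearson_r(min_max_normalize(becce), min_max_normalize(ahf_b))
print(f"Pearson r = {r:.3f}")
```

Because min-max scaling is a positive linear transformation, it leaves the Pearson coefficient unchanged; under this assumption, normalization mainly serves to place the two maps on a common scale for visual comparison.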

Advantages of the Method
Downscaling approaches have been commonly applied to create accurate, spatially explicit products in various fields (e.g., gridded population, GDP, and electric power consumption). In this study, the key components of the method were chosen carefully. PLS was applied to construct the behavior equations of the BECCE driving mechanism because it can effectively offset collinearity among the driving factors. Meanwhile, the Cubist algorithm has proven advantageous for training on fine-scale data and was used to refine our results. Compared with Random Forest regression models, the Cubist regression models retrieved BECCE intensity values in the urban centers of some large metropolitan areas within the range of 16.49-57.12 kg CO₂·m⁻², which is more reasonable than the range of 0.002-7.438 kg CO₂·m⁻² retrieved by the Random Forest method.
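To illustrate why PLS tolerates collinear driving factors, the sketch below fits a single-component PLS1 model in plain Python. It is a simplified illustration under stated assumptions and not the model used in this study: the two predictors are hypothetical and perfectly collinear (a case in which ordinary least squares has no unique solution), only one latent component is extracted, and all function names are introduced for this example.

```python
import math


def center(column):
    """Return the mean-centered column and its mean."""
    mean = sum(column) / len(column)
    return [v - mean for v in column], mean


def pls1_one_component(X, y):
    """Fit a one-component PLS1 model; X is a list of rows, y a list."""
    n, p = len(X), len(X[0])
    columns = [[row[j] for row in X] for j in range(p)]
    centered, x_means = zip(*(center(c) for c in columns))
    y_c, y_mean = center(y)
    # Weight vector: covariance direction between each predictor and y.
    w = [sum(col[i] * y_c[i] for i in range(n)) for col in centered]
    norm = math.sqrt(sum(v * v for v in w))
    if norm == 0:
        raise ValueError("predictors are uncorrelated with the response")
    w = [v / norm for v in w]
    # Scores: projection of each sample onto the weight vector.
    t = [sum(centered[j][i] * w[j] for j in range(p)) for i in range(n)]
    # Regress y on the single latent score.
    q = sum(ti * yi for ti, yi in zip(t, y_c)) / sum(ti * ti for ti in t)
    coef = [wj * q for wj in w]
    intercept = y_mean - sum(c * m for c, m in zip(coef, x_means))
    return coef, intercept


def predict(X, coef, intercept):
    """Apply the fitted linear model to each row of X."""
    return [intercept + sum(c * v for c, v in zip(coef, row)) for row in X]


# Hypothetical data: the second predictor is exactly twice the first.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
y = [3.0, 6.0, 9.0, 12.0]

coef, intercept = pls1_one_component(X, y)
fitted = predict(X, coef, intercept)
print(coef, intercept, fitted)
```

In this toy case the shared signal is split between the two collinear predictors in proportion to their covariance with the response, so the fit remains well defined even though the predictor matrix is rank deficient.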

Comparison of Normalized BECCE Intensity and AHF b for Eight Metropolitan Cities in China
As a further demonstration of the relationship between BECCE intensity and AHF b , and to provide another form of accuracy assessment, Figure 9 shows the predicted BECCE intensity and AHF b in eight metropolitan cities: Beijing, Shanghai, Guangzhou, Wuhan, Xiamen, Chengdu, Harbin, and Urumqi. Owing to the different times of data collection, the analysis focuses on the relative change of values in different regions of the cities. In comparing BECCE intensity with AHF b , we found two types of spatial pattern that were replicated in all eight cities. First, areas with high BECCE intensity were distributed mainly in city centers, which generally have high values of AHF b . Second, high BECCE intensity and AHF b occurred mainly in dense residential and commercial areas in urban centers, exhibiting the characteristic of agglomeration. For example, evidently high BECCE intensity was found in the Dongcheng and Xicheng districts of Beijing and the Jing'an and Huangpu districts of Shanghai, which are well-developed urban areas with high-density buildings and prosperous businesses. These results showed that the proposed methodology is effective in capturing the main spatial characteristics of BECCE intensity and producing a reasonable spatial distribution of BECCE intensity. The findings demonstrated that BECCE intensity has a close relationship with AHF b in China, and that the method of estimation of BECCE intensity is reliable and accurate.

Limitations
Downscaling approaches have an inherent limitation: they do not account for variation in the statistical data for the same type of parcel caused by time lags. At the same time, each input ancillary geospatial layer introduces its own uncertainty because of inherent errors in location or attribution. Perhaps most important, however, is that significant uncertainties arise from the difficulty of quantifying and assessing the link between pixel-level BECCE and the ancillary layers, as the relations are typically uncertain and sensitive to local contextual factors, such as human behaviors, environmental consciousness, local fuel and electricity prices, air temperature change, and the vagaries of random chance. In addition, the BECCE distribution under different landforms and land use patterns varies from urban core areas to suburban or rural areas, which has not been taken into consideration in this study.

Conclusions
In the context of rapid urbanization globally, urban energy consumption is rising annually. The continuous growth of BECCE restricts sustainable urban development, resulting in increasing energy shortages and energy-related pollution. As a foundation of building energy conservation, the determination of refined statistics of BECCE is attracting increasing attention. This study provides an updated and comprehensive perspective of BECCE to satisfy the growing demand for a national database of BECCE profiles and to fill certain knowledge gaps. Based on the principle of downscaling, we proposed a novel method for gridded BECCE mapping that integrates socioeconomic and remote sensing data in flexible PLS and Cubist regression models, which were constructed to capture the complex relationships between covariates (i.e., GDP, POP, GST, HDD18, and CDD26) and BECCE intensity at the provincial and prefecture levels, respectively. We generated a reasonable and accurate dataset of the estimated pixel-based BECCE intensity for China, which is closely related to human activities, with 1 km² spatial resolution.
The newly developed gridded BECCE intensity dataset with high spatial heterogeneity could serve as a fundamental database for further studies on building energy conservation. The established dataset could also be used to further explore the relationships between BECCE intensity and socioeconomic factors and the urban environment, which would contribute to improved sustainable urban development. Moreover, the developed modeling technique could be applied to forecasting carbon emissions and to studies on the CO₂ emission peak and carbon neutralization. The downscaling approach in this study does not distinguish between different land use and land cover types. In the future, we will focus on a more refined study area (urban built-up land) to estimate the downscaling-based BECCE distribution in urban land use areas, and to further explore the relationship between the BECCE distribution and three-dimensional urban buildings [61,62].

Data Availability Statement:
The data presented in this study are available upon request from the corresponding author.