Simulating the Spatial Heterogeneity of Housing Prices in Wuhan, China, by Regionally Geographically Weighted Regression

Abstract: Geographically weighted regression (GWR) is an effective method for detecting spatial non-stationary features based on the hypothesis of proximity correlation. In reality, especially in the social and economic fields, research objects have not only spatial non-stationary characteristics but also spatial discrete heterogeneity characteristics. How to improve the accuracy of GWR estimation in this case is therefore worth studying. In this paper, a regionally geographically weighted regression (RGWR) is proposed. By introducing dummy variables, zoning discrimination is added to the spatial kernel function of GWR; the modified kernel optimizes the spatial weights and reduces the influence of “near heterogeneous” observation points. The residential sale price in Wuhan City is taken as an example and analyzed from three aspects: model performance, fitting effect, and influencing factors. The results show that introducing zoning dummy variables significantly improves model accuracy under both fixed and adaptive bandwidths. Under a fixed bandwidth, compared with the GWR model, RGWR increases R² and adjusted R² (R²adj) from 0.6776 and 0.6732 to 0.7777 and 0.7746, respectively, and the corrected Akaike information criterion (AICc) decreases by 37.4006 compared with GWR, which demonstrates the effectiveness of the method.


Introduction
In early studies of spatial regression, models were applied at the global level, where data relationships were assumed to be constant throughout the whole study region. However, this spatial stability hypothesis is usually invalid for non-stationary spaces, which generally manifest as uncontrolled spatial variability [1]. To account for this spatial heterogeneity, spatial statistics shifted from global models to local models, and many local regression techniques were proposed. Casetti proposed the expansion method [2], Jones proposed multilevel models [3], McMillen and McDonald proposed nonparametric local linear regression [4,5], Elhorst proposed the panel data model [6], and Brunsdon and Fotheringham proposed geographically weighted regression (GWR) [7][8][9]. GWR is built on the premise of Tobler's famous first law of geography, that "everything is related to everything else, but near things are more related than distant things" [10], and it effectively addresses spatial non-stationarity and detects spatial heterogeneity. Many studies have focused on extending GWR to better detect spatial heterogeneity. Lu applied non-Euclidean distance metrics to the GWR model solution [11,12], Anselin proposed heteroskedastic GWR [13], Harris proposed robust GWR [14], Wang proposed local linear estimation-based GWR [15], and Zhao proposed a GWR based on semi-supervised learning [16]. These studies improve different aspects of model-solving accuracy and cover scenarios such as flexible distance metrics, heteroskedasticity, outliers, covariance, and semi-supervised learning. They also make it easier to detect heterogeneous features of spatial relationships that arise from their irregular distribution in geographic space [17].
Spatial heterogeneity is an important feature of geographical phenomena and needs to be considered when quantitatively modeling the relationship between a response variable and explanatory variables in spatial data analysis [18]. The specification of spatial heterogeneity can be classified into continuous heterogeneity and discrete heterogeneity [19]. The former specifies how the regression coefficients change over space, either following a predetermined functional form or as determined by the data through a local estimation process [20]. The latter consists of a prespecified set of spatially distinct units, or spatial regions, such as administrative units, areas of differing population density, climatic or ecological zones, and the distributions of soil types, land use, and land cover [21,22]. Model coefficients and other parameters are allowed to vary among the regions. GWR is a major paradigm for spatial modeling that reveals continuous spatial heterogeneity. Existing GWR models can detect continuous spatial heterogeneity to some extent in practical applications by means of bandwidth optimization, but they cannot handle discrete heterogeneity [23].
Existing research on spatial analysis has made some attempts to address spatial discrete heterogeneity. Rich Harris pointed out that two points can be geographically close but socially distant because the contexts (or neighborhoods) within which they are situated are not alike [24], which is similar in spirit to spatial discrete heterogeneity. The same work proposed a contextualized GWR (CGWR) that incorporates contextual information into the GWR weight matrix. However, the decay rate of spatial distance may be inconsistent with that of the context variables, and spatial regions are not considered. In early regression analyses, scholars detected spatial discrete heterogeneity with sub-area regression, which divides the research area into multiple regions according to certain indicators and then establishes a regression model in each region [25]. Other scholars used dummy variables to represent regions, with different regions given different dummy variable values [26,27]. These approaches effectively reflect the differences between regions, but a global regression within each region still cannot detect non-stationary spatial features.
The hedonic price model (HPM) is a widely used approach to studying housing prices [28,29]. It establishes a quantitative relationship between housing characteristics and housing prices. Such models regard a house as a composite commodity formed by structural attributes, neighborhood attributes, age of construction, and other attributes. Structural attributes determine the basic functions of a house and have a large impact on its price. The housing area, age, number of bedrooms, residential plot ratio, residential greening ratio, and other factors are considered structural attributes [30,31]. Neighborhood attributes reflect the accessibility of amenities and the socioeconomic status of communities. The proximity of supermarkets, shopping centers, primary schools, and bus stations contributes greatly to explaining housing prices [32,33]. However, the HPM in essence assumes relationships that are consistent across the whole space. Since GWR and geographically and temporally weighted regression (GTWR) were introduced into real-estate research, the non-stationary nature of housing prices has been widely revealed [9,34]. An increasing number of studies have explored the non-stationary nature of housing prices from various perspectives [35][36][37][38]. In fact, housing prices exhibit not only spatial non-stationarity but also spatial discrete heterogeneity. For example, under the "nearby enrollment" policy of compulsory education in China, enrollment in primary and secondary schools strictly follows the school district in which the child's registered residence is located. Many parents are hence willing to pay high housing prices for good schools, which has led to soaring prices for "school district housing" [39]. This phenomenon exists not only in China but also in other countries and regions. According to data from the UK Department for Education, the average price of housing in the districts of high-performing schools is more than 18,600 GBP higher than that of non-school-district housing across England. The average house price in England in 2016 was 233,000 GBP. House prices near the 10% best-performing primary schools are 8.0% higher than those in the surrounding area. Near the 10% best-performing non-selective secondary schools, house prices are 6.8% higher [40]. Spatial discrete heterogeneity is therefore widespread in house prices. Although the spatial attributes of housing prices are generally considered when constructing a hedonic price model, few studies have investigated spatial non-stationary features and spatial discrete heterogeneity simultaneously and in depth.
The objective of this article is to extend the traditional GWR model to simultaneously detect spatial non-stationary features and spatial discrete heterogeneity in housing prices. This study seeks to contribute to the literature in three ways. First, we propose regionally geographically weighted regression (RGWR), which uses dummy variables to reflect regional differences. Second, we illustrate the algorithm flow and estimation steps of RGWR. Third, we examine and compare GWR and RGWR for modeling housing prices by means of a case study in Wuhan, China.
This paper is organized as follows: the RGWR methods are derived in Section 2; in Section 3, we carry out an experiment to assess the performance of the proposed method and make an empirical comparison with the basic GWR. Finally, the discussion and conclusions of the paper are reported.

Geographically Weighted Regression
Geographically weighted regression is an extension of ordinary linear regression. It is used to explore spatial non-stationary features by embedding geographic location into the regression parameters, so that each point in a different location can have its own regression parameter estimates [9]. The model can be expressed as:
$$y_i = \beta_0(u_i, v_i) + \sum_{k=1}^{p} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i, \quad i = 1, 2, \ldots, n \tag{1}$$
where $(u_i, v_i)$ is the coordinate of the i-th sampling point, $\beta_k(u_i, v_i)$ is the k-th regression parameter at the i-th sampling point, $y_i$ and $x_{ik}$ are the response and the k-th explanatory variable at point i, and $\varepsilon_i$ is the error term. When estimating the regression parameters of sampling point i, observations at different observation points are not equally important. The closer the observation point is to point i, the higher its importance; the farther away it is, the lower its importance. Using the local weighted least squares approach, the regression parameter estimate $\hat{\beta}_i$ at point i is given by Formula (2):
$$\hat{\beta}_i = \left(X^{T} W_i X\right)^{-1} X^{T} W_i\, y \tag{2}$$
The spatial weight matrix $W_i$ is the core of the geographically weighted regression model. $W_i$ is an n × n matrix calculated from a monotonically decreasing function of the geographic distance $d_{ij}$ between the regression point i and each observation point j. Its off-diagonal elements are zero, and its diagonal elements represent the geographical weights between the regression point i and the observation points j, namely, $W_i = \mathrm{diag}(w_{i1}, w_{i2}, \ldots, w_{in})$.
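The local weighted least squares estimate at a single regression point can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `gwr_local_fit` and the toy data are hypothetical, and the design matrix is assumed to carry a leading column of ones for the intercept.

```python
import numpy as np

def gwr_local_fit(X, y, weights):
    """Weighted least squares estimate at one regression point.

    X       : (n, p) design matrix; first column assumed to be ones (intercept)
    y       : (n,) response vector
    weights : (n,) geographic weights w_i1..w_in, i.e. the diagonal of W_i
    Returns the (p,) vector (X^T W_i X)^{-1} X^T W_i y.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(weights, dtype=float)
    XtW = X.T * w                      # equals X.T @ diag(w) without building the n x n matrix
    return np.linalg.solve(XtW @ X, XtW @ y)

# Toy data (hypothetical): y = 2 + 3x exactly, so any positive weights recover (2, 3).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones_like(x), x])
y = 2.0 + 3.0 * x
beta = gwr_local_fit(X, y, weights=[1.0, 0.8, 0.5, 0.2, 0.1])
print(beta)
```

Solving the normal equations with `np.linalg.solve` avoids forming an explicit inverse; in a full GWR run this function would be called once per regression point with that point's weight vector.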
The selection of the spatial kernel function is of great importance to the correct estimation of the parameters in GWR. Spatial kernel functions are generally of two types: the fixed kernel function and the adaptive kernel function. For the fixed kernel, the distance is constant, but the number of nearest neighbors varies. For the adaptive kernel, the distance varies, but the number of neighbors remains constant [34]. The most common kernel is a Gaussian distance-decay function. Its functional form is as follows:
$$w_{ij} = \exp\left(-\left(\frac{d_{ij}}{b}\right)^{2}\right) \tag{3}$$
where b is called the bandwidth, a non-negative attenuation parameter calculated by the cross-validation (CV) approach [9]. The larger the bandwidth, the slower the weight decays as the distance increases; the smaller the bandwidth, the faster the weight decays.
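The effect of the bandwidth on weight decay can be checked numerically. The sketch below assumes the Gaussian form exp(-(d/b)^2); some references include an additional factor of 1/2 in the exponent, so the exact constant is an assumption here. The function name and the distances are illustrative only.

```python
import math

def gaussian_weight(d_ij, b):
    """Gaussian distance-decay weight, assuming the form exp(-(d/b)^2)."""
    if b <= 0:
        raise ValueError("bandwidth b must be positive")
    return math.exp(-(d_ij / b) ** 2)

# Compare a small and a large bandwidth over the same distances (arbitrary units).
distances = (0.0, 500.0, 1000.0, 2000.0)
for b in (500.0, 2000.0):
    weights = [gaussian_weight(d, b) for d in distances]
    print(b, [round(w, 4) for w in weights])
```

At zero distance the weight is 1 for any bandwidth, and at a fixed nonzero distance the larger bandwidth always yields the larger weight, which is the slower decay described above.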

Regionally Geographically Weighted Regression
RGWR is an extension of GWR used to explore both spatial non-stationarity and spatially discrete heterogeneity. It adds regional dummy variables to GWR, which embeds geographic location into the regression parameters, so that each point can have its own regression parameter estimates that are also affected by the regional factors of that regression point. The model can be expressed as:
$$y_i = R\beta_0(u_i, v_i) + \sum_{k=1}^{p} R\beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i, \quad i = 1, 2, \ldots, n \tag{4}$$
where $(u_i, v_i)$ is the coordinate of the i-th sampling point and $R\beta_k(u_i, v_i)$ is the k-th regression parameter of the i-th sampling point. In estimating the regression parameters of sampling point i, regional factors are added to the GWR principle that "the closer the observation point is to point i, the higher the importance, and vice versa" [9], so that an observation point outside the region of point i does not participate in the regression. Using the locally weighted least squares method, the regression parameter estimate $R\hat{\beta}_i$ at point i is given by Equation (5):
$$R\hat{\beta}_i = \left(X^{T} RW_i X\right)^{-1} X^{T} RW_i\, y \tag{5}$$
The geographic weights between regression point i and the observation points j form the matrix $RW_i = \mathrm{diag}(rw_{i1}, rw_{i2}, \ldots, rw_{in})$, where $rw_{ij}$ is the spatial kernel function of RGWR. Evaluating this kernel function requires selecting the RGWR bandwidth.

Spatial Kernel Function Calculation
GWR is essentially a local regression; that is, it uses observation points within the bandwidth of the regression point to estimate the value at the regression point. The spatial weight describes how the influence of an observation point on the regression point changes with distance. The distribution of observation points in the study area is sometimes not uniform. To ensure that a sufficient number of observation points participate in the fitting of each regression point, Fotheringham proposed a fixed kernel bandwidth strategy and an adaptive bandwidth strategy [9]. In the fixed bandwidth strategy, the global bandwidth is a fixed value, and the number of observation points involved in the calculation differs from one regression point to another. Figure 1a shows a schematic diagram of the fixed bandwidth kernels. The most common adaptive bandwidth strategy fixes the number of adjacent observation points; that is, the number of observation points participating in the estimation of each regression point is the same across the study area, and the bandwidth changes with the regression point. Figure 1b shows a schematic diagram of the adaptive bandwidth kernels. The former is suitable for a sample set with a relatively uniform spatial distribution, and the latter is suitable for a sample set with an uneven spatial distribution.
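The difference between the two strategies can be illustrated with a small sketch. It assumes that the adaptive bandwidth at each point is the distance to its k-th nearest neighbor, which is one common way to fix the number of neighbors; the helper names and the five coordinates are hypothetical.

```python
import numpy as np

def pairwise_distances(coords):
    """Euclidean distance matrix between all pairs of points."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def fixed_bandwidths(dist, b):
    """Same bandwidth b at every regression point."""
    return np.full(dist.shape[0], float(b))

def adaptive_bandwidths(dist, k):
    """Distance from each point to its k-th nearest neighbor (excluding itself)."""
    # After sorting each row, column 0 is the point itself (distance 0).
    return np.sort(dist, axis=1)[:, k]

# Three clustered points and two isolated ones (hypothetical coordinates).
coords = [(0, 0), (1, 0), (0, 1), (10, 10), (30, 30)]
dist = pairwise_distances(coords)
b_fixed = fixed_bandwidths(dist, 2.0)
b_adapt = adaptive_bandwidths(dist, k=2)
# Neighbors inside the fixed bandwidth, not counting the point itself.
neighbors_fixed = (dist <= b_fixed[:, None]).sum(axis=1) - 1
print(neighbors_fixed, b_adapt)
```

With the fixed bandwidth the isolated points have no neighbors at all, whereas the adaptive rule stretches their bandwidth until two neighbors are included, which is why the adaptive strategy suits unevenly distributed samples.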
The weight calculation strategy of GWR does not consider the regional factor. To better characterize the role of regional factors in the calculation of spatial weights, we assume that observation points located in the same region share the same regional attributes, and observation points located in different regions have different regional attributes. It is then preferable to use homogeneous observation points for fitting. Therefore, a region attribution judgment is introduced when calculating the weight: when the observation point and the regression point are in the same region, the observation point participates in the regression point estimation; when they are in different regions, it does not. Figure 1c shows a schematic diagram of RGWR with fixed bandwidth kernels, and Figure 1d shows a schematic diagram of RGWR with adaptive spatial kernels. Regionally geographically weighted regression thus adds a spatial region factor to the traditional method of geographically weighted regression. In the weight function design, priority is given to the influence of zoning factors and then to the influence of neighboring points. Zoning-based kernel functions are proposed that correspond to the Gauss kernel function and the bi-square kernel function.
The RGWR form of the Gauss kernel function is given as:
$$rw_{ij} = \delta \exp\left(-\left(\frac{d_{ij}}{b}\right)^{2}\right) \tag{6}$$
where b is the bandwidth, $d_{ij}$ is the distance between point i and point j, and δ is the introduced dummy variable, the regional influence factor. When i and j are located in the same region, δ = 1, and the weight is that of traditional GWR. When i and j are located in different regions, δ = 0, so $rw_{ij} = 0$.
In addition, the bi-square kernel function is also a commonly used weight calculation method for GWR. The RGWR form of the bi-square kernel function is given as:
$$rw_{ij} = \begin{cases} \delta \left[1 - \left(\dfrac{d_{ij}}{b}\right)^{2}\right]^{2}, & d_{ij} < b \\ 0, & d_{ij} \geq b \end{cases} \tag{7}$$
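The zoning-based kernels amount to multiplying an ordinary GWR kernel by the regional dummy δ. The following sketch assumes the Gaussian form exp(-(d/b)^2) and the usual bi-square cutoff at d = b; the function names, the 3 × 3 distance matrix, and the region labels are hypothetical.

```python
import numpy as np

def regional_delta(regions):
    """delta[i, j] = 1 if points i and j share a region label, else 0."""
    r = np.asarray(regions)
    return (r[:, None] == r[None, :]).astype(float)

def rgwr_gaussian(dist, b, delta):
    """Gaussian kernel masked by the regional dummy (assumed form exp(-(d/b)^2))."""
    return delta * np.exp(-(dist / b) ** 2)

def rgwr_bisquare(dist, b, delta):
    """Bi-square kernel masked by the regional dummy; zero at and beyond d = b."""
    inside = dist < b
    return delta * np.where(inside, (1.0 - (dist / b) ** 2) ** 2, 0.0)

# Points 0 and 1 lie in region "A", point 2 in region "B".
dist = np.array([[0.0, 1.0, 3.0],
                 [1.0, 0.0, 2.0],
                 [3.0, 2.0, 0.0]])
delta = regional_delta(["A", "A", "B"])
rw_gauss = rgwr_gaussian(dist, b=2.0, delta=delta)
rw_bisq = rgwr_bisquare(dist, b=2.0, delta=delta)
print(rw_gauss)
print(rw_bisq)
```

Within a region the weights are identical to those of plain GWR, while every cross-region pair receives weight zero regardless of how close the two points are.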

Bandwidth Selection
Bandwidth selection adopts the corrected Akaike information criterion (AICc) [9]. The optimal bandwidth is selected through trials: in each trial, a bandwidth is chosen, RGWR is fitted using that bandwidth, and a goodness-of-fit measure such as AICc is calculated, where AICc is defined by:
$$AIC_c = 2n \ln(\hat{\sigma}) + n \ln(2\pi) + n \left\{ \frac{n + \mathrm{tr}(S)}{n - 2 - \mathrm{tr}(S)} \right\} \tag{8}$$
where n is the number of observations, $\hat{\sigma}$ is the estimated standard deviation of the error term, and tr(S) is the trace of the hat matrix S. The optimal bandwidth is the one that minimizes AICc.
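A minimal sketch of the AICc calculation and the trial-based selection follows. It assumes the standard GWR form of AICc, 2n ln(σ) + n ln(2π) + n(n + tr(S))/(n − 2 − tr(S)); the function names and the toy score used in the usage lines are hypothetical.

```python
import math

def aicc(sigma_hat, n, trace_s):
    """Corrected AIC from the error standard deviation, sample size, and tr(S)."""
    denom = n - 2.0 - trace_s
    if denom <= 0:
        raise ValueError("n - 2 - tr(S) must be positive")
    return (2.0 * n * math.log(sigma_hat)
            + n * math.log(2.0 * math.pi)
            + n * (n + trace_s) / denom)

def select_bandwidth(candidates, score):
    """Return the candidate bandwidth with the smallest score (e.g. AICc)."""
    return min(candidates, key=score)

# Usage with a toy score that is smallest at b = 2 (stands in for a model fit).
best = select_bandwidth([1.0, 2.0, 3.0], score=lambda b: (b - 2.0) ** 2)
print(best, aicc(sigma_hat=1.0, n=10, trace_s=2.0))
```

In practice `score` would fit the model at bandwidth `b`, compute the residual standard deviation and the trace of the hat matrix, and return `aicc(...)`.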

Algorithm Process
The algorithmic flow of RGWR is given in Figure 2. RGWR estimation is divided into two parts: one is the selection of the optimal bandwidth and the other is the parameter estimation, i.e., estimation of regression coefficients, fitted values, and evaluation metrics. The data include independent variables, dependent variables, spatial location variables, alternative bandwidths, and regional impact factors. The step-by-step process is as follows:
1. Data initialization.
2. Set the range of bandwidth values.
3. Build the GWR model using independent variables, dependent variables, spatial location variables, and bandwidths.
4. Construct a spatial kernel function for each observation using spatial location variables and bandwidth.
5. Calculate the spatial weight matrix for each observation point.
6. Calculate the hat matrix by using the independent variables, dependent variables, and spatial weight matrices.
7. Calculate the AICc values of the models corresponding to this set of bandwidths.
8. Select the parameters of the model corresponding to the minimum AICc value, which gives the optimal bandwidth.
9. Build the RGWR model using the independent variables, dependent variables, spatial location variables, and the optimal bandwidth.
10. Construct a regional geographically weighted spatial kernel function using spatial location variables, bandwidth, and regional factors.
11. Calculate the regional geographically weighted spatial weight matrix for each observation.
12. Calculate model regression coefficients, fitted values, and evaluation metrics by using the independent variables, dependent variables, and spatial weight matrices.
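The twelve steps can be sketched end to end on synthetic data. This is an illustration under several assumptions, not the authors' code: the data are randomly generated with two regions and a price jump at the boundary, the Gaussian kernel is taken as exp(-(d/b)^2), the error standard deviation in AICc is estimated as sqrt(RSS/n), and all function and variable names are hypothetical.

```python
import numpy as np

def hat_matrix(X, W):
    """Row i of S is x_i (X^T W_i X)^{-1} X^T W_i, where W[i] is the diagonal of W_i."""
    n = X.shape[0]
    S = np.empty((n, n))
    for i in range(n):
        XtW = X.T * W[i]
        S[i] = X[i] @ np.linalg.solve(XtW @ X, XtW)
    return S

def aicc_from_hat(y, S):
    """AICc from the hat matrix; sigma is assumed to be sqrt(RSS / n)."""
    n = len(y)
    resid = y - S @ y
    tr_s = np.trace(S)
    sigma = np.sqrt(resid @ resid / n)
    return 2 * n * np.log(sigma) + n * np.log(2 * np.pi) + n * (n + tr_s) / (n - 2 - tr_s)

def r_squared(y, fitted):
    return 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)

# Steps 1-2: synthetic inputs and candidate bandwidths (hypothetical values).
rng = np.random.default_rng(0)
n = 80
coords = rng.uniform(0, 10, size=(n, 2))
regions = (coords[:, 0] > 5).astype(int)          # two regions split at x = 5
x1 = rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 2.0 * regions + rng.normal(scale=0.1, size=n)
X = np.column_stack([np.ones(n), x1])
candidates = [2.0, 4.0, 8.0, 16.0]

dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
delta = (regions[:, None] == regions[None, :]).astype(float)

# Steps 3-8: fit GWR for each candidate bandwidth and keep the AICc minimizer.
scores = {b: aicc_from_hat(y, hat_matrix(X, np.exp(-(dist / b) ** 2))) for b in candidates}
best_b = min(scores, key=scores.get)

# Steps 9-12: refit with the regional kernel at the selected bandwidth.
S_gwr = hat_matrix(X, np.exp(-(dist / best_b) ** 2))
S_rgwr = hat_matrix(X, delta * np.exp(-(dist / best_b) ** 2))
print(best_b, r_squared(y, S_gwr @ y), r_squared(y, S_rgwr @ y))
```

Because cross-region weights are zero, each row of the RGWR hat matrix only draws on observations from the same region, so the fitted value at a point near the boundary is not pulled toward prices on the other side.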

Experiment Analysis
In this section, the research area and related research data are selected and preprocessed, and GWR and RGWR models are then used to estimate the data under different bandwidth strategies. Based on the estimation results, the model performances of GWR and RGWR under different bandwidth strategies are compared, followed by a comparison of their fitting effects. Finally, the main factors affecting the price of commercial housing in Wuhan are analyzed.

Study Area and Data
Wuhan is the capital of Hubei Province in central China and serves as its political, economic, and cultural center. Wuhan consists of thirteen administrative regions: Jiang'an, Jianghan, Qiaokou, Hanyang, Wuchang, Qingshan, Hongshan, Dongxihu, Hannan, Caidian, Jiangxia, Huangpi, and Xinzhou. The housing market is one of the most active markets in China and plays a crucial role in China's economy [41][42][43][44][45]. Real estate prices in Wuhan are increasing at an alarming rate, driven by rapid industrialization and urbanization and the consequent demand for various categories of real estate. A spatial heterogeneity analysis of real estate prices is considered crucial for revealing major issues in real estate market development, understanding effective strategies of economic macro control, and promoting the high-quality development of internal economics [46][47][48][49]. This paper uses the listed residential sale prices of Wuhan City, Hubei Province, China as characteristic price data, constructs a characteristic price model [30,50-54], and conducts an experimental analysis. The characteristic price model describes the quantitative relationship between house characteristics and housing prices. Research has found that the price of commercial houses is related to factors such as house structure, surroundings, geographic location, and construction time [55]. House structure includes factors such as the indoor area and construction time; the surroundings include factors such as the plot ratio, greening rate, and the distance from elementary schools and shopping malls. This paper collected 954 communities in the urban area of Wuhan as sampling points and obtained the average listing price (yuan/square meter) and construction time of each community in December 2019. Their geographic locations are shown in Figure 3. At the same time, point of interest data were collected in Wuhan, covering bus stations, subway stations, hospitals, parks, shopping malls, primary schools, middle schools, universities, fire protection, public security, and so on. The characteristic house price data are from Anjuke (https://www.anjuke.com/) (last accessed on 29 June 2020), and the point of interest data are from Gaode (https://www.gaode.com/) (last accessed on 1 March 2021). In addition, the base map data of Wuhan's administrative area, including the main roads and waters, came from the China Map Publishing House, and the list of provincial model schools in Wuhan came from the Hubei Provincial Department of Education. The numerical ranking of each district is shown in Table 1.

Data Preprocessing
We extracted administrative regionalization, structural, neighborhood, and temporal variables to explain the variations in house prices. Before building the model, we preprocessed the data by applying a logarithm operation to continuous variables. Overlay analysis was used to obtain the regional relationship between each residential plot and the administrative regions of Wuhan. Finally, we used a multicollinearity analysis and stepwise regression analysis to determine the independent variables.
An overview of the variables involved in housing prices is given in Table 2. The dependent variable is the sale price of the house. Unit prices are calculated in RMB. The structural characteristics of each house are described by three covariates, and the natural logarithm of the explanatory variables is used [34,56,57]. The plot ratio, also called the floor area ratio (FAR), is the ratio of a building's total floor area (gross floor area) to the size of the piece of land on which it is built. FAR is logarithmically transformed into LnFAR. The green ratio is the ratio of green space to the entire plot area and is logarithmically transformed into LnGreenRatio. The management fee of the property (in RMB/m²) is logarithmically transformed into LnPropertyFees. The temporal variable is the age of the building, derived from its construction year: we record the earliest construction year as the base number 1 and increase the count by one for each year thereafter. We calculate the distance from each residential plot to the nearest primary school, subway station, junior high school, and university (LnPriSchool, LnSubway, LnHighSchool, LnUniversity). We used the administrative region to constrain the calculation range of geographically weighted regression. For sampling points in the same region, geographically weighted regression calculations were performed as usual. Sampling points belonging to different regions were not considered in the scope of geographically weighted regression.
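The two transformations described above, the year count and the natural logarithm, can be sketched as follows. The records are hypothetical and are not taken from the paper's data set; the function names are illustrative.

```python
import math

def age_index(build_years):
    """Earliest construction year maps to 1; each later year adds one."""
    base = min(build_years)
    return [year - base + 1 for year in build_years]

def log_transform(values):
    """Natural logarithm of strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("log transform needs strictly positive values")
    return [math.log(v) for v in values]

# Hypothetical records for four communities.
build_years = [1998, 2005, 2012, 2019]
far = [1.5, 2.8, 3.2, 4.0]                 # plot ratio (FAR)
dist_subway_m = [350.0, 1200.0, 80.0, 2600.0]

ages = age_index(build_years)
ln_far = log_transform(far)                # corresponds to LnFAR
ln_subway = log_transform(dist_subway_m)   # corresponds to LnSubway
print(ages, ln_far, ln_subway)
```

The explicit positivity check matters for distance variables, since a plot located exactly at a point of interest would have a distance of zero and no defined logarithm.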
According to the algorithm process described in Section 2.2.3 of this paper, we used the GWR and RGWR methods to establish characteristic price models. First, the AICc method was used to determine the optimal bandwidth of GWR, which was 8516.9 for the fixed type and 155 for the adaptive type. Then, the characteristic price models of GWR and RGWR were established using the optimal bandwidth, and hypothesis tests for spatial stationarity were performed under the different bandwidth strategies [7,9,23]. The p-values of the hypothesis tests are all less than 1%, which is statistically significant. The spatial non-stationarity test was also carried out for each regression coefficient [23]. As shown in Table 3, the results show that property fees, greening rate, FAR, subway station, primary school, and junior high school all exhibit spatial non-stationarity. Finally, the model regression coefficients, fitted values, and evaluation metrics of GWR and RGWR were obtained under the different bandwidth strategies. Taking GWR as the comparison method, the experiment analyzes the performance of the model, the fitting effect of the model, and the influencing factors for house prices in Wuhan.

Model Performance Comparison
Table 4 displays a comparison of the RGWR model with GWR under the fixed bandwidth and adaptive bandwidth strategies. Under the fixed bandwidth strategy, the R² of the RGWR model is 0.7777, which is 14.77% higher than that of the GWR model; R²adj is 15.06% higher, MSE is 31.07% lower, and RMSE is 16.97% lower. The AICc value of the RGWR model is −353.0750, which is 31.4006 smaller than that of GWR. Generally, a difference in AICc of more than three indicates that two models are significantly different, and the smaller the AICc value, the higher the model fitting accuracy. This shows that, under the fixed bandwidth strategy, the RGWR model can better simulate the sale price of residential buildings in Wuhan.
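The reported percentage changes can be checked with simple arithmetic. The sketch below uses the fixed-bandwidth R² and adjusted R² values quoted in the abstract for GWR (0.6776, 0.6732) and RGWR (0.7777, 0.7746); the helper name is hypothetical, and the MSE line only uses the identity RMSE = sqrt(MSE).

```python
def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0

r2_gain = pct_change(0.6776, 0.7777)        # R² improvement of RGWR over GWR
r2_adj_gain = pct_change(0.6732, 0.7746)    # adjusted R² improvement

# RMSE = sqrt(MSE), so a 16.97% drop in RMSE implies this drop in MSE.
mse_drop_from_rmse = (1.0 - (1.0 - 0.1697) ** 2) * 100.0

print(round(r2_gain, 2), round(r2_adj_gain, 2), round(mse_drop_from_rmse, 2))
```

The first two figures reproduce the stated 14.77% and 15.06%, and the MSE reduction implied by the RMSE reduction agrees with the stated 31.07% to within rounding.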

Comparison of Model Fitting Effects
Comparing the predicted values with the real values of housing prices in the RGWR and GWR models makes it possible to explore the fitting effect of each model intuitively. Figure 4 shows the fitting effect of RGWR and GWR under the fixed and adaptive bandwidth strategies. The X-axis represents the predicted values of the different models under the different bandwidth strategies, and the Y-axis represents the real housing prices. The red dotted line in the figure marks where the real value equals the predicted value. Therefore, the closer the points lie to the red dotted line, the better the fitting effect of the model. It can be clearly seen that, under the same bandwidth strategy, the points of RGWR are concentrated much more closely around the red dotted line than those of GWR, indicating that the fitting effect of the RGWR model is significantly improved compared with that of GWR. Similarly, comparing the two bandwidth strategies, the points of RGWR with a fixed bandwidth are concentrated more closely around the red dotted line than those with an adaptive bandwidth. At the same time, the R² value of the RGWR model under the fixed bandwidth strategy is 0.7777, which is 18.64% higher than that under the adaptive bandwidth strategy. This indicates that, for the data used in this article, the fitting effect of the RGWR model under the fixed bandwidth strategy is better than that under the adaptive bandwidth strategy.

Analysis on the Price of Commercial Housing in Wuhan
Figure 5 shows the price of commercial housing in Wuhan. Housing prices in the selected study area are between 4800 and 53,800 yuan/m², with an average price of 18,500 yuan/m². The housing estate with the lowest price is Changlejiayuan in Xinzhou District, and the housing estate with the highest price is Tiandi Yujiang in Jiang'an District. It can be seen from the figure that housing prices rise gradually from the urban fringe toward the urban core and its surroundings, reaching a peak near the urban core. In other words, in the horizontal direction, housing prices show a gradual downward trend from the city center to the periphery. Areas with high housing prices (over 27,000 yuan/m²) are concentrated in Wuchang District, Qiaokou District, Jianghan District, and Jiang'an District, which suggests that high housing prices in the study area are greatly affected by the school district factors within these districts.

Influence Factors
Summaries of the RGWR coefficient estimates under the fixed bandwidth strategy, including the minimum (min), lower quartile (LQ), median (med), upper quartile (UQ), and maximum (max), are presented in Table 5. In the RGWR model, property fees are positively correlated with house prices, as shown in Table 5. In other words, as the property fees or greening rate increase, the house price increases. In contrast, the distance to the nearest subway station is negatively correlated with house prices; as the distance to the nearest school or subway station increases, the house price decreases. However, there is no significant overall correlation between FAR and house prices, since the coefficients of FAR take both positive and negative values. Therefore, FAR is not a major factor influencing house prices at the scale of the whole study area. This result arises because FAR has different significant impacts on house prices in different regions within the study area. The coefficients of the school variables (e.g., primary school, junior high school) and of the greening rate also take both positive and negative values. Most of the school coefficients are negative and a few are positive, while the greening rate shows the opposite pattern; this is again related to the different influences of the same factor on house prices in different regions. Specifically, taking FAR, greening rate, junior high school, and primary school as typical examples, this paper analyzes the impact of these factors in different regions through the spatial distribution maps of their coefficients.
Figure 6 is the FAR coefficient diagram of the study area. It can be seen that the coefficients of FAR range from positive to negative, indicating that there is no significant correlation between FAR and housing prices over the complete study area. However, looking at individual administrative divisions, the FAR of Jianghan District, Jiang'an District, Qiaokou District, and Wuchang District is positively correlated with housing prices. The best middle schools and primary schools in Wuhan are basically concentrated in these four districts, which are also the core urban areas of Wuhan; here the plot ratio is positively correlated with the housing price, that is, the higher the plot ratio, the higher the housing price. In other areas of the study, the plot ratio is negatively correlated with house prices, that is, the higher the plot ratio, the lower the house price. This is related to the pursuit of living comfort in non-central urban areas, which lie away from the central area and are not subject to school district factors.
Figure 7 is the coefficient diagram of the greening rate in the study area. It can be seen that the greening rate is positively correlated with housing prices as a whole: as the greening rate increases, the housing price rises. This is particularly prominent in the central part of the study area, which is densely built with less overall greenery, so the greening rate has a greater impact on housing prices. In the marginal part of the study area, the greening rate coefficient ranges from −0.0884 to 0.0003; most values indicate a weak negative correlation and approach zero. This is because residential construction in the suburbs of the city is generally sparse and has a high degree of greening, so the impact of the greening rate on housing prices is almost negligible.
Figure 8 is the coefficient chart of junior and senior high schools in the study area. It can be seen that the distance between the sampling point and the nearest middle or high school has different impacts on housing prices in different administrative divisions of Wuhan. In Wuchang District, the overall housing price is high and there is little room for housing prices to rise. The district is small and has many key demonstration middle schools, which together cover a wide area, so the price is less affected by middle and high schools. The overall housing prices in Jianghan District, Qiaokou District, Hanyang District, and Dongxihu District belong to the middle and lower range in Wuhan. The number of key demonstration middle schools in these districts ranges from 2 to 4. Housing prices there are greatly affected by middle and high schools, and the overall correlation is negative. The overall housing prices in Jiang'an District and Qingshan District belong to the middle and high range in Wuhan; housing prices in these districts are negatively correlated with the distance to middle and high schools and are the most affected by them. The other districts are located in the fringe areas of Wuhan. They have few or no provincial demonstration middle schools, cover large areas, and lack high-quality middle schools, so the distance to middle and high schools has little effect on housing prices.
Figure 9 is the coefficient chart of primary schools in the study area. It can be seen that, overall, the distance between the sampling point and the nearest primary school in Wuhan is negatively correlated with the housing price. That is to say, the farther a house is from the nearest primary school, the lower its price. This is particularly prominent in the center of the study area, especially in Wuchang District, Jiang'an District, and Qiaokou District. As the best primary schools in Wuhan are concentrated in these three districts, the distance to primary schools has become an important factor affecting housing prices there. The edge of the study area lies in the suburbs of the city, where educational resources are relatively balanced and high-quality primary schools are lacking, so the distance to primary schools has little effect on housing prices.

The Performance of the Model in Exploring Spatiotemporal Heterogeneity
This paper takes the housing sale price in Wuhan as an example for an experimental analysis, which demonstrates the effectiveness of introducing region factors. In the Wuhan housing sale price model, under both the fixed and the adaptive bandwidth strategies, the accuracy of the model improves after the region factors are considered, indicating that the administrative division has a significant impact on housing sale prices in Wuhan. Price distribution maps were drawn from the estimation results of the RGWR and GWR models: Figure 5 shows the real housing prices, while Figure 10a,b show the housing prices predicted by GWR and by RGWR, respectively. First, the spatial trend of the housing price distribution is basically the same in the three figures: prices decline gradually outward from the central city area. This shows that both the RGWR and GWR models are close to the real situation in estimating the global trend and can objectively reveal the pattern of housing price changes in Wuhan. Second, the areas that RGWR predicts to have high housing prices (above 24,000 yuan/m²) are distributed in Jiang'an District, Jianghan District, and Wuchang District, whereas the GWR prediction also includes Qiaokou District, Hanyang District, and Hongshan District. Compared with the real housing prices in Figure 5, the RGWR prediction is closer to the real housing price distribution. Third, in the border areas of administrative regions, the predicted values of RGWR and GWR are quite different. For example, in the border area of Jiangxia District, Wuhan, the RGWR predictions are below 14,000 yuan/m², which is close to the true value, while the GWR predictions are between 15,000 and 19,000 yuan/m², which is relatively high. The same situation exists in the border areas of Caidian

Impact on Adaptive and Fixed Bandwidth
Under the fixed bandwidth strategy, the RGWR model improves R² by 14.77%, adjusted R² by 15.6%, MSE by 31.07%, and RMSE by 16.97% relative to the GWR model. Under the adaptive bandwidth strategy, RGWR improves R² by 3.8%, adjusted R² by 3.88%, MSE by 6.48%, and RMSE by 3.31%. The comparison shows that the gain of RGWR over GWR is much larger under the fixed bandwidth strategy than under the adaptive one. The reason is that the sampling points are dense in the core urban areas and scattered in the peripheral areas. Under the adaptive bandwidth strategy, the bandwidth is small in the central area, where sampling points are dense, and larger in the edge areas, where they are sparse. To a certain extent, this matches the pattern of small administrative divisions in the core urban areas and large administrative divisions in the fringe areas, which reduces the participation of cross-regional sampling points in the estimation. Under the fixed bandwidth strategy, the same bandwidth is used across the whole study area, and a large number of cross-regional sampling points participate in the estimation. Therefore, the RGWR model improves more under the fixed bandwidth strategy than under the adaptive bandwidth strategy.
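The argument about cross-regional sampling points can be made concrete with a small sketch. The code below is illustrative only and is not the implementation used in this study: the helper names, the toy coordinates, the region labels, the bandwidth of 2.0, and the choice of k = 1 nearest neighbour are all assumptions. It measures, for each point, the share of neighbours that fall in a different region under a fixed radius and under a nearest-neighbour (adaptive) rule.

```python
import numpy as np

def pairwise_distances(coords):
    """Euclidean distance between every pair of points."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=2)

def fixed_neighbors(coords, bandwidth):
    """Fixed bandwidth: each point uses all other points within one radius."""
    mask = pairwise_distances(coords) <= bandwidth
    np.fill_diagonal(mask, False)  # a point is not its own neighbour
    return mask

def adaptive_neighbors(coords, k):
    """Adaptive bandwidth: each point uses its k nearest neighbours."""
    dist = pairwise_distances(coords)
    np.fill_diagonal(dist, np.inf)  # exclude the point itself
    nearest = np.argsort(dist, axis=1)[:, :k]
    mask = np.zeros(dist.shape, dtype=bool)
    np.put_along_axis(mask, nearest, True, axis=1)
    return mask

def cross_region_share(mask, regions):
    """Mean fraction of a point's neighbours that lie in another region."""
    other = regions[:, None] != regions[None, :]
    n_neighbors = mask.sum(axis=1)
    n_cross = (mask & other).sum(axis=1)
    # Points without neighbours contribute a share of zero.
    return float(np.mean(n_cross / np.maximum(n_neighbors, 1)))

# Hypothetical toy layout: three close "core" points and one remote "fringe" point.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [2.5, 0.0], [10.0, 0.0]])
regions = np.array([0, 0, 1, 1])

fixed_share = cross_region_share(fixed_neighbors(coords, bandwidth=2.0), regions)
adaptive_share = cross_region_share(adaptive_neighbors(coords, k=1), regions)
print(f"fixed: {fixed_share:.3f}, adaptive: {adaptive_share:.3f}")
```

With real data, the same two shares could be computed from the sampling-point coordinates and their administrative districts to check how many cross-regional points each bandwidth strategy admits.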

Spatial Distribution and Discrete Heterogeneity of Commercial Housing Prices in Wuhan
Overall, housing sale prices in Wuhan decrease gradually from the city center to the periphery, with high house prices in the core urban areas and low house prices in the peripheral areas of the city, which is consistent with previous studies [58]. Property fees are positively correlated with house prices, operate at a global spatial scale, and are the main factor affecting housing sale prices in the whole study area: the higher the property fees, the higher the house prices, indicating a heterogeneous spatial effect between the core and newly developed areas of the city [59]. Across the administrative regions of Wuhan, the high housing-price areas are concentrated in the city center and show a clear polycentric distribution in the core areas of Wuchang, Qiaokou, Jianghan, and Jiang'an districts. This is basically consistent with previous studies [60], which show that the spatial distribution of housing prices in Wuhan follows a polycentric pattern of "three high-priced areas and one lower-priced area" with obvious spatial aggregation. In the high-priced areas, the FAR and the greening rate are positively correlated with housing prices and are the main factors affecting the sale price of housing. Residents of the other, non-core areas pursue living comfort more than those in the core urban area, so there the FAR is negatively correlated with house prices, i.e., the higher the FAR, the lower the house prices. This observation of spatial heterogeneity is consistent with a recent study suggesting that residents of downtown and suburban areas may value different spatial characteristics [61]. With regard to schools as an influencing factor, the general view of previous studies is that all types of schools increase house prices in surrounding residential neighborhoods [62], while some scholars argue that only elementary and junior high schools are globally correlated with house prices [63]. A recent study that used MGWR to analyze house prices in Wuhan showed that only junior high schools and kindergartens are positively correlated with house prices, while elementary schools are negatively correlated [64]. The results of this paper differ from previous studies in that the distance to elementary schools and junior high schools is negatively correlated with house prices in high-house-price areas and is one of the main factors affecting the sale price of housing there: the closer to the school, the higher the house price. However, schools have little effect on house prices in low-price areas, which is determined by China's school district policy. Since there are few high-quality primary and secondary schools in low-price administrative regions, house prices there are less affected by schools. In this regard, RGWR can better explain spatial discrete heterogeneity and better reflect the impact of China's primary and secondary school zoning policy, under which high-quality schools drive a sharp increase in surrounding house prices.

Conclusions
In the last two decades, GWR has developed continuously. It handles well the continuous heterogeneity in spatial relations caused by irregular distribution in space, but it cannot handle discrete heterogeneity. This paper proposes an RGWR model that uses the influence of zoning to modify the spatial kernel function and optimize the spatial weights. The proposed model extends the traditional GWR model and weakens the influence of "heterogeneous" observation points on regression points, so that spatial non-stationary features and spatial discrete heterogeneity can be detected at the same time.
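The idea of weakening cross-region observations in the kernel can be sketched as follows. This is a minimal illustration under stated assumptions, not the kernel defined in this paper: the Gaussian form, the function names, and the multiplicative `cross_region_factor` of 0.5 are hypothetical stand-ins for the zoning term, and the toy data are invented for the example.

```python
import numpy as np

def gaussian_weights(dist, bandwidth):
    """Plain GWR-style Gaussian kernel: weight decays with distance."""
    return np.exp(-0.5 * (dist / bandwidth) ** 2)

def regional_weights(dist, same_region, bandwidth, cross_region_factor=0.5):
    """Down-weight observations from a different region.

    cross_region_factor is a hypothetical parameter in [0, 1]:
    1 reproduces the plain kernel, 0 discards cross-region points.
    """
    base = gaussian_weights(dist, bandwidth)
    return np.where(same_region, base, cross_region_factor * base)

def local_wls(X, y, weights):
    """Weighted least squares at one regression point: (X'WX)^-1 X'Wy."""
    Xw = X * weights[:, None]
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)

# Toy data: four observations, intercept plus one explanatory variable.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = 2.0 + 3.0 * X[:, 1]                       # noiseless linear relation
dist = np.array([0.0, 1.0, 1.0, 2.0])         # distance to the regression point
same_region = np.array([True, True, False, False])

w = regional_weights(dist, same_region, bandwidth=1.0)
beta = local_wls(X, y, w)
print(w, beta)
```

In this sketch, two observations at the same distance receive different weights when only one of them shares the region of the regression point, which is the behaviour the zoning term is meant to introduce.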
The experimental results of the Wuhan housing price case study show that the modeling accuracy of RGWR is better than that of GWR. The latter deals only with spatial non-stationarity, while the former also addresses the discrete heterogeneity of housing prices in Wuhan and improves the accuracy of the model. Compared with the GWR model, RGWR increases R² and adjusted R² from 0.6776 and 0.6732 to 0.777 and 0.7746, respectively, and reduces MSE and RMSE from 0.0338 and 0.1838 to 0.0233 and 0.1526, respectively. The AICc is also 37.4006 lower than that of GWR. Statistical tests show a significant difference between RGWR and GWR, so we conclude that it is meaningful to incorporate zoning factors into the GWR model.
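As a quick consistency check on the figures quoted above, the following sketch recomputes the relative change of each metric and verifies that the reported RMSE values equal the square roots of the reported MSE values. The dictionary layout and function name are assumptions for illustration, and percentages recomputed from rounded values may differ slightly from percentages derived from unrounded results.

```python
import math

# (GWR, RGWR) values as reported for the fixed bandwidth strategy.
reported = {
    "R2": (0.6776, 0.777),
    "R2_adj": (0.6732, 0.7746),
    "MSE": (0.0338, 0.0233),
    "RMSE": (0.1838, 0.1526),
}

def relative_change(gwr, rgwr):
    """Percentage change from GWR to RGWR; negative means a reduction."""
    return 100.0 * (rgwr - gwr) / gwr

changes = {name: relative_change(*pair) for name, pair in reported.items()}
for name, pct in changes.items():
    print(f"{name}: {pct:+.2f}%")

# RMSE should be the square root of MSE, up to rounding of the reported values.
rmse_from_mse = {model: math.sqrt(mse) for model, mse in zip(("GWR", "RGWR"), reported["MSE"])}
print(rmse_from_mse)
```

For the error metrics, a negative change is an improvement, whereas for R² and adjusted R² a positive change is an improvement.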
Although RGWR originated from the study of house prices, the analysis in this paper is suitable for investigating various phenomena across spatial partitions, such as landscape dynamics, crime, and air quality [65][66][67]. Our case study also has some limitations. For example, it is unclear whether RGWR will perform better when applied to data covering more observations. In addition, the "zoning" proposed in this paper is not limited to administrative zoning, and whether RGWR remains more accurate than GWR under different zoning methods, zoning scales, and research problems needs further study.
Therefore, future research will focus on applying RGWR to more observation data and to different zoning scales. Zoning factors will also be explored further in order to improve the accuracy of detecting discrete heterogeneity. As the amount of data increases, the interpretation of partition factors between different regions should become more accurate. More research is also needed on the computational efficiency bottleneck of RGWR, which is one of the active research directions for GWR [17].

Figure 3. The distribution of sampling points in Wuhan.

Figure 4. Fitting effect distribution of GWR and RGWR. (a) Predicted and real housing prices for the GWR model under the fixed bandwidth strategy; (b) predicted and real housing prices for the RGWR model under the fixed bandwidth strategy; (c) predicted and real housing prices for the GWR model under the adaptive bandwidth strategy; (d) predicted and real housing prices for the RGWR model under the adaptive bandwidth strategy.

Figure 5. Price of commercial housing in Wuhan.

Figure 6. FAR coefficients map of the study area.


Figure 7. Greening rate coefficients map of the study area.

Figure 8. Junior high school coefficients map of the study area.

Figure 9. Primary school coefficients map of the study area.
District and Dongxihu District. Jiang'an District, Jianghan District, Qiaokou District, and Wuchang District are the core areas of Wuhan, and housing prices there are relatively high. Hanyang District is adjacent to Qiaokou District and Wuchang District, and GWR does not use regions to screen sample points when estimating housing prices in Hanyang District. As a result, sample points from Wuchang District and Qiaokou District enter the estimation and make the predictions too high overall. RGWR uses regions to screen sample points, and its predictions are generally close to the real housing prices, which shows that the RGWR model is meaningful.

Figure 10. The predicted housing prices of different models. (a) The predicted housing prices of the GWR model; (b) the predicted housing prices of the RGWR model.

Table 1. Demonstrative list of schools in various regions of Wuhan.

Table 2. Variables used to predict housing prices in Wuhan, Hubei, China.

Table 3. The spatial non-stationary characteristic test of the RGWR and GWR model.

Table 4. The value of the RGWR and GWR model.