Mass Appraisal Modeling of Real Estate in Urban Centers by Geographically and Temporally Weighted Regression: A Case Study of Beijing’s Core Area

Abstract: The traditional linear regression model of mass appraisal is increasingly unable to satisfy the standard of mass appraisal with large data volumes, complex housing characteristics and high accuracy requirements. Therefore, it is essential to utilize the inherent spatial-temporal characteristics of properties to build a more effective and accurate model. In this research, we take Beijing's core area, a typical urban center, as the study area of modeling for the first time. Thousands of real transaction records from 2014, 2016 and 2018 are aggregated at the community level (community annual average price). Three different models, including multiple regression analysis (MRA) with ordinary least squares (OLS), geographically weighted regression (GWR) and geographically and temporally weighted regression (GTWR), are adopted for comparative analysis. The result indicates that the GTWR model, with an adjusted R² of 0.8192, performs better than the other two models in the mass appraisal modeling of real estate. The comparison of different models provides a useful benchmark for policy makers regarding the mass appraisal process of urban centers. The finding also highlights the spatial characteristics of price-related parameters in high-density residential areas, providing an efficient evaluation approach for planning, land management, taxation, insurance, finance and other related fields.


Introduction
The urban center is the core area of an urban structure. It is usually the area where a city's political, economic and cultural functions are most concentrated. The high density of urban centers and the central position of their urban functions make them different from a city's other areas in many aspects, e.g., high population density, traffic congestion and high land development intensity. As a result, the central area of a city often has no new residential land on which to build first-hand real estate for the housing market. Therefore, the real estate market in central areas mainly consists of second-hand housing transactions. For a city's real estate market, the government can determine and revise policies regarding planning, land, finance, tax, price and other aspects. Tax policy is a very important part of these policies. China is one of a small number of countries that do not levy an annual real estate tax on the ownership of residential properties. Recently, the Government Work Report of March 5, 2019 clearly put forward the idea to "steadily promote the legislation of real estate tax". In the experience of developed countries, real estate tax is often based on the value of the houses [1].
Land 2020, 9, 143

[...] analyzes the different situations for mass appraisal modeling of the Beijing core area. The final part presents conclusions and recommendations for future research.

Study Area
Beijing is the capital of China and the nation's center of politics, culture, international communication and technological innovation. According to the Beijing Municipal Bureau of Statistics (release date: May 31, 2019), Beijing had 21.54 million permanent residents in 2018, and the population is planned to stabilize at 23 million after 2020. The study area is the core area of Beijing, also named the Capital Functional Core Area. It is composed of two administrative regions, Xicheng District and Dongcheng District. The total area is 92.54 square kilometers, including 50.7 square kilometers in Xicheng District and 41.84 square kilometers in Dongcheng District. In 2018, Xicheng District and Dongcheng District had 1.18 million and 0.82 million permanent residents, corresponding to population densities of 23333 and 19637 persons/km², respectively. Figure 1 shows the map of the study area.

Data Description
The second-hand housing transaction database is from Lianjia, the largest second-hand house trading agency in China, with a local market share of nearly 60% in Beijing. The data come from records of the transaction process in a real commercial environment. For each community, the database contains the average price of the annual transactions and the annual average value of all housing attributes. The transactions cover the years 2014, 2016 and 2018. After removing samples with missing attributes or obviously deviated coordinates, the number of annual effective transaction samples of communities in Dongcheng District and Xicheng District is shown in Table 1. The total number of samples is 3064.

Figure 2 shows the spatial distribution and kernel density distribution of the community annual average price (unit: Renminbi (RMB) Yuan/m²). First, it shows the geographical distribution of communities in the core area. Some communities have transactions in all three years; others have transactions in only one or two. Then, a price surface is interpolated from the community annual average price by the inverse distance weighted (IDW) method [43], separately for 2014, 2016 and 2018. IDW is a convenient spatial interpolation method that intuitively displays the spatial distribution of the communities' annual average price. It computes a weighted average in which the weight of each sample point depends on its distance to the interpolation point: the closer the sample point is, the greater its weight. In this paper, the IDW method is run in ArcGIS Desktop (Version: 10.5; Type: Advanced). The power parameter of the distance is set to the default value of 2. The distance parameter (search radius type) is defined as an adaptive radius with the default value of 12, which means that the 12 nearest input sample points are used to perform each interpolation.
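The IDW settings described above (distance power of 2, 12 nearest sample points) can be illustrated with a minimal NumPy sketch. This is not the ArcGIS implementation; the function name `idw_interpolate` and its arguments are hypothetical, and coordinates are assumed to be planar (projected) so that Euclidean distance is meaningful.

```python
import numpy as np

def idw_interpolate(sample_xy, sample_values, query_xy, power=2.0, n_neighbors=12):
    """Inverse distance weighted estimate at each query point (illustrative only)."""
    sample_xy = np.asarray(sample_xy, dtype=float)
    sample_values = np.asarray(sample_values, dtype=float)
    query_xy = np.atleast_2d(np.asarray(query_xy, dtype=float))
    k = min(n_neighbors, len(sample_values))
    estimates = np.empty(len(query_xy))
    for q, point in enumerate(query_xy):
        # Euclidean distance from the query point to every sample point
        dist = np.hypot(*(sample_xy - point).T)
        nearest = np.argsort(dist)[:k]          # indices of the k closest samples
        d, v = dist[nearest], sample_values[nearest]
        if d[0] == 0.0:
            estimates[q] = v[0]                 # query coincides with a sample point
            continue
        w = 1.0 / d ** power                    # closer samples get larger weights
        estimates[q] = np.sum(w * v) / np.sum(w)
    return estimates
```

A price surface would be obtained by calling the function once per year with that year's community coordinates and average prices, using the cell centers of a raster grid as `query_xy`.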
Finally, the kernel density of community samples in each year is estimated. The kernel density estimation is a natural extension of the histogram which shows the overall trend and density distribution regularity of the variables [44]. Based on MATLAB Software (Version: R2019b), a normal kernel density function is utilized with log(Price) (see Table 2 for definition) on the x-axis, probability density estimate values on the y-axis and default optimal bandwidth.
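A kernel density estimate with a normal kernel can be sketched as follows. The bandwidth here uses the normal-reference rule of thumb, which is an assumption made for illustration and may differ in detail from the default that MATLAB selects; the function name `gaussian_kde_1d` is hypothetical.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Normal-kernel density estimate of `samples` evaluated on `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = samples.size
    if bandwidth is None:
        # Normal-reference rule of thumb (assumed here, not taken from the paper)
        bandwidth = samples.std(ddof=1) * (4.0 / (3.0 * n)) ** 0.2
    # Standardized distance between every grid point and every sample
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    # Average of the kernels centered on the samples, scaled by the bandwidth
    return kernel.sum(axis=1) / (n * bandwidth)
```

With log(Price) values of one year as `samples` and an evenly spaced grid on the x-axis, the returned array gives the probability density values for the y-axis.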
Residential community refers to a residential area surrounded by urban roads or natural boundaries, with a certain scale of living population, and built with public service facilities to meet the needs of residents. The transaction database of annual average price comes from residential community distributed throughout Beijing's core area and hence is representative of the core area's housing market. For mass appraisal of real estate in a city, the community scale is a proper choice. The evaluation value of the community will be the standard baseline of the individual properties within it. Based on the sufficient quantity and coverage of the 3064 community samples, the regression models can be simulated and applied well.
According to the attributes of each community sample in the database, there are 25 variables in total. The community average price is the only dependent variable. Based on the research purpose, the independent variables are divided into four categories: property structure in community, basic condition of community, traffic condition around community and living condition around community. Property structure in community contains community id, buying year, average area, average bedrooms, average decoration condition and average orientation condition. Basic condition of community includes the average house age, average ladder-to-household ratio, average years of property right, average ratio of elevator, average property management fee, number of buildings, number of households, floor area ratio (FAR) and green ratio (GA). Traffic condition around community consists of the shortest distance to a bus station and the shortest distance to a subway station. Living condition around community contains the shortest distances to the kindergarten, the park, the hospital, the shopping mall, the food market, the supermarket, the movie theater and the restaurant. All shortest distances are calculated within a 2-kilometer Euclidean-distance buffer zone. A summary of the detailed information and statistical analysis of the variables is listed in Table 2. It gives a whole picture of the community condition in Beijing's core area. For property structure in community, the average housing price of the community is 79122 RMB Yuan/m² with an average living area of 71.5 m². Furthermore, the average of two bedrooms per property is suitable for a family of three.
The average property decoration condition in the community is 0.33 (from "best = 1" to "worst = 0"), which is a normal condition for second-hand properties.
The orientation condition of the core area's communities has an average level of 0.70. For each property, the orientation weight ranges from 0.00 to 1.00. In Beijing, houses facing south get the most light and good ventilation. Therefore, a weight of 1.00 represents south, the best orientation, followed by east, west and north with weights of 0.66, 0.33 and 0, respectively. The orientation value used in this paper is the average over all properties in the relevant community, which indicates the overall orientation condition. The closer the average value is to 1, the better the overall orientation of the community. For the basic condition of the community, the average building age is 25 years. This shows that real estate in the core area was developed early and that most communities were built in the 1990s. A community has an average of eight buildings and 597 households. Meanwhile, 45% of the buildings in a community have elevators. For the traffic condition around the community, the bus condition, with an average shortest distance of 0.76 kilometers, is better than the subway condition, with an average shortest distance of 0.90 kilometers. For the living condition, the average shortest distance to a kindergarten is 0.56 kilometers, and the communities in the study area have great accessibility to leisure and living places. The average shortest distances to parks, hospitals, shopping malls, food markets, supermarkets and restaurants are 0.76, 0.60, 1.03, 0.60, 0.55 and 0.68 kilometers, respectively. People could walk to these places within about 15 minutes (walking speed: 1.0-1.2 m/s) or ride a bike within about 5 minutes (bike speed: 3.0-5.0 m/s).
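The community-level orientation value described in this section is an average of per-property weights. A minimal sketch, assuming each property is labeled with exactly one of the four orientations (the names `ORIENTATION_WEIGHT` and `community_orientation_score` are hypothetical):

```python
# Orientation weights as stated in the text: south is best, north is worst
ORIENTATION_WEIGHT = {"south": 1.00, "east": 0.66, "west": 0.33, "north": 0.00}

def community_orientation_score(orientations):
    """Average orientation weight over all properties in one community."""
    weights = [ORIENTATION_WEIGHT[o.lower()] for o in orientations]
    return sum(weights) / len(weights)

# Example: two south-facing, one east-facing and one north-facing property
score = community_orientation_score(["south", "south", "east", "north"])
```

For this example the score is (1.00 + 1.00 + 0.66 + 0.00) / 4 = 0.665, slightly below the core-area average of 0.70.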

Multiple Regression Analysis
Multiple regression analysis (MRA) explains the regression of a dependent variable over more than one independent variable. This makes it suitable for property price analysis because property values are determined by more than one property attribute. Equation (1) shows the formal model of an MRA.

$$Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k + \varepsilon \qquad (1)$$

where $Y$ is the community price, $X_1, \ldots, X_k$ are the community attributes, $\beta_0$ is the constant, $\beta_1, \ldots, \beta_k$ are the coefficients and $\varepsilon$ is the error term.
To facilitate the calculation and reduce the scale of housing prices, the natural logarithm of the community annual average house price is used. The equation is then converted to:

$$\ln(Y) = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k + \varepsilon \qquad (2)$$

The MRA model is usually estimated with ordinary least squares (OLS). OLS is a data-driven method that selects the regression coefficients minimizing the residual sum of squares over all observations [45].
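The log-linear MRA with OLS can be sketched with NumPy's least-squares solver. This is an illustrative sketch, not the software used in the paper; the function name `fit_log_linear_ols` is hypothetical, and `X` is assumed to be a two-dimensional array with one row per community sample and one column per attribute.

```python
import numpy as np

def fit_log_linear_ols(X, price):
    """Fit ln(price) = b0 + b1*X1 + ... + bk*Xk by ordinary least squares."""
    X = np.asarray(X, dtype=float)
    y = np.log(np.asarray(price, dtype=float))       # natural log of community price
    design = np.column_stack([np.ones(len(y)), X])   # prepend the intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    ss_res = float(residuals @ residuals)            # residual sum of squares
    ss_tot = float(((y - y.mean()) ** 2).sum())      # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return beta, r2
```

The first element of `beta` is the constant and the remaining elements are the attribute coefficients, in column order.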

GWR Model and GTWR Model
The GWR model is also a linear regression model, but it focuses on local regression based on spatial relationships. The model can be expressed as Equation (3).

$$Y_i = \beta_0(u_i, v_i) + \sum_{k} \beta_k(u_i, v_i) X_{ik} + \varepsilon_i \qquad (3)$$

where $(u_i, v_i)$ represents the (x, y) coordinates of community $i$, $\beta_0(u_i, v_i)$ is the constant or intercept value, $\beta_k(u_i, v_i)$ are the coefficients of variable $X_{ik}$ in community $i$ and $\varepsilon_i$ is the error term.

Following [16], the GTWR model can be expressed as Equation (4).

$$Y_i = \beta_0(u_i, v_i, t_i) + \sum_{k} \beta_k(u_i, v_i, t_i) X_{ik} + \varepsilon_i \qquad (4)$$

where $(u_i, v_i, t_i)$ represents the (x, y, t) spatial-temporal coordinates of community $i$. The other terms are the same as in Equation (3). The linear regression is then solved by estimating the coefficients as in Equation (5).

$$\hat{\beta}(u_i, v_i, t_i) = \left[X^{T} W(u_i, v_i, t_i) X\right]^{-1} X^{T} W(u_i, v_i, t_i) Y \qquad (5)$$

where $W(u_i, v_i, t_i)$ is the spatial-temporal weight matrix for community $i$. By defining the spatial distance $d^{S}$ and the temporal distance $d^{T}$, the spatial-temporal distance $d^{ST}$ can be combined as in Equation (6).

$$d^{ST} = d^{S} \otimes d^{T} \qquad (6)$$

where $\otimes$ could be any operator suited to the situation. Here the + operator is adopted and the scale factors $\lambda$ and $\mu$ are selected for $d^{S}$ and $d^{T}$. The spatial-temporal distance between community $i$ and community $j$ with transaction years $t_i$ and $t_j$ can then be represented as in Equation (7).

$$d^{ST}_{ij} = \sqrt{\lambda\left[(u_i - u_j)^2 + (v_i - v_j)^2\right] + \mu (t_i - t_j)^2} \qquad (7)$$

Based on the First Law of Geography [46], the closer an observation is to community $i$, the greater its weight. The same is assumed for the transaction year: transactions in different years influence each other, and the closer the transaction years, the greater the weight. This kind of weight is commonly built with Gaussian distance-decay functions, as shown in Equation (8) [47].

$$W_{ij} = \exp\left(-\frac{\left(d^{ST}_{ij}\right)^2}{h_{ST}^2}\right) \qquad (8)$$

where $h_{ST}$ is the spatial-temporal bandwidth and $\lambda/\mu$ is the spatial-temporal distance ratio. The $\lambda/\mu$ value can be optimized by cross-validation (CV) or the corrected Akaike information criterion (AICc) [48].
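The local estimation step can be illustrated with a short NumPy sketch. It assumes the common Gaussian form of the weight, exp(-d²/h²), with the squared spatial-temporal distance taken as λ times the squared spatial distance plus µ times the squared difference in transaction years; the function names, argument names and default scale factors are hypothetical and do not reproduce the ArcGIS plug-in.

```python
import numpy as np

def gtwr_weights(coords, times, i, bandwidth, lam=1.0, mu=1.0):
    """Gaussian spatial-temporal weights of all samples relative to sample i."""
    coords = np.asarray(coords, dtype=float)
    times = np.asarray(times, dtype=float)
    d2_space = ((coords - coords[i]) ** 2).sum(axis=1)   # squared spatial distance
    d2_time = (times - times[i]) ** 2                    # squared temporal distance
    d2_st = lam * d2_space + mu * d2_time                # combined with the + operator
    return np.exp(-d2_st / bandwidth ** 2)

def gtwr_local_coefficients(X, y, coords, times, i, bandwidth, lam=1.0, mu=1.0):
    """Weighted least-squares coefficients at sample i: (X'WX)^-1 X'Wy."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])       # intercept plus attributes
    w = gtwr_weights(coords, times, i, bandwidth, lam, mu)
    xtw = design.T * w                                   # X'W without forming diag(W)
    return np.linalg.solve(xtw @ design, xtw @ y)
```

Setting `mu=0` removes the temporal term, which reduces the weights to those of a GWR with a fixed Gaussian kernel. Repeating the call for every sample `i` yields one coefficient vector per community and year.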

Multiple Regression Analysis with Ordinary Least Squares
The multiple regression analysis with ordinary least squares is carried out with all the variables in the Beijing core area. Table 3 shows the parameter estimates, their standard errors and the inference results. Six independent variables (p-value > 0.05) are not statistically significant for the community price: the property management fee, the green ratio, the shortest distance to the kindergarten, the shortest distance to the park, the shortest distance to the food market and the shortest distance to the supermarket. The coefficients reflect the degree and direction of each variable's influence on the dependent variable, each in its own measurement unit. The transaction year is the most important variable, with a coefficient of 0.162. The standard error is the standard deviation of the estimated regression coefficient; the smaller it is, the more precise the estimate. The t-statistic and p-value are both used to test the significance of the model variables. The larger the absolute value of the t-statistic, the more significant the corresponding covariate. The variance inflation factors (VIF) of all independent variables are also tested, and all VIF values are smaller than 7.5 (most of them are smaller than 2), indicating that there is no significant global multicollinearity (i.e., no redundant variables) among the explanatory variables. In terms of the performance of the overall model, the R² is 0.5680 and the adjusted R² is 0.5647, indicating that the OLS model can explain about 56 percent of the variation in community price in the core area.
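Two of the diagnostics above can be reproduced with a few lines of NumPy. The sketch assumes the standard definitions of adjusted R² and VIF, and it assumes that the OLS model was fitted on all 3064 samples with 23 explanatory variables; the function names are hypothetical.

```python
import numpy as np

def adjusted_r2(r2, n_samples, n_predictors):
    """Adjusted R-squared for a model with an intercept and n_predictors variables."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)

def variance_inflation_factors(X):
    """VIF of each column: 1 / (1 - R^2) from regressing it on the other columns."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        r2_j = 1.0 - resid @ resid / ((X[:, j] - X[:, j].mean()) ** 2).sum()
        vifs[j] = 1.0 / (1.0 - r2_j)
    return vifs

# Check of the reported OLS figures under the stated assumptions
print(round(adjusted_r2(0.5680, 3064, 23), 4))  # 0.5647
```

Under these assumptions the formula returns the adjusted R² reported for the OLS model. A column whose VIF exceeds 7.5 would be flagged as redundant under the threshold used in the text.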

Geographically Weighted Regression Model
The OLS results in Table 3 show that 17 of the 23 variables are significant. Before running the GWR model, both global and local multicollinearity should be removed; otherwise, the result will not be reliable. Global multicollinearity can be checked with the VIF values: variables with large VIF values (above 7.5) are redundant. Local multicollinearity is more difficult to detect. One effective way is to create a thematic map for each independent variable and look for areas with little or no variation in values. Combining the OLS results with the thematic map of each variable, we find that the variables with local multicollinearity are the transaction year, the ladder-to-household ratio and the years of property right. After removing them, 14 variables are involved in building the GWR model. The result is shown in Table 4.
The model is implemented in ArcGIS Desktop (Version: 10.5; Type: Advanced). A Gaussian kernel is used for the GWR model and the kernel type is fixed. The overall R² is 0.2215, the adjusted R² is 0.2007 and the bandwidth is 4098.1515 meters. The bandwidth is an important parameter of the GWR model because it determines the smoothness of the model. The optimal bandwidth is estimated by the AICc method. The residual sum of squares is 305.0345; the smaller it is, the better the GWR model fits the observed data. The sigma value is the square root of the normalized residual sum of squares and is used in the AICc calculation. Detailed information is listed in Table 5.

GWR is a local linear regression model, and the result shows that it can explain only around 20 percent of the variation in the core area of Beijing for the whole dataset of 2014, 2016 and 2018. The GWR model is therefore not effective for the multi-year dataset of the core area in Beijing. Figure 3 shows the distribution of R² and the standardized residuals of the GWR model.

The problem is that, for the same community, the different attribute values (independent variables) of 2014, 2016 and 2018 are treated as three different samples at the same location. As a result, different samples at the same location may be calculated and averaged together during local regression, which disturbs the spatial characteristics of the local regression. Therefore, the resulting R² is very low. For further verification, following the methodology of this paper, the database is split by year, and the data of 2014, 2016 and 2018 are extracted separately. First, the OLS test is carried out; then global and local multicollinearity tests are performed on the significant variables. Afterward, all final variables are used to build the GWR model, and the results are shown in Table 6. The results of the GWR model for each year separately are clearly much better than those of the GWR model on the database of all years. The adjusted R² of the GWR model for 2014, 2016 and 2018 is approximately 0.5374, 0.4618 and 0.6321, respectively.

Geographically and Temporally Weighted Regression Model
The independent variables involved in the GTWR model are the same as in the GWR model. The GTWR model is run in ArcGIS Desktop with a plug-in program (Release Version: https://www.researchgate.net/publication/339567248_GTWRv1_1_20_May2020zip. Algorithm Source: reference [16]. Huang, B.; Wu, B.; Barry, M. Geographically and temporally weighted regression for modeling spatio-temporal variation in house prices. International Journal of Geographical Information Science 2010, 24, 383-401, doi:10.1080/13658810802672469). A Gaussian kernel is used for the GTWR model and the kernel type is fixed. The transaction year variable is set as the timestamp according to the program's instructions. After the calculation, the R² is 0.8200 and the adjusted R² is 0.8192. The bandwidth of the GTWR model is 0.1122 and the spatial-temporal distance ratio is 0.3731. The detailed information of the model diagnosis is shown in Table 7. Figure 4 shows the spatial distribution of the standardized residuals of the GTWR model in 2014, 2016 and 2018. Standardized residuals exceeding 2.5 in absolute value need to be examined. According to the output information and distribution maps, the residuals range from −0.8759 to 1.0417 in 2014, −1.2920 to 0.6153 in 2016 and −0.6944 to 0.3574 in 2018. There is no statistically significant clustering of high and/or low residuals. This indicates that the GTWR model is reliable.

According to Figure 5, cold colors (e.g., black, dark blue) indicate that the area variable has a negative effect on the annual average house price of the community: the larger the area, the lower the average house price. Warm colors (e.g., red, orange) indicate a positive effect: the larger the area, the higher the average house price. From 2014 to 2018, the residential areas in Xicheng District, on the left side of the study area, gradually changed from positive to negative. By 2018, only a proportion of the southern residential area has a positive correlation with the housing price. One reason for this change is that, within five years, the purchasing ability of a family does not change much. As housing prices rise, most families that still want to make a deal can only choose housing with a smaller area. Yet small-area housing often has a higher unit price. This leads to the pattern, by 2018, that the smaller the area, the higher the unit price.

Figure 6 shows the coefficient analysis of decoration. Across 2014, 2016 and 2018, most communities follow the pattern that the better the decoration, the higher the housing price. This feature is prevalent in second-hand housing transactions. However, there are still obvious differences in the degree of influence. For instance, in the areas shown in warm colors (e.g., red, orange), decoration can contribute 20% to 30% of the price. In particular, the southern part of Dongcheng District has the lowest sensitivity to decoration in 2014 but the highest sensitivity in 2018. Generally, keeping other factors the same, a house with exquisite decoration will get a higher valuation than a house with ordinary decoration. However, when the transaction price of the house greatly exceeds the cost of decoration, the buyer becomes insensitive to the decoration condition.

Figure 7 shows the distribution of coefficients for the shortest distance from the community to the bus stop. Because the variable here is a minimum distance, the interpretation is the opposite of that for area and decoration. Overall, the central area keeps cold colors (e.g., black, dark blue), which means that the closer a community is to the bus station, the higher the house price. The surrounding areas show the opposite trend. Considering the high density of traffic facilities in the study area, this may reflect supersaturation, as the convenience of public transport also means traffic congestion and traffic noise. Meanwhile, the communities in warm colors (e.g., red, orange) are mostly located around the Second and Third Ring Roads, which are urban expressways of Beijing.

Figure 8 shows the coefficient distribution of the shortest distance to the hospital. Cold colors (e.g., black, dark blue) indicate that the shorter the distance to the hospital, the higher the house price. This also suggests that these areas are still at a stage of positive demand for medical resources. The overall distribution trend has not changed over 2014, 2016 and 2018. On the one hand, hospital construction requires a large investment in public infrastructure, which takes many years from input to output. On the other hand, it also reflects that the level of medical resources in the core area remains relatively stable.

Figure 9 shows the coefficient distribution of the shortest distance to the restaurant. Most communities of Xicheng District are in warm colors (e.g., red, orange), indicating that being closer to restaurants does not raise house prices there, which suggests a state of oversaturation. Most communities in Dongcheng District are in cold colors (e.g., black, dark blue), indicating that the shorter the distance to the restaurant, the higher the house price.

Conclusions
Mass appraisal is considered when many properties need to be assessed under one evaluation standard on a given date. Compared with an appraiser's house-by-house evaluation, software implementations of mass appraisal models can provide more effective, fair and accurate results, together with easier operation and lower cost in practical application. In this research, the community level is used for mass appraisal modeling with the annual average price and other meaningful attributes. The database contains 3064 community samples with price data for 2014, 2016 and 2018. Three mass appraisal models, namely the MRA with OLS, the GWR model and the GTWR model, are built with the urban center of the Beijing core area as the study area. The overall performance of the models is shown in Table 8. MRA with OLS, as a global linear regression model, has a moderate effect and can explain about 56% of the variation in community price. In contrast, the adjusted R² of GWR, a local linear model, is only 0.2007, which makes it ineffective for the multi-year data of this study area. However, when the time factor is introduced to form the GTWR model, the model is able to take advantage of local regression and obtains an adjusted R² of 0.8192. Housing price data are sensitive to the spatial factor. At the same time, the influence of the temporal factor on housing prices is also obvious from the results of Table 3 in Section 3.1. Therefore, modeling the sample data with spatial-temporal heterogeneity can more accurately simulate the characteristics of housing prices in the study area. Overall, the GTWR model can make good use of multi-year community-level data to conduct mass appraisal modeling.

There are also some limitations in this study that need further discussion. The first concerns the evaluation scale. This study has shown that the community scale is feasible and effective. However, the community data are a mathematical aggregation of the original individual transaction data, which may lose some important information. At the same time, it should be noted that if the transaction record of each property is used to execute the mass appraisal for the whole city, the amount of data will be greatly increased. For a local linear regression model, the multiplied amount of calculation is a challenge to the stability and efficiency of the model. Finally, compared with housing price data, rental data have a higher transaction frequency and are relatively stable within a housing submarket. They can better describe the housing value from the perspective of residence and usage rather than investment. A useful future study would introduce rental data into the construction of the mass appraisal model and make a comparative analysis with the housing price data.

Conflicts of Interest:
The authors declare no conflict of interest.