A Study on Estimation of Land Value Using Spatial Statistics: Focusing on Real Transaction Land Prices in Korea

The aim of this research is to compare OLS (Ordinary Least Squares), the traditional method of calculating land value, with spatial regression models, using data on the real transaction prices of land, and thereby to enhance the applicability of the estimation of official land assessment prices set by the Korean government while deducing policy implications for effective implementation. That is, as a way to overcome the limitations of the traditional regression model, we compare spatial regression models such as the SLM (Spatial Lag Model) and the SEM (Spatial Error Model) with OLS. An in-depth diagnosis is conducted to generate a proper estimation model for land pricing, and the analysis also examines vertical and horizontal equity using the COD (Coefficient of Dispersion), COV (Coefficient of Variation) and PRD (Price-Related Differential). The results indicate that the SEM is the most appropriate model in terms of log-likelihood, AIC (Akaike information criterion) and SC (Schwarz criterion), demonstrating that the spatial autocorrelation model is superior to the traditional regression model. The SEM is also the best among the tested models with regard to horizontal equity. The spatial econometric model, therefore, is strongly recommended for estimating the prices of land and houses.


Introduction
In general, because real estate is fixed in place and local in nature, its prices are determined by various factors such as diverse environmental conditions, features of neighboring lots and the attributes of surrounding areas. The hedonic model of estimating the prices of real estate has been regarded as the best among various approaches for its accuracy [1]. The research of Rosen (1974) using the hedonic model has had much impact on approaches to measuring the value of land, apartments, and offices [2][3][4].
The hedonic model, however, has limits when its parameters are estimated with OLS, because autocorrelated errors arise [5]. Specifically, estimating real estate prices without considering the spatial effect of spatial data causes spatial autocorrelation errors, thereby increasing the standard error of estimated parameters [6][7][8]. Furthermore, it may also lead to a biased result in the statistical verification [9]. In this regard, Dubin (1988, 1992), assuming that spatial autocorrelation was caused by various characteristics including the accessibility of neighboring areas, argued that estimating real estate prices without considering the location-related properties, which are important in the hedonic price model, would cause the residuals to be spatially autocorrelated [10,11].
That is, the estimate of a parameter and the estimate of its standard error are biased. This affects statistical inference, thereby causing a distortion in estimated prices [12,13]. In particular, this problem may lead to more critical problems, such as inequity of taxation and distrust of government administration, when estimating public land value, which is the reference value for taxation and a range of public information [14]. For example, in 1989, the Korean government introduced the publicly announced land price system to provide land information to real estate market participants and to establish the criteria for various taxes and charges [15], and it has announced the prices every year. The system, however, has relied on a land price comparison table, which is based on the hedonic price model, to calculate publicly announced land prices [16]. Consequently, the model could not account for spatial autocorrelation.
Recently, in Korea, the spatial econometric model, which can reduce spatial autocorrelation problems, has started to gain attention. In particular, the Korean government has been searching for a method to calculate publicly announced prices that are similar to the market prices at which sellers and buyers trade land (hereafter called real transaction prices). As a specific method of estimating land prices, various studies have turned to the spatial econometric model. However, most of the previous research has focused on publicly announced house prices, and studies that apply spatial statistical models to estimate publicly announced land prices using real transaction prices have been rare [17][18][19][20][21][22][23].
Accordingly, this study employs a spatial statistical model to estimate land prices using real transaction prices. It compares this model with the traditional OLS-based land value calculation model to determine its applicability and to improve the publicly announced land price estimation model. To achieve the research aim, it uses the Spatial Error Model (SEM) and the Spatial Lag Model (SLM) to more effectively capture the spatial dependence represented by spatial autocorrelation. The spatial area of this study is the city of Seoul, the capital of the Republic of Korea.

Study Area and Data
The study area is Seoul, the capital city of the Republic of Korea in Northeast Asia (Figure 1). Seoul is a cosmopolitan city and also a hub for politics, economy and culture. Therefore, it is the region most frequently used in research on the policies and systems of the Korean government. The administrative district of Seoul consists of 25 boroughs and 467 administrative towns. It has a population of about 10,369,000 [24]. Currently, the zoning of Seoul is classified into residential areas, commercial areas, industrial areas, green zones and the green belt area. Though the city is comparatively small (605.21 km², equivalent to 0.6% of the whole area of the ROK (South Korea)), it has a population of about 10.4 million, amounting to 1/5 of the total population [16]. Thus, the population density of the city is very high.
As shown in Table 1, most areas are residential or commercial areas. This implies that the land use plans of the Seoul city government center on those two types of areas. Also, it is very likely that most land transactions are made in those two areas. In this regard, this study deals only with residential and commercial areas as its subject of analysis.
With regard to location data, this study utilized the continuous land registration map found in the Korea Land Information System (KLIS) of the Ministry of Land, Infrastructure and Transport. As for the major variables of the analysis, we selected 18 land attribute factors that were utilized as the feature factors in the current version of the officially assessed land price calculation model [16]. However, including all variables as independent variables in the model would create the problem of multicollinearity, thereby making it difficult to establish a regression model. In addition, it would also increase variance even though it might minimize bias. As a result, it would be difficult to make an accurate estimation because it could not be guaranteed that the mean square error is minimized [16].
Statistical methods to deal with these problems consist of variable selection methods such as the forward selection method, backward elimination method, and stepwise selection method [25].
Among these three methods, the forward selection method has the limitation that a previously selected explanatory variable cannot be removed, even if an explanatory variable selected later renders it unimportant. The backward elimination method has its own limitation in that an explanatory variable removed previously cannot be considered again in the estimation model. To compensate for the defects of these two methods, the stepwise selection method was suggested and has been widely used. The method allows us to examine which variables to add or remove at each step. Accordingly, this study also used the stepwise selection method to select the variables used in the spatial statistical model for estimating real transaction prices. As a result, this study selected seven important variables for residential areas and six for commercial areas, as shown in Table 2.

Spatial Autocorrelation
According to the first law of geography of Tobler (1970), "everything is related to everything else, but near things are more related than distant things" [26]. That is to say, things within a space are not randomly distributed within that space. Rather, they influence each other. Moreover, the closer they are geographically, the more similar their values may be (regardless of whether they have a positive spatial autocorrelation or a negative spatial autocorrelation).
The methods to measure spatial association are of two kinds: global statistics and local statistics. The former is an indicator showing the general tendency of similar values to cluster, and the latter provides detailed statistics indicating spatial group patterns of similar values, focusing on a specific area. This study uses Moran's I, introduced by Moran (1950), as the global measure [27] and LISA (Local Indicators of Spatial Association), developed by Anselin (1988), to measure spatial association at the local level [28]. Moran's I is calculated as shown in Formula (1). It can be defined as the correlation coefficient between the value at a given site and the values in its surrounding area, weighted by the element $W_{ij}$ in the ith row and jth column of the m × n spatial weight matrix.
The local Moran's I method is shown in Formula (2). Here, $x_i$ denotes the attribute value of X at site i, $\bar{X}$ denotes the average of X, and $w_{ij}$ denotes the spatial weight between sites i and j.
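As a minimal illustration of the two statistics described above, the following Python sketch computes global and local Moran's I in their commonly used textbook forms, $I = \frac{n}{S_0} \frac{\sum_i \sum_j w_{ij} (x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^2}$ and $I_i = \frac{x_i - \bar{x}}{m_2} \sum_j w_{ij} (x_j - \bar{x})$ with $m_2 = \sum_i (x_i - \bar{x})^2 / n$. These forms are assumed here and may differ in notation or scaling from Formulas (1) and (2). The four parcels, their prices and the binary adjacency weights are hypothetical and are not taken from the study data.

```python
def global_morans_i(x, w):
    """Global Moran's I for values x and an n-by-n spatial weight matrix w."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]                    # deviations from the mean
    s0 = sum(sum(row) for row in w)              # sum of all spatial weights
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in z)


def local_morans_i(x, w):
    """Local Moran's I (one value per site), scaled by the second moment m2."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    m2 = sum(d * d for d in z) / n
    return [z[i] / m2 * sum(w[i][j] * z[j] for j in range(n)) for i in range(n)]


# Hypothetical example: four parcels in a row; neighbors share a boundary.
prices = [1.0, 2.0, 3.0, 4.0]
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
global_i = global_morans_i(prices, weights)
local_i = local_morans_i(prices, weights)
print(round(global_i, 4), [round(v, 4) for v in local_i])
```

With this scaling, the local values sum to the global statistic multiplied by the sum of the weights, which is one way to check that the two measures are consistent with each other.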

Spatial Regression Model
The spatial regression analysis model adds a spatial weighting matrix to a general linear model. The weighting matrix is included as an explanatory variable. Thus, it can be regarded as a form of Formula (3), which is expanded from the general linear regression model. Such a spatial regression model can be classified as either a spatial lag model or a spatial error model [29].
The spatial lag model takes the form of Formula (4) as a reduced form because it adds an explanatory variable matrix to the basic hedonic price function. That is to say, it actively captures and interprets autocorrelation based on the concept of leveraging spatial autocorrelation as another explanatory variable [30].
$(I - \rho W)^{-1}$ of Formula (5) refers to the spatial multiplier. This spatial multiplier represents the indirect effect, or overall external effect, of spatial interaction. It also means that all points are related to each other in a single system [31]. Thus, the regression coefficient in a spatial lag model is $\beta \times (I - \rho W)^{-1}$ rather than $\beta$. The parcel in a given area is affected not only by its own regional attributes but also by the attributes of other regions through the spatial weighting matrix.

$$Y = (I - \rho W)^{-1} X\beta + (I - \rho W)^{-1} \varepsilon \quad (4)$$

The spatial error model should be used when spatial dependence is found in the error terms. If heterogeneity cannot be removed because the prediction errors are not independent (their covariance is non-zero), the estimation results of the regression model may suffer from an efficiency-related problem. Thus, a covariance structure of the errors must be specified in order to resolve such problems. That is to say, the degree of spatial interaction should be reflected in the model. In the same context, in the spatial error model the spatial weighting matrix is added to the error terms in order to remove the spatial dependence, which is largely affected by the independent variables of surrounding areas [32].
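To make the spatial multiplier of the spatial lag model more concrete, the sketch below approximates $(I - \rho W)^{-1}$ by the series $I + \rho W + \rho^2 W^2 + \dots$, which converges under the assumption that W is row-standardized and $|\rho| < 1$. The two-region weight matrix and the value $\rho = 0.5$ are hypothetical and are not estimates from this study; the function names are illustrative only.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def spatial_multiplier(w, rho, terms=200):
    """Approximate (I - rho*W)^-1 by the series I + rho*W + (rho*W)^2 + ..."""
    n = len(w)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    rho_w = [[rho * w[i][j] for j in range(n)] for i in range(n)]
    total = [row[:] for row in identity]   # running sum of the series
    power = [row[:] for row in identity]   # holds (rho*W)^k
    for _ in range(terms):
        power = mat_mul(power, rho_w)
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return total


# Hypothetical example: two regions that are each other's only neighbor.
w = [[0.0, 1.0],
     [1.0, 0.0]]
multiplier = spatial_multiplier(w, rho=0.5)
print([[round(v, 4) for v in row] for row in multiplier])
```

The off-diagonal entries of the result are non-zero, which illustrates the point made above: a change in the attributes of one region feeds through to the other region, so the total effect of a variable differs from its coefficient $\beta$ alone.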

Analysis of Spatial Autocorrelation and Heteroscedasticity, Non-Normality
In general, the tools used to measure spatial autocorrelation in land and house prices include Geary's C (Geary's coefficient), G statistics, and LISA [31,33], in addition to Moran's I. Among them, Moran's I is frequently used to measure spatial autocorrelation in land and house prices [34]. If neighboring observation values are similar, Moran's I becomes positive (+), and, if they are not similar, the value becomes negative (−). Moran's I values in residential and commercial areas are 0.377779 and 0.384244, respectively, both of which are significant at the 1% significance level under the standard normal distribution, statistically confirming the existence of spatial dependency (see Figure 2).
To identify the geographic locations where spatial autocorrelation occurs, this study performed a LISA (Local Indicators of Spatial Association) analysis. Through the LISA analysis, it was possible to create a cluster map showing the local cases (administrative dongs) where spatial autocorrelation exists, as well as hot spot areas and cold spot areas. Here, hot spot (High-High) areas are areas where land prices are high and similar to those of neighboring areas, and cold spot (Low-Low) areas are areas where land prices are low and similar to those of neighboring areas. These are mapped in Figure 3.
Next, for the OLS model, heteroscedasticity and non-normality of the error terms are tested. The Breusch-Pagan test checks for heteroscedasticity, and the Jarque-Bera test checks for non-normality of the error terms. As shown in Table 3, the Breusch-Pagan values were 124.891 (p = 0.000) for residential areas and 72.0873 (p = 0.000) for commercial areas; the Jarque-Bera values were 68.5969 (p = 0.000) for residential areas and 34.2445 (p = 0.000) for commercial areas.
Consequently, the null hypotheses that the error terms of the OLS model are normally distributed and homoscedastic are rejected for both residential areas and commercial areas. Thus, it seems more suitable to apply a spatial regression model to estimate and calculate real transaction land prices.
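For readers unfamiliar with the normality test used above, the following sketch computes the Jarque-Bera statistic in its standard form, $JB = \frac{n}{6}\left(S^2 + \frac{(K - 3)^2}{4}\right)$, where S is the sample skewness and K the sample kurtosis of the residuals. Under the null hypothesis of normality the statistic is asymptotically chi-square distributed with two degrees of freedom. The sample values below are hypothetical and are not residuals from the models in Table 3.

```python
def jarque_bera(residuals):
    """Jarque-Bera statistic based on sample skewness and kurtosis."""
    n = len(residuals)
    mean = sum(residuals) / n
    # Central moments of order 2, 3 and 4 (divided by n, not n - 1).
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)


# Hypothetical residuals, used only to show the calculation.
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
jb = jarque_bera(sample)
print(round(jb, 4))
```

A large value relative to the chi-square critical value with two degrees of freedom indicates departure from normality, which is the pattern reported for both residential and commercial areas.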

Result of Spatial Regression Analysis
Since the spatial econometric model, which considers spatial autocorrelation and spatial error effects, is estimated using the MLE (Maximum Likelihood Estimation) method, strictly speaking there is no goodness-of-fit statistic directly comparable to that of the OLS estimation model. In a spatial econometric model, model fit is not tested with R², but with log-likelihood, AIC (Akaike Information Criterion), and SC (Schwarz Criterion). In general, if log-likelihood increases and AIC and SC decrease, the model fit is considered to have improved. The comparison of models showed that the SEM had the highest R² and log-likelihood and the lowest AIC and SC, demonstrating that the SEM is more suitable than the SLM, not to mention OLS. In general, the spatial econometric models showed higher R² and log-likelihood, and lower AIC, SC, and RMSE, than OLS.
In terms of estimated regression coefficients, the spatial models showed lower coefficient values for most variables than the OLS model did, because spatial effects were separated. Considering land use among the attribute variables, we found that, compared with land for residential use, the value of land for commercial business use is higher by about 50%, and the value of land for residential and commercial use is higher by about 32%. On the other hand, the value of farmland is reduced by 120%. Furthermore, in most development project districts, land prices are estimated to be about 16% higher than those in other districts. Land prices within 500 m of a railroad were 16% lower than those of other areas. Meanwhile, there were differences in land prices depending on the altitude of the land, as shown in Table 4.
Spatial effects were also found in commercial areas when the same criteria applied to residential areas were used. The analysis showed that the SEM had the highest R² and log-likelihood, and the lowest AIC and SC. Accordingly, it was considered to be the most suitable model for estimating land prices in commercial areas. As shown in Table 5, in commercial areas, as in residential areas, the SEM had lower values for most variables than the OLS model, because spatial effects were separated. In terms of land use among the attribute variables, the value of land for residential use is lower by about 34% compared with land for commercial business use. In the case of land forms, the value of the rectangular form is higher by 40% compared with the ladder form. Also, in terms of road proximity, the value of land with less than 8 m wide roads (vehicle inaccessible & corner lots) is higher by 79% compared with land with less than 8 m wide roads. On the other hand, in terms of designation of use, with non-designation as the reference category, the variable was significant at the 5% level in OLS and SLM, but was not statistically significant in SEM.
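The model comparison described in this section can be sketched as follows, using the standard definitions $AIC = -2 \ln L + 2k$ and $SC = -2 \ln L + k \ln n$, where $\ln L$ is the maximized log-likelihood, k the number of estimated parameters and n the number of observations. The log-likelihoods, parameter counts and sample size below are hypothetical placeholders and are not the values reported in Tables 4 and 5.

```python
import math


def aic(log_likelihood, k):
    """Akaike Information Criterion: lower values indicate a better fit."""
    return -2.0 * log_likelihood + 2.0 * k


def schwarz(log_likelihood, k, n):
    """Schwarz Criterion (also known as BIC): lower values indicate a better fit."""
    return -2.0 * log_likelihood + k * math.log(n)


# Hypothetical (log-likelihood, number of parameters) per model.
candidates = {"OLS": (-120.0, 8), "SLM": (-110.0, 9), "SEM": (-105.0, 8)}
n_obs = 200  # hypothetical number of transactions

scores = {
    name: (aic(ll, k), schwarz(ll, k, n_obs))
    for name, (ll, k) in candidates.items()
}
best = min(scores, key=lambda name: scores[name][0])  # model with the lowest AIC
for name, (a, s) in scores.items():
    print(name, round(a, 2), round(s, 2))
print("lowest AIC:", best)
```

Both criteria penalize additional parameters, so a spatial model is preferred only when its gain in log-likelihood outweighs the cost of the extra spatial coefficient.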

Performance Verification of Model
So far, the autocorrelation and spatial effects of real transaction prices have been compared and analyzed using the OLS model and the spatial autoregressive models (spatial lag and spatial error models), focusing on the residential and commercial areas of Seoul. Model performance is verified using the ratio (Assessment Price/Sale Price: AP/SP) between the land assessment price estimated through this analysis and the sale price on the real estate market. The verification involves analyzing the vertical and horizontal equity of taxation assessment, as suggested by the International Association of Assessing Officers (IAAO) [35]. Prior to carrying out this process, the IAAO indicates that it is important to consider that, in general, an observed value that deviates from the quartiles by more than 1.5 times the IQR (interquartile range) is an outlier. Thus, in this study, a box plot analysis was conducted applying the method suggested by the IAAO.
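The box plot rule mentioned above can be sketched as follows: ratios lying more than 1.5 times the IQR below the first quartile or above the third quartile are treated as outliers and excluded before the equity statistics are computed. The quartile convention (Python's `statistics.quantiles` with the "inclusive" method) is an assumption, since the study does not state which convention was used, and the AP/SP ratios are hypothetical.

```python
import statistics


def trim_iqr_outliers(ratios, k=1.5):
    """Keep only ratios inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(ratios, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [r for r in ratios if lower <= r <= upper]


# Hypothetical AP/SP ratios; the last value is far from the rest.
ratios = [0.90, 0.95, 1.00, 1.05, 1.10, 3.00]
kept = trim_iqr_outliers(ratios)
print(kept)
```

Different quartile conventions can shift the fences slightly, so borderline observations may be classified differently depending on the software used.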

Horizontal Equity
Horizontal equity in the taxation of real estate means that real estate having equivalent market value should be treated equally and should be assessed at the same rate relative to market value [36]. In general, the level of equity of the real estate prices estimated by a mass assessment model can be determined through analysis of the Coefficient of Dispersion (COD) or the Coefficient of Variation (COV) (refer to Table 6). Checking the degree of horizontal equity of the three models (OLS, SEM, SLM) compared in the study, the SEM had the smallest values of COD and COV. The smaller the values of COD and COV, the better the horizontal equity, and the IAAO recommends that a COD value below 20.0 be regarded as acceptable for estimating assessment standards.
OLS, SLM and SEM all conform with the recommendations given by the IAAO. However, comparing the relative horizontal equity among these three models, the SEM is judged to be the most suitable estimation model for real transaction prices of land, as shown in Table 7.
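The two horizontal equity measures can be illustrated with a short sketch. It assumes the usual ratio-study definitions: COD is the average absolute deviation of the AP/SP ratios from their median, expressed as a percentage of the median, and COV is the sample standard deviation of the ratios expressed as a percentage of their mean. The ratios below are hypothetical and are not taken from Table 7.

```python
import statistics


def cod(ratios):
    """Coefficient of Dispersion: mean absolute deviation from the median, in %."""
    median = statistics.median(ratios)
    mean_abs_dev = sum(abs(r - median) for r in ratios) / len(ratios)
    return 100.0 * mean_abs_dev / median


def cov(ratios):
    """Coefficient of Variation: sample standard deviation over the mean, in %."""
    return 100.0 * statistics.stdev(ratios) / statistics.mean(ratios)


# Hypothetical AP/SP ratios for five parcels.
ratios = [0.8, 0.9, 1.0, 1.1, 1.2]
cod_value = cod(ratios)
cov_value = cov(ratios)
print(round(cod_value, 2), round(cov_value, 2))
print("meets COD < 20.0:", cod_value < 20.0)
```

Because COD is built on the median and absolute deviations, it is less sensitive to extreme ratios than COV, which is one reason both are usually reported together.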

Vertical Equity
COD and COV measure horizontal dispersion (or random dispersion) within specific blocks, which has no relation to the price level of the individual lots. Other forms of inequity include vertical inequity, which arises between the assessment of low-value and high-value assets. Vertical equity in mass assessment models of real estate means applying equivalent assessment standards even across different price brackets, by performing assessments that are equiproportional to market sale prices. When the assessment ratio falls as market value rises, the assessment is said to be regressive and vertically inequitable, while when the assessment ratio rises as market value rises, it is said to be progressive and vertically inequitable [37]. The most basic method of measuring vertical equity is to compare the taxation assessment rate of land, which is the ratio of the taxation assessment price of land to its market sale price [38].
Vertical equity based on this ratio can be measured with the Price-Related Differential (PRD). When the value of PRD is greater than 1.0, the taxation assessment is regressive, whereas when it is smaller than 1.0, the taxation assessment is progressive. In practice, vertical equity is recognized when the value lies within the range 0.98 ≤ PRD < 1.03, which allows a small margin above and below 1.0 [37].
Based on the recommendations of the IAAO, the results of the vertical equity analysis of the estimated prices for each model are shown in Table 6. Among OLS, SLM and SEM, the SEM in residential areas and OLS in commercial areas appeared to be the best in terms of vertical equity.
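The PRD calculation can be sketched as follows, assuming the usual ratio-study definition: the mean of the individual AP/SP ratios divided by the value-weighted mean ratio (total assessed price over total sale price). The classification thresholds follow the range 0.98 ≤ PRD < 1.03 given above. The assessed and sale prices are hypothetical, and the function names are illustrative only.

```python
def prd(assessed, sale):
    """Price-Related Differential: mean ratio / value-weighted mean ratio."""
    ratios = [a / s for a, s in zip(assessed, sale)]
    mean_ratio = sum(ratios) / len(ratios)
    weighted_mean_ratio = sum(assessed) / sum(sale)
    return mean_ratio / weighted_mean_ratio


def classify_prd(value, low=0.98, high=1.03):
    """Label a PRD value using the acceptance range 0.98 <= PRD < 1.03."""
    if value < low:
        return "progressive"
    if value >= high:
        return "regressive"
    return "acceptable"


# Hypothetical parcels: higher-priced parcels are assessed at lower ratios.
assessed = [100.0, 180.0, 240.0]
sale = [100.0, 200.0, 300.0]
prd_value = prd(assessed, sale)
print(round(prd_value, 4), classify_prd(prd_value))
```

In this example the expensive parcels are under-assessed relative to the cheap ones, so the weighted mean ratio falls below the simple mean ratio and the PRD exceeds 1.0, which is the regressive pattern described above.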

Conclusions and Future Research
Land price is an indicator representing the dynamic changes of a city as real estate values rise and decline, and it summarizes the social and economic characteristics of a country. Land prices have typically been estimated with a traditional regression model. However, this model has limitations in that it fails to consider the spatial effects which occur in each region. To overcome the limitations of the traditional regression model, this research compared OLS with the SLM and SEM, which are models of spatial econometrics, and developed a proper estimation model. The analysis showed that, judged by log-likelihood, AIC, and SC, the SEM accounts for spatial dependency and heterogeneity better than the traditional regression model. In order to evaluate the estimation model, vertical equity and horizontal equity were analyzed for the estimated prices. According to IAAO standards, the estimated prices given by the SEM for both residential and commercial areas offered an acceptable level of accuracy.
Accordingly, if we identify and remove the causes of spatial autocorrelation in a relevant area when we estimate land and house prices, we could improve the accuracy of the SEM. In particular, given that, in Korea, official land prices serve as a standard for imposing taxes, as well as a basis for establishing and executing various government policies such as welfare endowments and various fees, it seems necessary to conduct further research in this area and to base policies on the application of spatial statistical models to estimate official land prices.

Figure 1. Study area (Seoul Metropolitan City in the Republic of Korea).

Figure 3. Local Moran's I-based hot spots and cold spots.

$(\mathrm{Ratio}_i - M) / (n \, M)$. Assessment: the lower the value, the better.

Table 2. Variables applied in the model and basic statistics (Unit: million won).

Columns, reported separately for residential and commercial areas: Mean, Sd, Min, Max, Median.
Note: One million Korean won is about 837 USD (as of 25 January 2016); for a more detailed description of contact with roads, please refer to the Appendix.

Table 3. Test of heteroscedasticity and non-normality in residential areas and commercial areas.

Table 4. Estimated results of OLS and spatial econometric model (residential area).

Table 5. Estimated results of OLS and spatial econometric model (commercial area).

Table 6. Test criteria for vertical and horizontal equity.

Table A1. Cont.
Less than 8 m wide road (corner lot): Land more than two sides of which are adjacent to a narrow street where cars can pass.
Less than 8 m wide road (vehicle inaccessible): Land one side of which is adjacent to a narrow road where cars cannot pass, but two-wheeled vehicles can pass.
Less than 8 m wide road (vehicle inaccessible & corner lot): Land two or more sides of which are adjacent to a narrow road where cars cannot pass, but two-wheeled vehicles can pass.
Landlocked lot: Land adjacent to a narrow road where two-wheeled vehicles cannot pass, or land which is not adjacent to a road.