An Extended Semi-Supervised Regression Approach with Co-Training and Geographical Weighted Regression : A Case Study of Housing Prices in Beijing

This paper proposes an extended semi-supervised regression approach to enhance the prediction accuracy of housing prices within the geographical information science field. The method, referred to as co-training geographical weighted regression (COGWR), aims to fully utilize the positive aspects of both the geographical weighted regression (GWR) method and the semi-supervised learning paradigm. Housing prices in Beijing are assessed to validate the feasibility of the proposed model. The COGWR model demonstrated a better goodness-of-fit than the GWR when housing price data were limited because a COGWR is able to effectively absorb no-price data with explanatory variables into its learning by considering spatial variations and nonstationarity that may introduce significant biases into housing prices. This result demonstrates that a semisupervised geographic weighted regression may be effectively used to predict housing prices.


Introduction
The housing market is defined as one where housing services are allocated by the mechanism of supply and demand and could be influenced by macro-economic variables, spatial differences, characteristics of the community structure and environmental amenities [1,2].Changing housing prices have been of concern to both residents and governments in that they influence the socio-economic conditions and have a further impact on the national economic stability [2].Therefore, the issue of predicting housing prices has recently been a focus of research in the geo-information field [3][4][5][6].
Housing prices are typically predicted via the establishment of a regression model that uses house price parameters (e.g., structural and neighborhood characteristics of the real estate) [7,8].Many authors have focused on the hedonic model to predict housing prices, and different hedonic models are compared in real estate economics [9][10][11].Although hedonic regression models are widely adopted, the presence of spatial dependence is detrimental to the efficiency and unbiasedness of the OLS model in traditional hedonic models.Spatial location is an important factor in housing prices [10].Real estate prices tend to be spatially heterogeneous [12].Therefore, spatial economics models have been proposed to address these issues.LeSage and Pace provide a broad review of these methods [13,14].Goodman and Thibodeau introduce the concept of hierarchical linear modeling in which dwelling characteristics, neighborhood characteristics and submarkets interact to influence housing prices [15].Brunsdon and Fotheringham propose a geographically-weighted regression as a local variation modeling technique to explore spatial nonstationarity [6,7].
Supposing that the number of house samples is limited, researchers have yet to determine how to enhance the goodness-of-fit of housing prices by using explanatory variables for houses where the price is unknown.Semi-supervised learning is an efficient approach that attempts to integrate no-price data to achieve a strong generalization by using multiple learners, and some studies have utilized semi-supervised regression models to address this issue [16][17][18][19][20].However, when traditional semi-supervised regression methods are applied to spatial data, the processes are assumed to be constant over space, which is not accurate.For housing price data, the assumption of stability over space is generally unrealistic, as housing price parameters tend to vary over a study area [21].
In recognizing the above challenges, this research proposes an extended semi-supervised regression approach to fully utilize the advantages of both the geographical weighted regression and the semi-supervised learning methods to increase the goodness-of-fit with respect to housing price data.
The remainder of this paper is organized as follows.In Section 2, related studies are briefly reviewed.In Section 3, the experimental data and proposed approach are introduced.Section 4 describes the experimental results.Section 5 provides concluding remarks.

Literature Review
The term hedonic is used to describe "the weighting of the relative importance of various components among others in constructing an index of usefulness and desirability" [22].The hedonic price model is based on the hedonic hypothesis that goods are valued for their utility-bearing attributes or characteristics [23].If the prices of these attributes are known, or can be estimated, and the attribute composition of a particular differentiated good is also known, the hedonic methodology will provide a framework for value estimation [24].The hedonic model regards houses as a composite commodity formed by structural attributes (age of house, number of bedrooms, presence of a garage, etc.), by locational attributes that vary between properties (good transport links, accessibility to shops and services, proximity to downtown, etc.) and by neighborhood attributes (population density, unemployment, measures of social stress, etc.).The price of a property is assumed to be a realization of the values of these attributes [25].
The conditional parametric model termed geographical weighted regression (GWR) is an explicitly local model and circumvents the problems discussed in the context of discrete modeling of heterogeneity and polynomial regression [6,7].GWR implicitly assumes continuously-changing price functions and models.A strong advantage of GWR is its flexibility, and the price function needs no prior assumption concerning the price determination process and its spatial variation [26,27].Lu, B. et al. investigates the GWR model by applying it with alternative, non-Euclidean distance (non-ED) metrics.A case study of a London house price dataset is coupled with hedonic independent variables, where GWR models are calibrated with Euclidean distance (ED), road network distance and travel time metrics.The results indicate that GWR calibrated with a non-Euclidean metric can not only improve the model fit, but also provide additional and useful insights into the nature of varying relationships within the house price dataset [4].A geographically-and temporally-weighted autoregressive model (GTWAR) has been developed to account for both nonstationary and auto-correlated effects simultaneously and formulates a two-stage least squares framework to estimate this model [5].
However, the GWR model assumes that all explanatory variables vary over space, and the global effects are often neglected; the mixed geographically-weighted regression (MGWR) model has been proposed to explore spatially-stationary and non-stationary effects.It is shown by the MGWR empirical examples that significant spatial variation in some of the estimated parameters is present, while the global effects provide evidence for policy-based linkages and an economically-connected housing market [28] Considerable interest has been devoted to the non-conventional methods in real estate property assessment.The most commonly-studied methods are neural network-based approaches.The appeal of neural network-based methods lies in that they do not depend on assumptions about the data [29].Neural networks are more robust to model misspecification and especially to various peculiarities in how various explanatory variables are measured [30].A fuzzy logic framework has also been proposed as an alternative to conventional property assessment approaches [29,31].Kuşan, H. et al. introduce household-level data into hedonic models in order to measure the heterogeneity of implicit prices regarding household type, age, educational attainment and income [32].
In the field of semi-supervised learning, labeled identifies the training examples known in advance and unlabeled identifies the training examples that are unknown.In this paper, "labeled data" refers to the sample of known housing prices, and "unlabeled data" refers to the explanatory variables of houses for which prices are unknown.Brefeld et al. developed a co-regularized least squares regression (coRLSR) algorithm to handle larger sets of unlabeled examples based on the co-learning framework, and the experiments show a significant error reduction and large runtime improvement for the semi-parametric approximation [17].Zhou and Li applied the co-training mechanism to a KNN regression.Two different KNN regression models have been utilized; each model labels unlabeled data for the other regressor, particularly where the labeling confidence is predicted based on the influence of labeling unlabeled samples on the labeled data [18,19].

Data Used in the Experiments
A case study is carried out using housing price data observed in Beijing, China.Beijing is one of the most developed cities and is an economic center in China, with tertiary industries accounting for 71.3% of its GDP.This makes it the first post-industrial city in mainland China.Along with a reform process, both economic prosperity and rapid urbanization have boosted demand for housing in the city.The increased demand for housing was accompanied by increased supply as prices and rents increased [33].
An overview of the housing prices variables is shown in Table 1.A total of 1350 residential houses are included in the study, and their geolocations are shown in Figure 1.The study data are provided by the National Bureau of Statistics, and structural, neighborhood and temporal variables are extracted to explain the house prices in this study.
The dependent variable (lnp) is the logarithmically-transformed sales price of the house, with the price unit of RMB.The structural characteristics of each house are described by five covariates.Total floor area of the house, with the area unit of m 2 , is logarithmically transformed as lnarea_total.The number of bath rooms is expressed as nbath.The management fee of the property, with the fee unit of RMB/m 2 , is logarithmically transformed as lnpfee.The ratio of houses is logarithmically transformed as lnplotratio.Additionally, the green ratio is logarithmically transformed as lngratio.The neighborhood of each house is described by the urban street network of Beijing, which defines the city's structural skeleton and directly affects the city's transportation and economic performance.The temporal variable is the age of building at time of sale (age).

Geographically-Weighted Regression Model
GWR is a non-stationary technique that models the spatially-varying relationships between independent and dependent variables [3][4][5]34].The GWR model can be expressed as: where the coordinate of point  in space is expressed as (  ,   );  0 (  ,   ) represents the intercept value; and   (  ,   ) represents a set of values for the number  of parameters at point  .The random error, which conforms to a normal distribution, is denoted as   ,   ∼ (0,  2 ).There is no correlation in random error between different points: Cov(  ,   ) = 0( ≠ ) .The regression parameter  ̂ at point  can be attained using the least squares model.
The fitted value  ̂ is:

Geographically-Weighted Regression Model
GWR is a non-stationary technique that models the spatially-varying relationships between independent and dependent variables [3][4][5]34].The GWR model can be expressed as: where the coordinate of point i in space is expressed as pu i , v i q; β 0 pu i , v i q represents the intercept value; and β k pu i , v i q represents a set of values for the number p of parameters at point i.The random error, which conforms to a normal distribution, is denoted as ε i , ε i " N `0, σ 2 ˘.There is no correlation in random error between different points: Cov `εi , ε j ˘" 0 pi ‰ jq.The regression parameter βi at point i can be attained using the least squares model.
The fitted value ŷ is: where the weighting matrix W i is based on the distances between regression point i and the data points around it.Two types of weighting matrix are used, fixed and adaptive kernels.In a fixed kernel function, an optimum spatial kernel bandwidth is calculated and applied over the study area.
The most commonly-used fixed weighting function is the Gaussian function: where h is a nonnegative parameter known as bandwidth and produces a decay of influence with the distance between locations i and j.
The commonly-used adaptive weighting is the bi-square function, which represents different bandwidths at location i.
If the predicted value of y i from GWR is denoted by ŷi phq, the sum of the squared error can be written as: CV phq " The bandwidth is achieved automatically with an optimization technique by minimizing Equation ( 6) in terms of goodness-of-fit statistics.

Co-Training Learning Paradigm
The co-training paradigm is one of the most prominent semi-supervised approaches.It was first proposed by Blum and Mitchell, trains two classifiers separately on two different views, e.g., two independent sets of attributes, and uses the prediction of each classifier on unlabeled examples to enhance the training set of the other [16].As shown in Figure 2, the standard co-training algorithm requires that attributes be naturally partitioned into two sets, each of which is sufficient for learning and conditionally independent of the other given the class label [35].
The co-training paradigm is one of the most prominent semi-supervised approaches.It was first proposed by Blum and Mitchell, trains two classifiers separately on two different views, e.g., two independent sets of attributes, and uses the prediction of each classifier on unlabeled examples to enhance the training set of the other [16].As shown in Figure 2, the standard co-training algorithm requires that attributes be naturally partitioned into two sets, each of which is sufficient for learning and conditionally independent of the other given the class label [35].Goldman and Zhou extended the co-training algorithm so that it would not require two views, but two different special learning algorithms [36].Zhou and Li proposed to use three classifiers, called tri-training, to explicate unlabeled data.In their process, an unlabeled example is labeled and used to teach one classifier for whether the other two classifiers agree on its labeling [37].Li and Zhou further extended this idea by integrating more classifiers into the training process [38].

Co-Training Geographically-Weighted Regression Approach
Let  = {( 1 ,  1 ), ⋯ , ( || ,  || )} denote the housing price sample set, where the i-th instance   is described by d attributes,   is the housing price value and || is the number of real-value examples; let U denote the no real value dataset, where the instances are also described by d attributes, whose real values are unknown, and || is the number of no real value examples.The procedure is described as follows: Goldman and Zhou extended the co-training algorithm so that it would not require two views, but two different special learning algorithms [36].Zhou and Li proposed to use three classifiers, called tri-training, to explicate unlabeled data.In their process, an unlabeled example is labeled and used to teach one classifier for whether the other two classifiers agree on its labeling [37].Li and Zhou further extended this idea by integrating more classifiers into the training process [38].

Co-Training Geographically-Weighted Regression Approach
Let L " !px 1 , y 1 q , ¨¨¨, ´x|L| , y |L| ¯) denote the housing price sample set, where the i-th instance x i is described by d attributes, y i is the housing price value and |L| is the number of real-value examples; let U denote the no real value dataset, where the instances are also described by d attributes, whose real values are unknown, and |U| is the number of no real value examples.The procedure is described as follows: 1 Initialize: Build up the adaptive Gauss kernel co-training geographical weighted regression (COGWR) regressor R 1 and the adaptive bi-square kernel COGWR regressor R 2 with the labeled samples L. Randomly select a small number of unlabeled samples and construct an unlabeled data pool P.
2 Absorb unlabeled samples: In each round, select an unlabeled record r from the unlabeled data pool P.
(1) Assign the predicted value ŷr of the no real value record using COGWR regressor R 1 and add the record to the COGWR regressor R 1 1 .If the R 2 of R 1 1 decreases in relation to the original R 2 using Equation (2), this record will be absorbed by regressor R 2 1 .
The goodness of r can be evaluated using the criterion shown in Equation (7).
If the value of ∆ r is positive, then utilizing px r , ŷr q is beneficial.
(2) Otherwise, assign the predicted value ŷr of the no real value record using the COGWR regressor R 2 and add the record to the COGWR regressor R 2 1 .If the R 2 of R 2 1 decreases in relation to the original R 2 , this record will be absorbed by the regressor.
The goodness of r can be evaluated using the criterion shown in Equation ( 8).
∆ r " ÿ If the value of ∆ r is positive, then utilizing px r , ŷr q is beneficial.

Experimental Results and Comparisons
In this section, the GWR model using Gaussian kernel functions to consider the spatial heterogeneity of housing price data was adopted to validate the reliability of housing price predictions in Beijing.Secondly, the housing price data were analyzed using the GWR and COGWR methods.

The Results of GWR Model
In linear regression models, strong collinearity between explanatory variables could increase the variance of the estimated regression coefficients and result in misleading conclusions about relationships in the phenomenon under study.In the local linear regression setting, this could lead to imprecise coefficient patterns with counter-intuitive signs in significant portions of the study area [39].For example, Wheeler shows that collinearity can degrade coefficient precision in GWR and lead to counter-intuitive signs for some regression coefficients at some locations in the study area [40].
In this study, multicollinearity is diagnosed using the diagnostic tools of the variance inflation factor (VIF), condition index and variance-decomposition proportions.The VIF values are indicators for the severity of multicollinearity, and variables with VIF values greater than 10 should be eliminated.It is suggested by Belsley to use condition indexes greater than or equal to 30 and variance proportions greater than 0.50 for each variance component as an indication of collinearity in a regression model [41].In this study, the VIF values of explanatory covariates are less than 10, and the condition index of all explanatory covariates and the intercept is less than 30.
It is known that an adaptive bandwidth has been proven to be highly suitable in practice compared to a predefined and fixed bandwidth [27,29].In this experiment, the adaptive Gauss kernel function have been adopted.The GWR model is tested, and the results are shown in Table 2 [27,42,43].The statistics indicate that housing prices in Beijing can be modeled using explanatory variables.
Approximately 70.1% of the variation in housing prices could be explained by the model according to R 2 .The signs for all of the parameters between the lower quartile (LQ) and upper quartile (UQ) are shown in Table 2. Descriptive statistics for the local parameter coefficients produced by GWR reveal much variance in the parameter values, suggesting the presence of spatial non-stationarity in the relationships between house prices and the explanatory variables.The floor area, number of baths and age of building at time of sale have positive parameter values, whereas the fee of property management, the plot ratio of houses, the green ratio and ringroad have both negative and positive parameter values.For the test of non-stationarity, the F-test proposed by Leung et al. (2000) was conducted [42].Table 3 lists the variances, F-statistic values of regression coefficients and their corresponding p-values.Those statistically-significant values at the 5% level are marked with an asterisk "*".It can be found that only one variable exhibits nonsignificant spatial variations in GWR: the number of bath rooms (nbath).The remaining variables display significant spatial variations.One important characteristic of the GWR-based technique is that the local parameter estimates that denote local relationships are mappable and thus allow for visual analysis.Taking the coefficients of logarithmically-transformed floor area as an example, we can group them into several intervals and color each interval to visualize the spatial variation patterns of this variable.As is shown in Figure 4, it could been seen that house prices are influenced by the floor area.In the central part of Beijing, a large amount of houses has been built with a small dwelling size.The reason is that central area of Beijing was planned in earlier times and no more land could be used to build houses.In recent years, with the fast development of the economy and the rapid progress of urbanization, a large-scale movement of urban expansion has emerged all over the country, and large-sized houses are built in the external part of Beijing.
intervals and color each interval to visualize the spatial variation patterns of this variable.As is shown in Figure 4, it could been seen that house prices are influenced by the floor area.In the central part of Beijing, a large amount of houses has been built with a small dwelling size.The reason is that central area of Beijing was planned in earlier times and no more land could be used to build houses.In recent years, with the fast development of the economy and the rapid progress of urbanization, a large-scale movement of urban expansion has emerged all over the country, and large-sized houses are built in the external part of Beijing.

Comparison of the COGWR with the GWR Model
In this paper, we have introduced an efficient semi-supervised regression approach to predict housing prices.A popular routine in evaluating semi-supervised algorithms is adopted [18,19].In detail, the residential plots are randomly partitioned into labeled/unlabeled/test datasets according to certain ratios.About 25% of the data is kept as test examples, while the remaining 75% of the data is used as the set of training data.In the training set, labeled and unlabeled data are partitioned under different label rates including 10%, 20%, 30%, 40% and 50%.Fifty runs of the experiments are conducted; in each run, the RSS (Residual Sum of Squares), MSE (Mean Squared Error) and AIC values are recorded.In the experiments, the maximum number of iterations is set to 50, and the size of the pool used in the learning process is fixed to 50.Since the learning process may stop before the maximum number of iterations is reached, and in that case, the final RSS, MSE and AIC values are used in computing the average RSS, MSE and AIC values of the iterations.
It is necessary to investigate whether the COGWR model performs significantly better than the GWR models.The improvements of the RSS, MSE and AIC values between COGWR and GWR are shown in Table 4. First, we compared RSS and MSE between COGWR and GWR.All COGWR regressors (Gauss and bi-square kernel functions) achieve better performance than the GWR regressors at the label rates of 10%, 20% and 30%.For instance, compared to the GWR regressors, the improvement in RSS achieved by the COGWR regressors was (3.242, 2.216), (3.375, 2.801) and (3.551, 2.909), respectively.However, for label rates of 40% and 50%, no significant improvement was observed when compared to the GWR regressor.For example, the improvement of RSS calculated using the COGWR regressor was (´0.144, 0.101) and (´0.314, ´1.633), respectively.
Second, we compared AIC values between the COGWR and GWR models and determined whether the COGWR model is significantly more reliable than the GWR models.According to Fotheringham and Bo Wu [5,44], a "serious" difference between the two models is generally regarded as one in which the difference in AIC values between the models is greater than three.When the labeled rate is 10%, 20% or 30%, significant improvements are achieved by using the COGWR when compared to the GWR.For instance, when the label rate is 10%, 20% and 30%, the difference between the COGWR and GWR models was (23.716, 18.645), (36.479, 21.328) and (41.921, 22.899), respectively.However, no significant improvement is achieved by using the COGWR when the labeled rate is 40% or 50%.When the labeled rate is 40% and 50%, the difference between the COGWR and GWR is (´2.892,2.204) and (´4.812, ´16.641), respectively.
With the increasing of the label rate, the improvement of goodness-of-fit endowed by exploiting unlabeled house price data seems to be decreasing.This is not strange, since it can be perceived from the performance of labeling that the initial GWR regressors become robust when more labeled house price data are available and, therefore, are more difficult to enhance.

Conclusions
Traditional semi-supervised regression methods could not be directly applied to spatial data, since the assumption of stability over space is generally unrealistic.This paper introduced novel co-training GWR approaches, which fully utilize the advantages of both the geographical regression and the semi-supervised learners to increase the goodness-of-fit with respect to unlabeled house structural, locational and neighborhood characteristics and geographical data.
The COGWR model, which fully utilizes the positive aspects of both the geographical weighted regression (GWR) method and semi-supervised learning paradigm, was implemented, and the results reveal that when the amount of labeled data is small, the COGWR method significantly improves the performance of the GWR method.By absorbing unlabeled data, this method converts sparse data areas into dense data areas.Therefore, the robustness of the regressors is enhanced.This suggests that incorporating no-price data into a GWR model could yield meaningful results.When the label rate of housing price data increases, the gains from absorbing unlabeled data decrease because the regressors trained on the labeled training examples become stronger and are thus more difficult to improve.
This study is a beneficial attempt in the research of geo-information and real estate economics.It offers a reference to both the decision-making and the theoretical research.The spatial diversity of the regression coefficients is of utmost importance for locally-acting decision-makers [12].When the amount of house price data is limited, the COGWR approach is a useful tool for real estate practitioners to fully exploit no-price data with explanatory variables.Some limitations still remain in our study, and further work is required.In this paper, not all explanatory variables vary over space, and the mixed geographically-weighted regression (MGWR) approach should be investigated to prevent the limitations of fixed effects by exploring spatially-stationary and non-stationary effects in the future [12].Spatiotemporal heterogeneity prevails in real estate data, and a temporal semi-supervised GWR must still be pursued.

Figure 1 .
Figure 1.Map of the study area.

Figure 1 .
Figure 1.Map of the study area.

Figure 2 .
Figure 2. Flowchart of the co-training learning paradigm.

Figure 2 .
Figure 2. Flowchart of the co-training learning paradigm.

Figure 4 .
Figure 4. Spatial variation of the logarithmically=transformed floor area coefficient.

Figure 4 .
Figure 4. Spatial variation of the logarithmically = transformed floor area coefficient.

Table 1 .
Variables used to predict housing prices in Beijing, China.

Table 3 .
The non-stationarity test results for the GWR models.

Table 4 .
Improvement between the COGWR and GWR models.