3.1. Data Collection
The study selected Shanghai as the study area. Shanghai is one of the most populous cities in China and has a large number of primary schools, which vary widely in educational quality and social reputation. Moreover, in other cities such as Hangzhou, high-quality primary schools are often clustered close to each other. In Shanghai, by contrast, high-quality primary schools are relatively far apart, and ordinary primary schools are always located in between. Clustering of high-quality primary schools makes causal estimation difficult, because the location characteristics of high-quality and ordinary primary schools then differ systematically. The distribution of primary schools in Shanghai makes it easy to find pairs of locations with similar characteristics, one inside the attendance zone of a high-quality primary school and the other outside it.
The paper restricts the analysis to the main urban districts in central Shanghai, namely Huangpu, Xuhui, Changning, Yangpu, Hongkou, Putuo, Jingan, and the inner-city part of Pudong. Because there is no official ranking of primary school quality in Shanghai, key primary schools were selected from lists published by well-recognized social media accounts. The study selected 70 “key primary schools” (see Table A1 in Appendix A), i.e., publicly funded primary schools that these lists identify as having leading teaching quality. Sample residential complexes were then selected according to the attendance zone of each key primary school. Each selected complex within the attendance zone of a key primary school was paired with a complex outside the attendance zone but within a 500 m radius of the school.
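The pairing rule can be sketched in code. The text does not say which out-of-zone complex is chosen when several lie within 500 m of the school, so the sketch assumes that each in-zone complex takes the closest unused out-of-zone candidate and that each control is used at most once. The `Complex` type, the projected coordinates in metres, and the greedy order are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Complex:
    name: str
    x: float        # easting in metres (assumed projected coordinates)
    y: float        # northing in metres
    in_zone: bool   # inside the key school's attendance zone?


def dist(ax: float, ay: float, bx: float, by: float) -> float:
    return math.hypot(ax - bx, ay - by)


def pair_complexes(school_xy, complexes, radius=500.0):
    """Greedily pair each in-zone complex with the nearest unused out-of-zone
    complex lying within `radius` metres of the school (hypothetical rule)."""
    sx, sy = school_xy
    treated = [c for c in complexes if c.in_zone]
    candidates = [c for c in complexes
                  if not c.in_zone and dist(c.x, c.y, sx, sy) <= radius]
    pairs, used = [], set()
    for t in treated:
        options = [c for c in candidates if c.name not in used]
        if not options:
            continue  # no eligible out-of-zone complex left for this one
        best = min(options, key=lambda c: dist(c.x, c.y, t.x, t.y))
        used.add(best.name)
        pairs.append((t.name, best.name))
    return pairs


# Made-up layout: school at the origin; E lies beyond the 500 m radius.
school = (0.0, 0.0)
cs = [Complex("A", 100, 0, True), Complex("B", 300, 0, True),
      Complex("C", 150, 50, False), Complex("D", 450, 0, False),
      Complex("E", 900, 0, False)]
pairs = pair_complexes(school, cs)
```

A greedy rule is simple but depends on processing order. An optimal matching, such as minimizing the total pair distance, is a possible alternative.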
The study collected detailed information on all second-hand housing transactions in the selected complexes from 1 January to 31 October 2019 from the websites of major real estate agencies, such as Lianjia, Soufang, and Anjuke. Complexes with fewer than two transaction records were dropped from the sample. In total, 127 pairs of residential complexes were selected. The spatial locations of the selected primary schools are shown in Figure 1 (green markers). The average housing price of each residential complex was computed from its transaction records, after all transaction prices were corrected by the discount ratio presented in Table A2 in Appendix A. Additionally, the average rent of each residential complex was computed from the rental listings on these websites.
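The per-complex averaging step can be sketched as follows. The actual discount ratios and how they are applied are given in Table A2 and are not reproduced here. The sketch assumes that each ratio is keyed by a hypothetical `group` label and that a recorded price is multiplied by its ratio. The ratio values in the usage example are placeholders. The same function could average rents if every rent ratio is set to 1.

```python
from collections import defaultdict
from statistics import mean


def complex_averages(transactions, discount, min_records=2):
    """Average corrected price per complex.

    transactions: iterable of (complex_id, group, price) records.
    discount: mapping group -> ratio (hypothetical; the real ratios are in
              Table A2). Corrected price = price * ratio is an assumption.
    Complexes with fewer than `min_records` records are dropped.
    """
    by_complex = defaultdict(list)
    for cid, group, price in transactions:
        by_complex[cid].append(price * discount[group])
    return {cid: mean(prices) for cid, prices in by_complex.items()
            if len(prices) >= min_records}


tx = [("c1", "agencyA", 60000.0), ("c1", "agencyB", 62000.0),
      ("c2", "agencyA", 55000.0)]              # c2 has only one record
ratios = {"agencyA": 1.0, "agencyB": 0.9}     # placeholder values, not Table A2
avg = complex_averages(tx, ratios)
```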
The study also collected information on the characteristics of the selected residential complexes that may affect housing prices, namely building age, floor area ratio, green rate, and management service fee. The study also obtained the location characteristics of the selected residential complexes, including the distance to the nearest subway station, the nearest hospital, Renmin Square (an important commercial center in Shanghai), Hongqiao railway station, and the key primary school. The location information was obtained using Baidu Map (https://map.baidu.com/, accessed on 1 December 2019).
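The distances were obtained from Baidu Map. A straight-line distance can also be approximated from coordinates with the haversine formula, as in the sketch below. The sketch assumes latitude/longitude in degrees on a common datum and a spherical Earth with mean radius 6,371 km. Baidu Map publishes coordinates in its own BD-09 system, so they would need conversion before this formula is meaningful.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle (straight-line on the sphere) distance in metres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_distance_m(origin, pois) -> float:
    """Distance from `origin` (lat, lon) to the closest point of interest,
    e.g. the nearest subway station or hospital."""
    lat, lon = origin
    return min(haversine_m(lat, lon, plat, plon) for plat, plon in pois)


# Arbitrary example coordinates, not actual Shanghai locations.
d = nearest_distance_m((31.23, 121.47), [(31.20, 121.44), (31.25, 121.50)])
```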
3.2. Empirical Method
To estimate the capitalization of school quality into housing prices, the paper first defined a baseline model for housing prices based on the hedonic price model. The model is formulated as follows:

$$P_i = \alpha + \beta K_i + X_i'\gamma + L_i'\delta + \varepsilon_i \tag{1}$$

where $P_i$ is the average resale housing price of residential complex $i$ over the period from 1 January to 31 October 2019. For robustness tests, the natural logarithm of the average resale price was also used as the dependent variable. $K_i$ is a dummy variable for the key primary school, which equals one if complex $i$ is within the attendance zone of a key primary school and zero otherwise. $\beta$ is the coefficient of interest, which captures the capitalization of the key primary school into housing prices. $X_i$ is a vector of control variables capturing the characteristics of the residential complex, including the building age ($\mathit{Age}_i$), floor area ratio ($\mathit{FAR}_i$), green rate ($\mathit{Green}_i$), and management service fee ($\mathit{Fee}_i$) of complex $i$. In line with existing studies [20,35,36,37], this study also controlled for the locational characteristics $L_i$ of the residential complexes. The locational characteristics were measured by a complex's straight-line distance to the nearest subway station ($D^{\mathit{Subway}}_i$), the nearest hospital ($D^{\mathit{Hospital}}_i$), Renmin Square ($D^{\mathit{Renmin}}_i$), Hongqiao railway station ($D^{\mathit{Hongqiao}}_i$), and the key primary school ($D^{\mathit{School}}_i$). $\alpha$, $\gamma$, and $\delta$ are unknown coefficients to be estimated, and $\varepsilon_i$ is an independent and identically distributed error term.
Table 1 presents the definitions of the variables.
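Equation (1) is a linear regression, so its coefficients can be obtained by ordinary least squares. The sketch below solves the normal equations in pure Python, using a reduced design with only the dummy, building age, and subway distance, and exact synthetic data. All values are made up. In practice a statistics package with appropriate standard errors would be used. This is not the authors' estimation code.

```python
def ols(X, y):
    """Least-squares coefficients b minimising ||y - X b||^2, obtained by
    solving the normal equations (X'X) b = X'y with Gauss-Jordan elimination."""
    n, k = len(X), len(X[0])
    # Augmented matrix [X'X | X'y], one row per coefficient.
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability.
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[piv][col]) < 1e-12:
            raise ValueError("regressors are collinear")
        A[col], A[piv] = A[piv], A[col]
        pivot = A[col][col]
        A[col] = [v / pivot for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Made-up complexes: (key-school dummy K, building age, subway distance in km).
rows = [(1, 10, 0.5), (0, 12, 0.8), (1, 25, 1.2),
        (0, 30, 0.4), (1, 18, 2.0), (0, 15, 1.5)]
true = (50000.0, 8000.0, -300.0, -2000.0)  # alpha, beta, gamma_age, delta_subway
X = [[1.0, float(k), float(age), d] for k, age, d in rows]  # column of 1s = alpha
y = [sum(b * x for b, x in zip(true, xi)) for xi in X]
coef = ols(X, y)  # recovers `true` up to rounding, since the data have no noise
```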
To address potential omitted-variable bias, the study constructed complex pairs following a boundary fixed effect approach. Specifically, the study first paired each sample residential complex within a key primary school's attendance zone with a neighboring out-of-zone residential complex. The study then applied a paired difference model, differencing all dependent and independent variables between the two complexes in each pair. Because paired complexes share many location-specific characteristics, the bias caused by omitted location-specific factors is likely to cancel out. For the same reason, the constant term in Equation (1) also cancels out. The paired difference model of house resale prices is then specified as follows:

$$\Delta P_j = \theta\,\Delta K_j + \Delta X_j'\phi + \Delta L_j'\psi + u_j \tag{2}$$

where $\Delta P_j$, $\Delta X_j$, and $\Delta L_j$ are the differences in housing prices, complex characteristics, and locational characteristics between the two residential complexes in pair $j$. In particular, $\Delta K_j$ is the difference in access to a key primary school between the paired complexes, which always equals one in Equation (2), so $\theta$ plays the role of the intercept. In other words, the parameter $\theta$ captures the key primary school's capitalization into housing prices. $\phi$ and $\psi$ are coefficients to be estimated, and $u_j$ is the error term.
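A simplified version of the paired difference model shows two points: a location effect shared within a pair drops out of the difference, and because the school-access difference equals one for every pair, the key-school effect (called `theta` below) is estimated as the intercept. The sketch assumes a single control, the building-age difference, and uses made-up, noise-free prices. The full model includes all controls.

```python
from statistics import mean


def paired_difference_estimate(pairs):
    """pairs: list of ((price_in, age_in), (price_out, age_out)).

    Fits dP = theta + phi * dAge + u by simple least squares. Because
    dK = 1 for every pair, theta is the regression intercept.
    """
    dP = [pin - pout for (pin, _), (pout, _) in pairs]
    dA = [ain - aout for (_, ain), (_, aout) in pairs]
    mA, mP = mean(dA), mean(dP)
    sxx = sum((a - mA) ** 2 for a in dA)
    sxy = sum((a - mA) * (p - mP) for a, p in zip(dA, dP))
    phi = sxy / sxx
    theta = mP - phi * mA
    return theta, phi


THETA, PHI = 8000.0, -300.0  # made-up "true" values


def price(loc, k, age):
    # `loc` is an unobserved location effect shared by both complexes in a pair.
    return 50000.0 + loc + THETA * k + PHI * age


# (shared location effect, age of in-zone complex, age of out-of-zone complex)
data = [(10000.0, 10, 12), (-5000.0, 25, 20), (20000.0, 5, 15), (0.0, 30, 28)]
pairs = [((price(loc, 1, a_in), a_in), (price(loc, 0, a_out), a_out))
         for loc, a_in, a_out in data]
theta_hat, phi_hat = paired_difference_estimate(pairs)  # loc cancels exactly
```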
To further test whether unobserved factors affect the results, the impact of school quality on rents is estimated using Equations (3) and (4):

$$R_i = \alpha^{r} + \beta^{r} K_i + X_i'\gamma^{r} + L_i'\delta^{r} + e_i \tag{3}$$

$$\Delta R_j = \theta^{r}\,\Delta K_j + \Delta X_j'\phi^{r} + \Delta L_j'\psi^{r} + v_j \tag{4}$$

where $R_i$ is the average rent of residential complex $i$ at the time of data collection, $\Delta R_j$ is the difference in rent between the paired residential complexes in pair $j$, and the remaining variables are defined as in Equations (1) and (2). $\alpha^{r}$, $\beta^{r}$, $\gamma^{r}$, $\delta^{r}$, $\theta^{r}$, $\phi^{r}$, and $\psi^{r}$ are the coefficients to be estimated, and $e_i$ and $v_j$ are error terms.

In principle, housing prices and rents should both be affected by the same characteristics of a residential complex. The major difference between house owners and lessees, however, lies in eligibility for primary school attendance: for residents within the attendance zone of a key primary school, only house owners are eligible to enroll their children in the school, whereas lessees are not [13]. Thus, if there are omitted or unobserved factors associated with the key primary school, the key primary school variable will have a significant impact on rents, i.e., $\beta^{r}$ and $\theta^{r}$ will be significantly different from zero. Conversely, if the impact of the key primary school on rents is insignificant, omitted variables are likely to be only a minor concern.
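The simplest form of this rent check is the paired rent model with no controls. Because the school-access difference is always one, the intercept is just the mean rent difference, and its significance can be judged with a one-sample t-statistic. The sketch below makes that simplifying assumption. The rent differences are made up. The resulting |t| would be compared with the Student-t critical value with n - 1 degrees of freedom.

```python
import math
from statistics import mean, stdev


def mean_difference_t(diffs):
    """t-statistic for H0: E[dR] = 0, i.e. the intercept-only version of the
    paired rent model (dK = 1 for every pair, no controls)."""
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))


# Made-up rent differences (in-zone minus out-of-zone) for five pairs.
rent_diffs = [30.0, -20.0, 10.0, -15.0, 5.0]
t = mean_difference_t(rent_diffs)  # small |t|: no evidence of a school effect on rents
```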
Aside from estimating the main effect of high-quality primary schools on housing prices, the paper also investigated heterogeneity across house sizes. The number of bedrooms has often been used to define housing types [38]. In this study, when computing the average housing prices, the sample was divided into two groups: small houses (houses with one or two bedrooms) and large houses (houses with more than two bedrooms). For each residential complex, the average prices of small houses and of large houses were then computed separately. Because some residential complexes have only small houses and others only large houses, the final samples of residential complexes for small and large houses differ from each other. The study then re-estimated Equations (1) and (2) separately for the average price of small houses and the average price of large houses, using the corresponding newly generated samples of residential complexes.
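The sample split by house size can be sketched as follows. Transactions are grouped by bedroom count, averaged per complex, and only pairs in which both complexes have houses of the given size are kept. The record layout and function names are hypothetical. The sketch also assumes that one record per size group suffices, because the text does not say whether the two-record minimum was applied within each group.

```python
from collections import defaultdict
from statistics import mean


def size_group(bedrooms: int) -> str:
    # Small: one or two bedrooms; large: more than two bedrooms.
    return "small" if bedrooms <= 2 else "large"


def averages_by_size(transactions):
    """transactions: (complex_id, bedrooms, price) records.
    Returns {"small": {complex_id: avg}, "large": {complex_id: avg}}."""
    buckets = defaultdict(lambda: defaultdict(list))
    for cid, beds, price in transactions:
        buckets[size_group(beds)][cid].append(price)
    return {g: {cid: mean(v) for cid, v in d.items()}
            for g, d in buckets.items()}


def usable_pairs(pairs, group_avgs):
    """Keep pairs in which both complexes have houses of this size."""
    return [(a, b) for a, b in pairs if a in group_avgs and b in group_avgs]


tx = [("A", 2, 60000.0), ("A", 3, 70000.0), ("B", 1, 52000.0),
      ("C", 3, 64000.0), ("D", 2, 50000.0), ("D", 4, 58000.0)]
avgs = averages_by_size(tx)
pairs = [("A", "B"), ("C", "D")]
small_pairs = usable_pairs(pairs, avgs.get("small", {}))
large_pairs = usable_pairs(pairs, avgs.get("large", {}))
```

Because each size group keeps a different set of pairs, the small-house and large-house regressions are estimated on different samples, as the text notes.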