# Spatial Determinants of Real Estate Appraisals in The Netherlands: A Machine Learning Approach

## Abstract

## 1. Introduction

- Which ML approaches are currently used for hedonic pricing, and how do they perform?
- Which factors are significant for price differences between houses across cities?
- Which data are available about these factors?
- How can we construct a method for hedonic pricing across different cities using the obtained insights?
- What are the results of applying this method with a realistic dataset?

## 2. Background

#### 2.1. Dutch House Price Indices and the Repeat-Sales Model

#### 2.2. Hedonic Price Models

#### 2.3. Linear Regression (LR)

#### 2.4. Geographically Weighted Regression (GWR)

#### 2.5. Multi-Scale Geographically Weighted Regression (MGWR)

#### 2.6. Regression Trees and Extreme Gradient Boost (XGBoost)

#### 2.7. Features for House Price Estimations

## 3. Data and Methods

#### 3.1. Model Metrics

#### 3.1.1. Quantitative Metrics
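
The result tables report four quantitative metrics: ${\mathit{R}}^{2}$, RMSE, MAE and MAPE. The sketch below shows how these could be computed from their standard definitions. It assumes NumPy; the helper name `regression_metrics` and the appraisal values are hypothetical and are not taken from the paper's dataset or code.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE, MAE and MAPE (in %) for observed vs. predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    ss_res = np.sum(residuals ** 2)                  # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares

    return {
        "r2": float(1.0 - ss_res / ss_tot),
        "rmse": float(np.sqrt(np.mean(residuals ** 2))),
        "mae": float(np.mean(np.abs(residuals))),
        # MAPE is undefined when an observed value is zero; appraisals are positive.
        "mape": float(np.mean(np.abs(residuals / y_true)) * 100.0),
    }


# Hypothetical appraisal values in euros (illustration only).
y_true = [200_000, 300_000, 400_000]
y_pred = [210_000, 290_000, 400_000]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

RMSE penalises large errors more strongly than MAE, which is one reason a few very expensive houses can raise RMSE noticeably while MAE and MAPE change less.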

#### 3.1.2. Qualitative Metrics

#### 3.2. Exploration of the Response Variable

#### 3.3. Exploration of the Explanatory Variables

#### 3.4. Hyper-Parameter Optimisation Using CV

## 4. Results

## 5. Discussion

## 6. Conclusions

- The lack of a feature to model house quality. The remaining unexplained variance of 17% is likely due to a missing variable that describes the quality of the house itself or other location characteristics. An official appraisal report contains more detailed information about the state of a house, which could help paint a better picture of the house itself.
- The ground sinkage map from TU Delft, for example, provides an interesting use case for looking at real estate portfolio risk factors. Ground sinkage is a real problem in the Netherlands, especially in Groningen, where gas extraction has drastically reduced property values in the region. This poses a clear risk to both the mortgage holder and the lender. Another problem for many houses is foundation rot; risk areas could perhaps be identified by combining sinkage data with ground composition data.

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## Abbreviations

Abbreviation | Description |
---|---|
LR | Linear regression |
(M)GWR | (Multi-scale) Geographically weighted regression |
XGBoost | Extreme gradient boosting |
CBS | ‘Centraal Bureau voor de Statistiek’ (ENG: Central Agency for Statistics) |
BAG | ‘Basisregistratie adressen & gebouwen’ (ENG: Base registry addresses & buildings) |
DKK | ‘Digitale kadastrale kaart’ (ENG: Digital cadastral map) |

## Appendix A. Figures Related to Dutch Housing Market

**Table A1.** % change in house prices (January 2000–January 2020), per province of the Netherlands [42].

Province | % Increase over 2000–2020 | Province | % Increase over 2000–2020 |
---|---|---|---|
Drenthe | 56.35% | Noord-Brabant | 42.90% |
Flevoland | 44.02% | Noord-Holland | 76.70% |
Friesland | 55.34% | Overijssel | 49.73% |
Gelderland | 45.05% | Utrecht | 70.87% |
Groningen | 67.48% | Zeeland | 74.83% |
Limburg | 38.16% | Zuid-Holland | 52.36% |

**Table A2.** % change in house prices (January 2000–January 2020), per housing type. Source: Kadaster [42].

Housing Type | % Increase over 2000–2020 |
---|---|
Detached | 54.4% |
Semi-detached | 51.2% |
Terraced House | 64.0% |
Corner House | 61.5% |
Apartment | 75.3% |

## Appendix B. Figures Related to Models

**Figure A1.** Number of real estate appraisals of Stater: (**left**) 2008, (**middle**) 2020, (**right**) January 2000–January 2021.

**Figure A2.** Exploration of external variables from Kadaster & CBS (Amersfoort, 2018). (**a**) Kadaster: land lot size (${\mathrm{m}}^{2}$) & total floor area (${\mathrm{m}}^{2}$). (**b**) RVO: energy labels.

Municipality | Kernel (Bandwidth) | Adaptive/Fixed |
---|---|---|
Amersfoort | Gaussian (0.28) | Adaptive |
Amsterdam | Gaussian (0.19) | Adaptive |
Eindhoven | Gaussian (0.27) | Adaptive |
Groningen | Gaussian (0.43) | Adaptive |
Rotterdam | Gaussian (0.25) | Adaptive |
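
To illustrate what a Gaussian kernel with an adaptive bandwidth does in GWR, the sketch below computes the weights that observations receive when a local regression is fitted at one focal location. It rests on an assumption the table does not confirm: the bandwidth value is read as the fraction of observations treated as nearest neighbours, so the kernel width at each location is the distance to the k-th nearest neighbour. The function name `adaptive_gaussian_weights` and the coordinates are hypothetical.

```python
import numpy as np


def adaptive_gaussian_weights(coords, focal_index, fraction):
    """Gaussian kernel weights around one focal point with an adaptive bandwidth.

    Assumption: `fraction` is the share of observations used as nearest
    neighbours; the bandwidth is the distance to the k-th nearest neighbour.
    Assumes the k-th neighbour does not coincide with the focal point.
    """
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    distances = np.linalg.norm(coords - coords[focal_index], axis=1)

    # Number of neighbours, at least 1 and at most n - 1 (excluding the focal point).
    k = min(max(1, int(np.ceil(fraction * n))), n - 1)

    # After sorting, position 0 is the focal point itself (distance 0),
    # so position k holds the distance to the k-th nearest neighbour.
    bandwidth = np.sort(distances)[k]

    return np.exp(-0.5 * (distances / bandwidth) ** 2)


# Hypothetical coordinates: five houses on a line, 1 km apart.
coords = [[0, 0], [1, 0], [2, 0], [3, 0], [4, 0]]
weights = adaptive_gaussian_weights(coords, focal_index=0, fraction=0.5)
print(np.round(weights, 3))
```

Under this reading, a larger bandwidth value means each local regression draws on a larger share of the city's appraisals, so the fitted coefficients vary more smoothly across space.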

Abbreviation | Description | VIF | Source |
---|---|---|---|
is_gezinwng | Apartment (0) or Family home (1) | 1.91 | Stater |
garage | Presence of garage (yes/no = 1/0) | 1.82 | - |
parkeerplaats | Presence of parking space | 1.43 | - |
vbo_oppervlakte | The total floor area of a house (${\mathrm{m}}^{2}$) | 1.82 | Kadaster |
pnd_bouwjaar | Build year | 2.31 | - |
perceel_oppr | The land lot area of a house (${\mathrm{m}}^{2}$) | 1.80 | - |
Pand_energieklasse | Energy label / class (factor) | 2.69 | RVO |
INW_014 | % of people aged 0–14 (500 m tiles) | 15.05 | CBS |
INW_1524 | % of people aged 15–24 (500 m tiles) | 1.53 | - |
INW_2544 | % of people aged 25–44 (500 m tiles) | 11.47 | - |
INW_4464 | % of people aged 45–64 (500 m tiles) | 7.67 | - |
INW_65PL | % of people aged 65+ (500 m tiles) | 10.91 | - |
TOTHH_EENP | % of single-person households | 4.12 | - |
TOTHH_MPZK | % of multi-person households without children | 4.78 | - |
HH_EENOUD | % of one-parent households with children | 4.68 | - |
WON_MRGEZ | % of family homes | 4.4 | - |
WON_NBEW | % of non-inhabited homes | 1.90 | - |
OAD | Address density (addresses/${\mathrm{km}}^{2}$) | 3.87 | - |
STED_500 | Urbanisation (factor) | 6.12 | - |
P_KOOPWON | % of owner-occupied homes | 3.43 | - |
WOZWONING | Average WOZ-waarde (×1000€) | 3.15 | - |
M_INKHH | Median income group (factor) | 4.12 | - |
G_ELEK_WON | Average electricity usage (kWh) | 2.10 | - |
P_LINK_HH | % of households belonging to bottom 40% of national income | 13.12 | - |
P_HINK_HH | % of households belonging to top 20% of national income | 14.45 | - |
AFS_SUPERM | Distance to nearest supermarket (km) | 3.22 | - |
AFS_OPRIT | Distance to nearest provincial road or highway (km) | 2.48 | - |
AFS_CAFE | Distance to nearest cafe (km) | 2.21 | - |
AFS_BIBLIO | Distance to nearest library (km) | 2.27 | - |
AFS_ONDVRT | Distance to nearest secondary education (km) | 1.77 | - |
AFS_APOTH | Distance to nearest pharmacy (km) | 2.08 | - |
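
The VIF column above measures how well each explanatory variable can be predicted from the others: $\mathrm{VIF}_j = 1/(1 - R_j^2)$, where $R_j^2$ comes from regressing variable $j$ on all remaining variables. The following is a minimal sketch of that calculation using NumPy only. The function name `variance_inflation_factors` and the synthetic data are hypothetical; the paper's own tooling for this step is not stated here.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return the VIF of every column of X (rows = observations)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress column j on the remaining columns plus an intercept.
        design = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coef
        ss_res = residuals @ residuals
        ss_tot = np.sum((target - target.mean()) ** 2)
        r2 = 1.0 - ss_res / ss_tot
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs


# Synthetic illustration: column c is almost a copy of column a.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.1 * rng.normal(size=500)
vifs = variance_inflation_factors(np.column_stack([a, b, c]))
print(np.round(vifs, 2))
```

A VIF near 1 indicates little collinearity. Values above roughly 5 to 10 are a common rule of thumb for problematic collinearity, which in the table would single out the age-group and income-share variables.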

Abbreviation | Description | Source |
---|---|---|
dist_centre | Distance to city centre (km) | Self-computed |
UITKMINAOW | Income from state pension (AOW) | CBS |
INWONER | Inhabitants at start of year | - |
AANTAL_HH | Number of households | - |
HH_TWEEOUD | % of two-parent households with children | - |
P_NW_MIG_A | Percentage of inhabitants (non-western) | - |
P_HUURWON | Percentage of rented homes | - |
G_GA_WON | Average gas usage (${\mathrm{m}}^{3}$) | - |
AV1/5/10/20 vars. | Variables describing ‘Amount of X within radius 1/5/10/20 km’ (hospitals, stores, schools, etc.) | - |
Other AFS vars. | Distance variables to other amenities (swimming pools, amusement parks, restaurants, hotels, hospitals and others) | - |
Pand_gebouwtype | Home type | RVO |
Pand_subtype | Home subtype | - |

**Figure A3.** Variable importance for the LR model of Amersfoort (2018). All 5 municipalities have similar results.

Municipality | ${\mathit{R}}^{2}$ | RMSE | MAE | MAPE |
---|---|---|---|---|
Amersfoort | 0.810 | €61,928 | €50,177 | 7.51% |
Amsterdam | 0.822 | €62,596 | €52,183 | 7.40% |
Eindhoven | 0.815 | €62,942 | €54,631 | 7.98% |
Groningen | 0.821 | €79,192 | €54,131 | 8.29% |
Rotterdam | 0.837 | €58,561 | €49,287 | 7.25% |

**Figure A4.** Model fit of XGBoost models for Amsterdam, Eindhoven, Rotterdam and Groningen (2018). The orange line is y = x.

**Figure A5.** XGBoost variable importance of Amersfoort & Amsterdam (2018). (**a**) Amersfoort. (**b**) Amsterdam.

**Figure 1.** Exploration of the residential real estate appraisal dataset of Stater N.V. (**a**) Appraisals per year (2000–2020). (**b**) Records per municipality (2020). (**c**) Increase in average appraisal value (Amersfoort, 2000 & 2020).

**Figure 2.** Various CBS 100 × 100 m statistics (Amersfoort, 2018). (**a**) Taxation value (WOZ-waarde) (€1k). (**b**) Electricity usage (kWh). (**c**) Nearest cafe (km).

**Figure 4.** Q-Q plot showing the impact on overall fit of including all appraisals (Amersfoort, 2018). (**a**) All appraisals, poor fit. (**b**) Appraisals < €750,000, adequate fit.

**Figure 5.** Plots describing the GWR model (Amersfoort, 2018). (**a**) The influence of living area. (**b**) Variable importance.

**Figure 7.** Differences between XGBoost prediction and indexation using a regional price index (green = XGBoost predicts higher). (**a**) For apartments, XGBoost predicts 17.31% higher. (**b**) For family homes, XGBoost predicts 11.12% higher.

Characteristic | Influence | Sources |
---|---|---|
Year of construction | Positive/Negative | [5,16,34] |
Living area | Strongly positive | [5,13,16] |
Type of housing | Positive | [5,13,16] |
Garden space/presence of garden | Positive | [13,16] |
# of rooms (bedrooms, bathrooms) | Positive | [13,16] |
Presence of facilities (shower, lift, garage, etc.) | Slightly positive | [13,16] |
Furnished | Slightly positive | [13,16] |
Energy Efficiency | Slightly positive | [5] |
Sustainability measures | Slightly positive | [5] |

Characteristic | Influence | Sources |
---|---|---|
Household income | Strongly positive | [7,18] |
House shortage | Strongly positive | [35] |
Notable view (sea, lake, park) | Strongly positive | [33] |
Time to travel or distance to city centre | Strongly positive | [14,19] |
Proximity to place of worship | Positive/Negative | [5,36] |
Distance to highway | Negative | [37] |
Distance to heavy industry | Negative | [37] |
Presence of high-rise/view obstruction | Negative | [16] |
Crime rate | Negative | [19] |
Unemployment rate | Slightly negative | [18] |
Population density | Positive | [35] |
Presence of cultural landmarks | Slightly positive | [18] |
Birth surplus | None | [36] |

Municipality | Samples | Mean | Std Dev. | Min | Max |
---|---|---|---|---|---|
Amersfoort | 1494 | €319,400 | €62,744 | €58,800 | €1,250,000 |
Amsterdam | 5084 | €451,650 | €84,992 | €81,000 | €1,500,000 |
Eindhoven | 1845 | €278,800 | €58,421 | €75,000 | €1,155,000 |
Groningen | 1160 | €222,610 | €49,143 | €45,000 | €955,000 |
Rotterdam | 3011 | €254,930 | €53,329 | €55,000 | €875,000 |

Dataset Name | Contents | Joined Using | Source |
---|---|---|---|
BAG: ‘Addresses and Buildings key register’ | Geo-coordinates, build year, surface area | Address | Kadaster [9] |
DKK: ‘Digital cadastral map’ | Land lot area | BAG-VBO-ID | Kadaster [38] |
CBS Square statistics | Variables for areas of 100 × 100 m and 500 × 500 m | Geo-coordinates | CBS [39] |
EP-Online | Energy labels | BAG-VBO-ID | RVO [40] |

Municipality | Build Year | Land Lot Area | Address Density | Households | Energy Usage | Distance | Energy Label |
---|---|---|---|---|---|---|---|
Amersfoort | 4 (0.27%) | 451 (30.19%) | 15 (1.00%) | 16 (1.07%) | 28 (1.87%) | 15 (1.00%) | 454 (30.39%) |
Amsterdam | 116 (2.28%) | 731 (14.38%) | 0 | 71 (1.40%) | 127 (2.50%) | 0 | 1391 (27.36%) |
Eindhoven | 16 (0.87%) | 659 (35.72%) | 0 | 93 (5.04%) | 2 (0.11%) | 2 (0.11%) | 587 (31.82%) |
Groningen | 25 (2.16%) | 382 (32.93%) | 0 | 97 (8.36%) | 15 (1.29%) | 2 (0.17%) | 312 (26.90%) |
Rotterdam | 1 (0.03%) | 732 (24.31%) | 0 | 39 (1.30%) | 17 (0.56%) | 0 | 948 (31.48%) |

Municipality | WOZ-Waarde | Income |
---|---|---|
Amersfoort | 96 (6.43%) | 74 (4.95%) |
Amsterdam | 398 (7.83%) | 259 (5.09%) |
Eindhoven | 171 (9.27%) | 88 (4.77%) |
Groningen | 138 (11.90%) | 84 (7.24%) |
Rotterdam | 216 (7.17%) | 101 (3.35%) |

Model | ${\mathit{R}}^{2}$ | RMSE | MAE | MAPE |
---|---|---|---|---|
LR (all appraisals) | 0.709 | €150,211 | €72,391 | 11.81% |
LR * | 0.785 | €85,628 | €56,219 | 9.61% |
LR-LOG * | 0.768 | €89,136 | €63,577 | 10.62% |

Municipality | ${\mathit{R}}^{2}$ | RMSE | MAE | MAPE |
---|---|---|---|---|
Amersfoort | 0.822 | €61,459 | €48,393 | 7.42% |
Amsterdam | 0.831 | €60,213 | €53,671 | 7.31% |
Eindhoven | 0.812 | €62,942 | €54,103 | 8.01% |
Groningen | 0.789 | €83,233 | €55,213 | 8.61% |
Rotterdam | 0.861 | €56,431 | €47,312 | 6.99% |

Municipality | ${\mathit{R}}^{2}$ | RMSE | MAE | MAPE |
---|---|---|---|---|
Amersfoort | 0.851 | €57,391 | €34,283 | 5.38% |
Amsterdam | 0.845 | €57,964 | €35,258 | 5.50% |
Eindhoven | 0.838 | €57,385 | €36,192 | 5.62% |
Groningen | 0.829 | €59,832 | €38,241 | 5.88% |
Rotterdam | 0.871 | €56,144 | €34,831 | 5.45% |

Model | 2018 ${\mathit{R}}^{2}$ | 2018 RMSE | 2018 MAE | 2018 MAPE | 2020 ${\mathit{R}}^{2}$ | 2020 RMSE | 2020 MAE | 2020 MAPE |
---|---|---|---|---|---|---|---|---|
LR | 0.725 | €97,232 | €67,814 | 10.55% | 0.734 | €94,927 | €62,871 | 10.23% |
GWR | 0.822 | €64,856 | €51,738 | 7.67% | 0.809 | €65,826 | €52,237 | 7.92% |
XGBoost | 0.848 | €58,374 | €35,761 | 5.89% | 0.852 | €61,028 | €35,451 | 5.76% |

Model | ${\mathit{R}}^{2}$ | RMSE | MAE | MAPE |
---|---|---|---|---|
XGBoost | 0.832 | 65,312 | 43,625 | 6.35% |

