# A Comparative Study of Machine Learning and Spatial Interpolation Methods for Predicting House Prices


## Abstract


## 1. Introduction

- Are state-of-the-art machine learning models capable of estimating missing values in spatial data that represent complex urban phenomena, such as house prices?
- If so, are they more accurate than conventional spatial interpolation approaches?

## 2. Past Studies

## 3. Data and Methods

#### 3.1. Data

#### 3.2. Methods

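IDW estimates the value at an unobserved point $s_0$ as a weighted average of the observed values, with weights that decrease with distance from $s_0$ [33]. The equation for IDW can be formulated as

$$\hat{Z}(s_0) = \sum_{i=1}^{N} w(s_i)\, Z(s_i),$$

where $\hat{Z}(s_0)$ is the predicted value at point $s_0$, $N$ is the number of observed points used in the prediction, and $Z(s_i)$ is the observed value at point $s_i$. $w(s_i)$ represents the weight applied to $Z(s_i)$ and is calculated as

$$w(s_i) = \frac{d(s_i, s_0)^{-p}}{\sum_{j=1}^{N} d(s_j, s_0)^{-p}},$$

where $d(s_i, s_0)$ is the distance between $s_i$ and $s_0$, and $p$ is the power parameter. As $p$ increases, the weights for points far from $s_0$ become significantly smaller than those for close points. Most software sets $p$ to 2 by default; however, because the choice of $p$ is crucial for the performance of IDW, it is desirable to test several candidates and choose the one that produces the most accurate and explainable results.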
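The IDW estimator and weighting scheme described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation, and it rests on stated assumptions: distances are Euclidean on planar coordinates, a target that coincides with an observation simply takes that observed value, and candidate powers are compared by leave-one-out cross-validation, which is one common way to "test several candidates" for $p$. The function names `idw_predict`, `loo_rmse`, and `choose_power` are illustrative.

```python
import numpy as np


def idw_predict(xy_obs, z_obs, xy_new, p=2.0, eps=1e-12):
    """IDW estimate: weighted mean of z_obs with weights d(s_i, s_0)^-p normalised to sum to 1."""
    xy_obs = np.asarray(xy_obs, dtype=float)
    z_obs = np.asarray(z_obs, dtype=float)
    xy_new = np.atleast_2d(np.asarray(xy_new, dtype=float))
    # Euclidean distances between every target and every observation: (n_new, n_obs)
    d = np.linalg.norm(xy_new[:, None, :] - xy_obs[None, :, :], axis=2)
    preds = np.empty(len(xy_new))
    for k, row in enumerate(d):
        hit = row < eps
        if hit.any():
            # The target coincides with an observation: return the observed value
            preds[k] = z_obs[hit][0]
            continue
        w = row ** (-p)
        preds[k] = np.sum(w * z_obs) / np.sum(w)
    return preds


def loo_rmse(xy_obs, z_obs, p):
    """Leave-one-out RMSE: predict each point from all the others with power p."""
    xy_obs = np.asarray(xy_obs, dtype=float)
    z_obs = np.asarray(z_obs, dtype=float)
    idx = np.arange(len(z_obs))
    errors = [
        idw_predict(xy_obs[idx != i], z_obs[idx != i], xy_obs[i], p=p)[0] - z_obs[i]
        for i in idx
    ]
    return float(np.sqrt(np.mean(np.square(errors))))


def choose_power(xy_obs, z_obs, candidates=(1.0, 2.0, 3.0)):
    """Return the candidate power with the lowest leave-one-out RMSE, plus all scores."""
    scores = {p: loo_rmse(xy_obs, z_obs, p) for p in candidates}
    return min(scores, key=scores.get), scores


# Toy check: the centre of a unit square is equidistant from its four corners,
# so all weights are equal and the estimate is the plain mean of the values.
print(idw_predict([[0, 0], [1, 0], [0, 1], [1, 1]], [1.0, 2.0, 3.0, 4.0], [[0.5, 0.5]]))
```

Because the weights are normalised, every IDW estimate is a convex combination of the observations and therefore never falls outside their observed range; larger $p$ simply pushes the estimate toward the nearest observation.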

## 4. Model Optimization

## 5. Results

## 6. Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## References

- Li, J.; Heap, A.D.; Potter, A.; Huang, Z.; Daniell, J.J. Can we improve the spatial predictions of seabed sediments? A case study of spatial interpolation of mud content across the southwest Australian margin. Cont. Shelf Res. **2011**, 31, 1365–1376.
- Tadić, J.M.; Ilić, V.; Biraud, S. Examination of geostatistical and machine-learning techniques as interpolators in anisotropic atmospheric environments. Atmos. Environ. **2015**, 111, 28–38.
- Appelhans, T.; Mwangomo, E.; Hardy, D.R.; Hemp, A.; Nauss, T. Evaluating machine learning approaches for the interpolation of monthly air temperature at Mt. Kilimanjaro, Tanzania. Spat. Stat. **2015**, 14, 91–113.
- Mariano, C.; Mónica, B. A random forest-based algorithm for data-intensive spatial interpolation in crop yield mapping. Comput. Electron. Agric. **2021**, 184, 106094.
- Zhu, D.; Cheng, X.; Zhang, F.; Yao, X.; Gao, Y.; Liu, Y. Spatial interpolation using conditional generative adversarial neural networks. Int. J. Geogr. Inf. Sci. **2020**, 34, 735–758.
- Hu, Q.; Li, Z.; Wang, L.; Huang, Y.; Wang, Y.; Li, L. Rainfall Spatial Estimations: A Review from Spatial Interpolation to Multi-Source Data Merging. Water **2019**, 11, 579.
- Nghiep, N.; Al, C. Predicting Housing Value: A Comparison of Multiple Regression Analysis and Artificial Neural Networks. J. Real Estate Res. **2001**, 22, 313–336.
- Lin, G.-F.; Chen, L.-H. A spatial interpolation method based on radial basis function networks incorporating a semivariogram model. J. Hydrol. **2004**, 288, 288–298.
- Li, J.; Heap, A.D.; Potter, A.; Daniell, J.J. Application of machine learning methods to spatial interpolation of environmental variables. Environ. Model. Softw. **2011**, 26, 1647–1659.
- Kleinke, K.; Reinecke, J.; Salfrán, D.; Spiess, M. Applied Multiple Imputation; Springer International Publishing: Cham, Switzerland, 2020.
- Meng, Q.; Liu, Z.; Borders, B.E. Assessment of regression kriging for spatial interpolation—Comparisons of seven GIS interpolation methods. Cartogr. Geogr. Inf. Sci. **2013**, 40, 28–39.
- Henrico, I. Optimal interpolation method to predict the bathymetry of Saldanha Bay. Trans. GIS **2021**, 25, 1991–2009.
- Wu, T.; Li, Y. Spatial interpolation of temperature in the United States using residual kriging. Appl. Geogr. **2013**, 44, 112–120.
- Bhattacharjee, S.; Chen, J.; Ghosh, S.K. Spatio-temporal prediction of land surface temperature using semantic kriging. Trans. GIS **2020**, 24, 189–212.
- Martínez, M.G.; Lorenzo, J.M.M.; Rubio, N.G. Kriging methodology for regional economic analysis: Estimating the housing price in Albacete. Int. Adv. Econ. Res. **2000**, 6, 438–450.
- McCluskey, W.J.; Deddis, W.G.; Lamont, I.G.; Borst, R.A. The application of surface generated interpolation models for the prediction of residential property values. J. Prop. Investig. Financ. **2000**, 18, 162–176.
- Montero, J.; Larraz, B. Interpolation Methods for Geographical Data: Housing and Commercial Establishment Markets. J. Real Estate Res. **2011**, 33, 233–244.
- Kuntz, M.; Helbich, M. Geostatistical mapping of real estate prices: An empirical comparison of kriging and cokriging. Int. J. Geogr. Inf. Sci. **2014**, 28, 1904–1921.
- Kim, G.; Lee, B.; Park, B. A comparative analysis on spatial interpolation techniques for price estimation of housing facilities. Geogr. J. Korea **2013**, 47, 119–127.
- Choi, J.H.; Kim, B.J. A study for applicability of cokriging techniques for estimating the real transaction price of land. J. Korean Soc. Geospat. Inf. Sci. **2015**, 23, 55–63.
- Rigol, J.P.; Jarvis, C.H.; Stuart, N. Artificial neural networks as a tool for spatial interpolation. Int. J. Geogr. Inf. Sci. **2001**, 15, 323–343.
- Pérez-Rave, J.I.; Correa-Morales, J.C.; González-Echavarría, F. A machine learning approach to big data regression analysis of real estate prices for inferential and predictive purposes. J. Prop. Res. **2019**, 36, 59–96.
- Čeh, M.; Kilibarda, M.; Lisec, A.; Bajat, B. Estimating the Performance of Random Forest versus Multiple Regression for Predicting Prices of the Apartments. ISPRS Int. J. Geo-Inf. **2018**, 7, 168.
- Seya, H.; Shiroi, D. A Comparison of Residential Apartment Rent Price Predictions Using a Large Data Set: Kriging versus Deep Neural Network. Geogr. Anal. **2022**, 54, 239–260.
- Abraham, A. Artificial Neural Networks. In Handbook of Measuring System Design; Oklahoma State University: Stillwater, OK, USA, 2005.
- Minsky, M.; Papert, S. Perceptrons: An Introduction to Computational Geometry; The MIT Press: Cambridge, MA, USA, January 1969. Available online: https://mitpress.mit.edu/books/perceptrons (accessed on 12 June 2022).
- Montavon, G.; Samek, W.; Müller, K.-R. Methods for interpreting and understanding deep neural networks. Digit. Signal Process. **2018**, 73, 1–15.
- Schmidhuber, J. Deep learning in neural networks: An overview. Neural Netw. **2015**, 61, 85–117.
- Breiman, L. Random Forests. Mach. Learn. **2001**, 45, 5–32.
- Antipov, E.A.; Pokryshevskaya, E.B. Mass appraisal of residential apartments: An application of Random forest for valuation and a CART-based approach for model diagnostics. Expert Syst. Appl. **2012**, 39, 1772–1778.
- Strobl, C.; Malley, J.; Tutz, G. An introduction to recursive partitioning: Rationale, application, and characteristics of classification and regression trees, bagging, and random forests. Psychol. Methods **2009**, 14, 323–348.
- Liaw, A.; Wiener, M. Classification and regression by randomForest. R News **2002**, 2, 18–22.
- Bivand, R.; Pebesma, E.J.; Gómez-Rubio, V. Applied Spatial Data Analysis with R. In Use R!; Springer: New York, NY, USA, 2013.
- Cressie, N. The origins of kriging. Math. Geol. **1990**, 22, 239–252.
- Armstrong, M. Problems with universal kriging. J. Int. Assoc. Math. Geol. **1984**, 16, 101–108.
- Oliver, M.A.; Webster, R. Kriging: A method of interpolation for geographical information systems. Int. J. Geogr. Inf. Syst. **1990**, 4, 313–332.
- Webster, R.; McBratney, A.B. Mapping soil fertility at Broom’s Barn by simple kriging. J. Sci. Food Agric. **1987**, 38, 97–115.
- Van der Meer, F. Introduction to Geostatistics; ITC Lecture Notes: Enschede, The Netherlands, 1993.
- Probst, P.; Boulesteix, A.-L.; Bischl, B. Tunability: Importance of hyperparameters of machine learning algorithms. J. Mach. Learn. Res. **2019**, 20, 1934–1965.
- Cressie, N.; Johannesson, G. Fixed rank kriging for very large spatial data sets. J. R. Stat. Soc. Ser. B **2008**, 70, 209–226.
- Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
- Shavitt, I.; Segal, E. Regularization learning networks: Deep learning for tabular datasets. Adv. Neural Inf. Process. Syst. **2018**, 31. Available online: https://proceedings.neurips.cc/paper/2018 (accessed on 12 June 2022).
- Xu, L.; Skoularidou, M.; Cuesta-Infante, A.; Veeramachaneni, K. Modeling tabular data using conditional GAN. Adv. Neural Inf. Process. Syst. **2019**, 32. Available online: https://proceedings.neurips.cc/paper/2019 (accessed on 12 June 2022).
- Lundberg, S.M.; Erion, G.G.; Lee, S.-I. Consistent individualized feature attribution for tree ensembles. arXiv **2018**, arXiv:1802.03888.

**Figure 1.** Joining process of the real estate transactions data and the integrated building information data.
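Figure 1 names the join step without detailing it, so the following is only a hypothetical plain-Python sketch of such a join: each transaction is matched to one building record through a shared identifier, called `bldg_id` here purely for illustration (the linking key actually used is not given in this section). Transactions without a matching building are dropped, as in an inner join, and the records below are toy values, not data from the study.

```python
def join_transactions(transactions, buildings, key="bldg_id"):
    """Inner-join transaction records to building records on a shared key.

    `key` is a hypothetical identifier; building records are assumed to be
    unique per key (if not, the last record with a given key wins).
    """
    by_key = {b[key]: b for b in buildings}
    joined = []
    for t in transactions:
        b = by_key.get(t.get(key))
        if b is None:
            continue  # no matching building record: drop, as in an inner join
        row = dict(b)  # start from the building attributes
        row.update(t)  # transaction fields win on any name clash
        joined.append(row)
    return joined


# Toy records for illustration only
buildings = [
    {"bldg_id": "B1", "yr_built": 1998, "height": 45.0},
    {"bldg_id": "B2", "yr_built": 2010, "height": 12.5},
]
transactions = [
    {"bldg_id": "B1", "price": 520.0, "flr_lv": 7},
    {"bldg_id": "B3", "price": 310.0, "flr_lv": 2},  # no building record
]
rows = join_transactions(transactions, buildings)
print(rows)
```

Building a dictionary keyed on the identifier keeps the join linear in the number of records, which matters when one side is a large transaction log.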

**Figure 3.** Semivariogram and fitting functions representing the spherical model, the exponential model, and the Gaussian model.
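The three fitting functions named in Figure 3 can be written down directly. The sketch below uses a common parameterisation with a nugget, a partial sill `psill`, and a range parameter `a`; it is an assumption that this matches the software used in the study. Under this parameterisation only the spherical model actually reaches the sill at `a`, while the exponential and Gaussian models reach about 95% of the partial sill at roughly `3a` and `sqrt(3)·a`, respectively.

```python
import numpy as np


def _finish(h, g):
    # By convention gamma(0) = 0; the nugget appears as a jump just above h = 0.
    return np.where(np.asarray(h, dtype=float) == 0.0, 0.0, g)


def spherical(h, nugget, psill, a):
    """Spherical model: reaches nugget + psill exactly at h = a and stays there."""
    r = np.minimum(np.asarray(h, dtype=float) / a, 1.0)
    return _finish(h, nugget + psill * (1.5 * r - 0.5 * r ** 3))


def exponential(h, nugget, psill, a):
    """Exponential model: approaches the sill asymptotically (about 95% of psill at h = 3a)."""
    h = np.asarray(h, dtype=float)
    return _finish(h, nugget + psill * (1.0 - np.exp(-h / a)))


def gaussian(h, nugget, psill, a):
    """Gaussian model: parabolic near the origin (about 95% of psill at h = sqrt(3) * a)."""
    h = np.asarray(h, dtype=float)
    return _finish(h, nugget + psill * (1.0 - np.exp(-((h / a) ** 2))))


lags = np.array([0.0, 250.0, 500.0, 1000.0, 2000.0])
for model in (spherical, exponential, gaussian):
    print(model.__name__, model(lags, nugget=1.0, psill=2.0, a=1000.0))
```

The Gaussian model's flat start implies very smooth spatial variation at short distances, which is one reason it often fits noisy price data less well than the other two.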

**Figure 4.** Geographic distributions of the actual (observed) sales prices and the residuals from each method: (**a**) actual values, (**b**) neural networks, (**c**) random forests, (**d**) IDW, and (**e**) ordinary kriging.

Type | Name | Description | Class |
---|---|---|---|
Predictor | bldg_area | Land area occupied by the building | Numeric |
Predictor | yr_built | Year in which the building was built | Numeric |
Predictor | flr_area | Total floor area of the building | Numeric |
Predictor | site_area | Area of the site on which the building is located | Numeric |
Predictor | height | Height of the building (in metres) | Numeric |
Predictor | bcr | Building coverage ratio | Numeric |
Predictor | far | Floor area ratio | Numeric |
Predictor | x | Latitude | Numeric |
Predictor | y | Longitude | Numeric |
Predictor | district | District in which the building is located | Categorical |
Predictor | yr | Year of the sales transaction | Numeric |
Predictor | month | Month of the sales transaction | Numeric |
Predictor | net_area | Floor area of the property | Numeric |
Predictor | flr_lv | Floor level | Integer |
Predictor | type | Type of the property (e.g., apartments, detached houses) | Categorical |
Target | price | Sales price | Numeric |
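By their usual definitions, the derived predictors `bcr` and `far` are the building footprint (`bldg_area`) and the total floor area (`flr_area`), each divided by the site area (`site_area`). The sketch below shows that relationship and, separately, one-hot encoding as one common way to turn categorical predictors such as `district` and `type` into numeric inputs for a neural network. Both functions are hypothetical helpers: whether the dataset stores the ratios as fractions or percentages (the `as_percent` switch) and which encoding the authors used are not stated in this section.

```python
def coverage_ratios(bldg_area, flr_area, site_area, as_percent=True):
    """Building coverage ratio (bcr) and floor area ratio (far) of one site."""
    if site_area <= 0:
        raise ValueError("site_area must be positive")
    scale = 100.0 if as_percent else 1.0
    bcr = scale * bldg_area / site_area  # footprint share of the site
    far = scale * flr_area / site_area   # total floor area relative to the site
    return bcr, far


def one_hot(values):
    """Encode a categorical column as 0/1 indicator rows over its sorted levels."""
    levels = sorted(set(values))
    index = {level: i for i, level in enumerate(levels)}
    rows = []
    for v in values:
        row = [0] * len(levels)
        row[index[v]] = 1
        rows.append(row)
    return levels, rows


# A 300 m2 footprint and 1500 m2 of floor space on a 500 m2 site
print(coverage_ratios(300.0, 1500.0, 500.0))  # (60.0, 300.0)
print(one_hot(["apartment", "detached", "apartment"]))
```

Because `bcr` and `far` are exact functions of `bldg_area`, `flr_area`, and `site_area`, these predictors are strongly correlated, which tree ensembles tolerate well but which can make individual variable importances harder to interpret.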

Model | No. of Hidden Layers | No. of Nodes (Max) | No. of Nodes (Min) | RMSE |
---|---|---|---|---|
1 | 2 | 32 | 16 | 193.75 |
2 | 2 | 128 | 64 | 152.35 |
3 | 2 | 256 | 128 | 143.19 |
4 | 3 | 32 | 16 | 210.97 |
5 | 3 | 128 | 64 | 114.08 |
6 | 3 | 256 | 128 | 103.03 |
7 | 4 | 32 | 16 | 124.04 |
8 | 4 | 128 | 64 | 101.99 |
9 | 4 | 256 | 128 | 97.90 |
10 | 5 | 32 | 16 | 117.59 |
11 | 5 | 128 | 64 | 98.65 |
12 | 5 | 256 | 128 | 107.60 |
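The selection step behind this grid search is simply taking the lowest RMSE. The short sketch below does that over the twelve reported configurations (values copied from the table above) and also finds the best configuration for each number of hidden layers. Reading "Max" and "Min" as the widths of the widest and narrowest hidden layer is an assumption; the helper names are illustrative.

```python
# RMSE of the neural network configurations, transcribed from the table above:
# (hidden layers, max nodes, min nodes) -> RMSE
nn_rmse = {
    (2, 32, 16): 193.75, (2, 128, 64): 152.35, (2, 256, 128): 143.19,
    (3, 32, 16): 210.97, (3, 128, 64): 114.08, (3, 256, 128): 103.03,
    (4, 32, 16): 124.04, (4, 128, 64): 101.99, (4, 256, 128): 97.90,
    (5, 32, 16): 117.59, (5, 128, 64): 98.65, (5, 256, 128): 107.60,
}


def best_config(results):
    """Configuration with the lowest RMSE, together with that RMSE."""
    cfg = min(results, key=results.get)
    return cfg, results[cfg]


def best_per_depth(results):
    """Lowest-RMSE configuration for each number of hidden layers."""
    out = {}
    for cfg, rmse in results.items():
        layers = cfg[0]
        if layers not in out or rmse < results[out[layers]]:
            out[layers] = cfg
    return out


print(best_config(nn_rmse))  # ((4, 256, 128), 97.9)
print(best_per_depth(nn_rmse))
```

The per-depth view shows that the widest network (256/128) is best for two to four hidden layers, whereas with five layers the 128/64 configuration outperforms it, so adding both depth and width does not improve accuracy monotonically.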

**Table 3.** RMSE of random forest models with different numbers of decision trees and predictor variables.

Model | No. of Trees | No. of Predictor Variables | RMSE |
---|---|---|---|
1 | 50 | 4 | 116.12 |
2 | 50 | 8 | 87.47 |
3 | 50 | 16 | 82.58 |
4 | 100 | 4 | 115.44 |
5 | 100 | 8 | 86.85 |
6 | 100 | 16 | 82.07 |
7 | 150 | 4 | 115.27 |
8 | 150 | 8 | 86.68 |
9 | 150 | 16 | 81.92 |
10 | 200 | 4 | 115.13 |
11 | 200 | 8 | 86.57 |
12 | 200 | 16 | 81.86 |
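Read as a two-factor grid, Table 3 separates the effect of the number of trees from the effect of the number of predictor variables. The sketch below recomputes both effects from the reported RMSE values only. Interpreting "predictor variables" as the number of variables sampled as split candidates at each node (the `mtry` argument of the R package randomForest, with `ntree` for the number of trees) is an assumption; the helper names are illustrative.

```python
# RMSE of the random forest models, transcribed from Table 3:
# (number of trees, number of predictor variables) -> RMSE
rf_rmse = {
    (50, 4): 116.12, (50, 8): 87.47, (50, 16): 82.58,
    (100, 4): 115.44, (100, 8): 86.85, (100, 16): 82.07,
    (150, 4): 115.27, (150, 8): 86.68, (150, 16): 81.92,
    (200, 4): 115.13, (200, 8): 86.57, (200, 16): 81.86,
}
TREES = (50, 100, 150, 200)
N_VARS = (4, 8, 16)


def gain_from_trees(results):
    """RMSE reduction from 50 to 200 trees, for each number of predictor variables."""
    return {m: round(results[(50, m)] - results[(200, m)], 2) for m in N_VARS}


def gain_from_vars(results):
    """RMSE reduction from 4 to 16 predictor variables, for each number of trees."""
    return {t: round(results[(t, 4)] - results[(t, 16)], 2) for t in TREES}


best = min(rf_rmse, key=rf_rmse.get)
print("best:", best, rf_rmse[best])
print("gain from more trees:", gain_from_trees(rf_rmse))
print("gain from more variables:", gain_from_vars(rf_rmse))
```

Quadrupling the number of trees lowers RMSE by less than 1 in every column, while moving from 4 to 16 predictor variables lowers it by about 33 in every row, so the variable count is the parameter that matters in this grid.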

Model | Fitting Function | Nugget | Sill | Range | RMSE |
---|---|---|---|---|---|
1 | Spherical | 18,189.04 | 45,968.56 | 1000 | 131.00 |
2 | Spherical | 25,750.27 | 55,018.88 | 3000 | 134.11 |
3 | Spherical | 27,728.88 | 61,251.09 | 5000 | 134.95 |
4 | Spherical | 28,872.34 | 66,397.40 | 7000 | 135.38 |
5 | Exponential | 21,004.69 | 54,192.23 | 1000 | 131.89 |
6 | Exponential | 26,625.02 | 67,663.44 | 3000 | 134.32 |
7 | Exponential | 28,158.26 | 78,786.51 | 5000 | 134.97 |
8 | Exponential | 29,120.28 | 94,920.26 | 7000 | 135.37 |
9 | Gaussian | 28,021.85 | 51,190.93 | 1000 | 136.91 |
10 | Gaussian | 31,722.12 | 65,187.40 | 3000 | 137.62 |
11 | Gaussian | 33,177.85 | 76,133.24 | 5000 | 137.74 |
12 | Gaussian | 34,099.72 | 98,675.17 | 7000 | 137.79 |
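Once a semivariogram has been fitted, ordinary kriging predicts each location by solving a small linear system whose last row forces the weights to sum to one. The NumPy sketch below shows that prediction step for a single location, assuming a spherical model whose nugget, partial sill, and range are passed in directly. Whether the table's "Sill" column is the total or the partial sill is not stated here; if it is the total sill, pass sill minus nugget as `psill`. The function names are illustrative, and the sketch solves the system over all observations at once, which is practical only for small datasets; kriging software usually restricts it to a local neighbourhood.

```python
import numpy as np


def spherical(h, nugget, psill, a):
    """Spherical semivariogram with gamma(0) = 0 and sill nugget + psill beyond range a."""
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / a, 1.0)
    return np.where(h == 0.0, 0.0, nugget + psill * (1.5 * r - 0.5 * r ** 3))


def ordinary_kriging(xy_obs, z_obs, xy0, nugget, psill, a):
    """Ordinary kriging estimate and variance at one location xy0.

    Solves [Gamma 1; 1^T 0] [lambda; mu] = [gamma_0; 1], where Gamma holds the
    semivariances between observations and gamma_0 those between each
    observation and xy0; the last row forces the weights to sum to one.
    """
    xy_obs = np.asarray(xy_obs, dtype=float)
    z_obs = np.asarray(z_obs, dtype=float)
    xy0 = np.asarray(xy0, dtype=float)
    n = len(z_obs)

    d = np.linalg.norm(xy_obs[:, None, :] - xy_obs[None, :, :], axis=2)
    lhs = np.ones((n + 1, n + 1))
    lhs[:n, :n] = spherical(d, nugget, psill, a)
    lhs[n, n] = 0.0

    rhs = np.ones(n + 1)
    rhs[:n] = spherical(np.linalg.norm(xy_obs - xy0, axis=1), nugget, psill, a)

    sol = np.linalg.solve(lhs, rhs)
    weights, mu = sol[:n], sol[n]
    estimate = float(weights @ z_obs)
    variance = float(weights @ rhs[:n] + mu)  # ordinary kriging variance
    return estimate, variance, weights


xy = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
z = [1.0, 2.0, 3.0, 4.0]
print(ordinary_kriging(xy, z, [0.5, 0.5], nugget=0.1, psill=1.0, a=5.0))
```

With the convention gamma(0) = 0, the predictor is exact: at an observed location it returns the observed value with zero kriging variance, even when the nugget is positive.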

**Table 5.** Prediction accuracy of machine learning and conventional interpolation techniques measured by four error metrics: MAE, RMSE, MAPE, and MASE.

Method | MAE | RMSE | MAPE | MASE |
---|---|---|---|---|
Neural network | 64.4615 | 102.0348 | 0.1260 | 0.2759 |
Random forest | 48.1897 | 80.8789 | 0.0952 | 0.2063 |
IDW | 78.4847 | 126.5288 | 0.1544 | 0.3360 |
Ordinary kriging | 83.5145 | 131.0042 | 0.1655 | 0.3575 |
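The four metrics in Table 5 can be computed as in the sketch below. MAE, RMSE, and MAPE follow their standard definitions; the reported MAPE values appear to be fractions rather than percentages (0.1260 corresponds to 12.6%), so the sketch returns a fraction. MASE divides the MAE by the error of a naive benchmark, and since that benchmark is not specified in this section, the default used here (predicting every price by the sample mean) is an assumption. The second part is a consistency check on the reported numbers only: MAE/MASE is the same for all four methods, as expected if every method is scaled by one shared benchmark error. It also computes the RMSE reduction of the random forest relative to ordinary kriging.

```python
import numpy as np


def error_metrics(y_true, y_pred, scale=None):
    """MAE, RMSE, MAPE (as a fraction) and MASE, assuming positive observed values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mae = float(np.mean(np.abs(err)))
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mape = float(np.mean(np.abs(err / y_true)))  # fraction; multiply by 100 for %
    if scale is None:
        # Assumed naive benchmark: predict every observation by the sample mean
        scale = float(np.mean(np.abs(y_true - y_true.mean())))
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "MASE": mae / scale}


# Reported (MAE, MASE) pairs from Table 5
table5 = {
    "Neural network": (64.4615, 0.2759),
    "Random forest": (48.1897, 0.2063),
    "IDW": (78.4847, 0.3360),
    "Ordinary kriging": (83.5145, 0.3575),
}
implied_scale = {name: mae / mase for name, (mae, mase) in table5.items()}

# Relative RMSE reduction of the random forest versus ordinary kriging
rmse_reduction = 1.0 - 80.8789 / 131.0042

for name, s in implied_scale.items():
    print(f"{name}: MAE / MASE = {s:.1f}")
print(f"RMSE reduction, random forest vs. ordinary kriging: {rmse_reduction:.1%}")
```

Each ratio rounds to 233.6, and the random forest lowers RMSE by about 38% relative to ordinary kriging. Because MASE here is MAE divided by a common constant, it ranks the methods exactly as MAE does.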


© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Kim, J.; Lee, Y.; Lee, M.-H.; Hong, S.-Y.
A Comparative Study of Machine Learning and Spatial Interpolation Methods for Predicting House Prices. *Sustainability* **2022**, *14*, 9056.
https://doi.org/10.3390/su14159056


**Chicago/Turabian Style**

Kim, Jeonghyeon, Youngho Lee, Myeong-Hun Lee, and Seong-Yun Hong.
2022. "A Comparative Study of Machine Learning and Spatial Interpolation Methods for Predicting House Prices." *Sustainability* 14, no. 15: 9056.
https://doi.org/10.3390/su14159056