# Sensitivity Analysis of Machine Learning Models for the Mass Appraisal of Real Estate. Case Study of Residential Units in Nicosia, Cyprus

## Abstract

## 1. Introduction

#### 1.1. Background of the Study

#### 1.2. State of the Art

## 2. Comparable Evidence and Methods

#### 2.1. Database, Pre-Processing, Methods and Performance Metrics

- Unit enclosed extent: the internal area in m$^{2}$ (IntArea).
- Unit covered extent: the area of covered verandahs in m$^{2}$ (CovVer).
- Unit uncovered extent: the area of uncovered verandahs in m$^{2}$ (UnCovVer).
- Parcel extent: the area of the parcel (or plot) in m$^{2}$ (ParcExt).
- Built years: the difference between the date of the transaction and the date the building was constructed, in years (BuiltYrs).
- Unit condition code (Cond): the condition of the building, with values from 1 (best condition) to 4 (worst condition).
- Unit view code (View): the view of the unit, with values from 1 (best view) to 4 (worst view).
- Unit class code (Class): the class of the building, with values from 1 (best class) to 4 (worst class).
- Density (Dens): the maximum allowed density (built m$^{2}$ over plot m$^{2}$) of the specific district.
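
The derived and coded variables in the list above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the helper names `built_years` and `check_ordinal_code` are hypothetical, the sample dates are made up, and the source does not say whether BuiltYrs is kept as a fractional or a whole number of years (a fractional value is assumed here).

```python
from datetime import date


def built_years(transaction_date: date, construction_date: date) -> float:
    """BuiltYrs: age of the building at the time of the transaction, in years.

    Assumption: fractional years, using an average year length of 365.25 days.
    """
    return (transaction_date - construction_date).days / 365.25


def check_ordinal_code(value: int, name: str) -> int:
    """Cond, View and Class are coded from 1 (best) to 4 (worst)."""
    if value not in (1, 2, 3, 4):
        raise ValueError(f"{name} must be an integer from 1 to 4, got {value!r}")
    return value


# Hypothetical dates, for illustration only (not taken from the dataset).
age = built_years(date(2015, 1, 1), date(2005, 1, 1))
cond = check_ordinal_code(2, "Cond")
```

Validating the ordinal codes before modelling guards against stray values, since a code outside 1 to 4 has no meaning in the scheme described above.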

#### 2.2. Error Metrics

#### 2.3. Anomaly Detection

Algorithm 1: Anomaly Detection

#### 2.4. Machine Learning Methods

Algorithm 2: Step-wise, Higher Order Regression

## 3. Results

#### 3.1. Regression Analysis

#### 3.2. Sensitivity Analysis

#### 3.3. How Much Data Is Big Enough?

#### 3.4. Prediction Formula

## 4. Discussion

#### Remote Sensing Integration in Mass Appraisals

## 5. Conclusions

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## Appendix A

#### Appendix A.1. Prediction Formula with 100 Terms (MAE = €19,694)

Sample Availability: The dataset was provided by the Department of Lands and Surveys.

Methods | $\rho$ | MAE (€) | RMSE (€) | MAPE | MAXAPE | SR | $\alpha$ | COD |
---|---|---|---|---|---|---|---|---|
Train Set | | | | | | | | |
Random Forests | 0.914 | 17931.100 | 28854.237 | 0.111 | 1.307 | 1.031 | 0.739 | 10.778 |
Gradient Boosting | 0.992 | 2630.784 | 8923.668 | 0.016 | 0.441 | 1.002 | 0.983 | 1.753 |
Linear Regression | 0.863 | 24546.300 | 34745.422 | 0.151 | 0.550 | 1.027 | 0.746 | 14.703 |
Non-Linear Regression | 0.880 | 23520.570 | 32700.793 | 0.146 | 1.100 | 1.032 | 0.775 | 14.197 |
Test Set | | | | | | | | |
Random Forests | 0.877 | 20817.165 | 27950.722 | 0.134 | 0.802 | 1.040 | 0.753 | 12.950 |
Gradient Boosting | 0.803 | 24485.519 | 35946.437 | 0.151 | 1.092 | 1.009 | 0.776 | 15.017 |
Linear Regression | 0.858 | 22977.825 | 30047.707 | 0.146 | 0.506 | 1.025 | 0.789 | 14.279 |
Non-Linear Regression | 0.862 | 22525.779 | 29500.974 | 0.144 | 0.552 | 1.032 | 0.761 | 13.984 |
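
As a reading aid for the columns above, the following minimal sketch shows how such metrics are commonly computed from actual and predicted prices. It rests on assumptions: the function name `error_metrics` is hypothetical, MAE, RMSE, MAPE and MAXAPE follow their usual textbook definitions, and SR and COD follow the conventional ratio-study definitions (mean predicted-to-actual ratio, and average absolute deviation from the median ratio as a percentage of that median), which may differ from the exact formulas used in this study. The $\rho$ and $\alpha$ columns are not covered.

```python
import math
from statistics import mean, median


def error_metrics(actual, predicted):
    """Common appraisal error metrics (conventional definitions, assumed)."""
    errors = [p - a for a, p in zip(actual, predicted)]
    # Absolute percentage errors, expressed as fractions of the actual price.
    ape = [abs(e) / a for e, a in zip(errors, actual)]
    # Predicted-to-actual price ratios, as used in ratio studies.
    ratios = [p / a for a, p in zip(actual, predicted)]
    med = median(ratios)
    return {
        "MAE": mean(abs(e) for e in errors),
        "RMSE": math.sqrt(mean(e * e for e in errors)),
        "MAPE": mean(ape),
        "MAXAPE": max(ape),
        "SR": mean(ratios),  # assumption: mean ratio
        "COD": 100 * mean(abs(r - med) for r in ratios) / med,  # in percent
    }


# Made-up prices, for illustration only (not taken from the dataset).
metrics = error_metrics([100.0, 200.0, 400.0], [110.0, 180.0, 400.0])
```

Under these definitions MAE and RMSE carry the currency unit of the prices, MAPE and MAXAPE are fractions, SR is dimensionless and COD is a percentage, which is consistent with the magnitudes reported in the table.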

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

