Mass Appraisal Models of Real Estate in the 21st Century: A Systematic Literature Review

With the increasing volume and active transaction of real estate properties, mass appraisal has been widely adopted in many countries for different purposes, including assessment of property tax. In this paper, 104 papers are selected for the systematic literature review of mass appraisal models and methods from 2000 to 2018. The review focuses on the application trend and classification of mass appraisal and highlights a 3I-trend, namely AI-Based model, GIS-Based model and MIX-Based model. The characteristics of different mass appraisal models are analyzed and compared. Finally, the future trend of mass appraisal based on model perspective is defined as “mass appraisal 2.0”: mass appraisal is the appraisal procedure of model establishment, analysis and test of group of properties as of a given date, combined with artificial intelligence, geo-information systems, and mixed methods, to better model the real estate value of non-spatial and spatial data.


Introduction
Property tax is an important source of fiscal revenue in many developed economies. In emerging markets such as China, a pilot property tax experiment is taking place, so the experience of other countries is helpful. Introducing a property tax can expand local tax sources, regulate the real estate market and facilitate income redistribution to mitigate wealth polarization. From the perspective of tax administration, taxes are either direct or indirect. Property tax is a type of asset tax, which is a direct tax under the comprehensive tax system. However, the establishment of a property tax system is complicated, involving not only relevant policies and laws, but also a valuation mechanism and methods. Generally speaking, a large number of tax base assessments of real estate need to be carried out in a relatively short period of time. At the same time, this assessment should conform to the law of asset assessment. In practice, it is essential to adopt a mass appraisal model that fits a specific country's real estate market structure and adapts to its change over time.
This paper therefore provides a systematic review of the mass appraisal models used for real estate assessment from 2000 to 2018. It starts with the background of mass appraisal, including the definition of mass appraisal and the role of models in mass appraisal. It is followed by the appraisal materials and methods. Models in the selected papers are discussed and classified into the 3I-trend (AI-based model, GIS-based model and MIX-based model) during the last two decades. Finally, the paper concludes with a summary of the mass appraisal models and recommendations for future research.

Definition of Mass Appraisal
"Mass appraisal is the process of valuing a group of properties as of a given date and using common data, standardized methods, and statistical testing." The above definition comes from the SMARP (Standard on Mass Appraisal of Real Property) [1], and it can be considered as "Mass Appraisal 1.0". Before the 21st century, relevant institutions and scholars did fruitful work on the theoretical construction and standard setting of real estate mass appraisal, which is summarized in Table 1. Both institutions and scholars put considerable effort into research on mass appraisal theory and the formulation of appraisal criteria. They provided standards for the process and methods of mass appraisal in practical application and explained the contents in detail. They also provided guidance on handling complaints about assessment results.

The Role of Models in Mass Appraisal
"Any appraisal, whether single-property appraisal or mass appraisal, uses a model, that is, a representation in words or an equation of the relationship between value and variables representing factors of supply and demand." The above description from SMARP shows the role of models in mass appraisal [1]. There are three traditional real estate appraisal approaches: the sales comparison approach, the income approach and the cost approach [5]. The key to large-scale collection of real estate tax is to establish an efficient and fair mass appraisal model of the real estate tax base. With the development of computer-assisted mass appraisal (CAMA), both models and standards have gradually adopted an automated valuation methodology (AVM) for mass appraisal [6].

Paper Retrieval
In different countries or regions, researchers may use similar words to represent "mass appraisal". Based on the related papers, the keywords used in the literature search are mass appraisal, mass valuation, real estate appraisal and property valuation. The search rule used is "mass appraisal" OR "mass valuation" OR "real estate appraisal" OR "property valuation". These search terms need to appear in the title, the abstract, or the keywords of the references. The search is restricted to articles published in English. In order to achieve a comprehensive overview, a wide online literature search is conducted using the Web of Science electronic databases.
Four hundred and seventy-five studies are identified as the initial dataset (including articles, book chapters, early access items, editorial material, and conference proceedings). Next, the screening criterion is to include only the document types "article" and "review", after which 299 studies are selected. Finally, 104 papers are selected for the literature review after reading the abstracts of the selected papers.

Overview of the Selected Papers
A brief analysis is made of the 104 selected papers. Figure 1 shows that the number of relevant papers published between 1 January 2000 and 31 December 2018 increased substantially, especially in the past two years, indicating an increasing research interest in mass appraisal.

More and more scholars are interested in research on mass appraisal models. Table 3 lists the top five cited articles and their authors, journals, years, and citations.

Discussion of Methods and Models
Generally, researchers apply the ideas, models and methods of other fields, such as statistics, computer science or geographic science, to the field of real estate mass appraisal. Some scholars have made beneficial attempts to summarize the models from different perspectives, such as academic discipline [7], prediction accuracy [8], computational intelligence [9][10][11], taxation purpose [12][13][14], mortgage purpose [15] and automated valuation application in relevant countries and regions [6][16][17][18][19][20]. Through the classification of various models, a 3I-trend is summarized, namely AI-Based Model, GIS-Based Model and MIX-Based Model.

Multiple Regression Analysis
Because the target of mass appraisal is a large number of properties, and the valuation results need to be explained to the public, the basic requirements are that the method is convenient to operate and simple to understand. Once the relevant data of the appraisal target have been collected, the direct method is to analyze the relationship between the relevant attributes (building age, area, floor, height, etc.) and the related property value. Through quantitative analysis, the mathematical relationship between the dependent variable and the independent variables is calculated. The value of real estate with similar attributes is then estimated by using this known mathematical relationship. Multiple regression analysis (MRA) is a statistical method to predict the real estate value (dependent variable) based on two or more other relevant attributes (independent variables) [8].
The beauty of MRA is its simplicity and computational advantages. However, the main concern in MRA is the difficulty of choosing the right functional form for the dependent variable, and sometimes the assumptions related to the error term in the regression model may not be satisfied [21]. A large number of papers center on MRA as the basic model or comparative model, but researchers focus on different aspects of the regression [22][23][24][25][26][27][28][29]. For example, additive nonparametric regression allows the data themselves to determine the curve shape, and it replaces the independent variables by an unspecified non-linear smooth function [21]. The generalized additive models for location, scale and shape (GAMLSS) contain both parametric and semi-parametric models. With GAMLSS, the distribution of the response variable is not restricted to the exponential family and different additive terms can be included in the regression predictors for the parameters that index the distribution [25]. Quantile regression concentrates on the regression residuals of the model by producing many regression models in order to find a valuation model that is fair to the tax authority and to property owners [26].
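The basic MRA workflow described above (estimate the relationship from known sales, then apply it to other properties) can be illustrated with a minimal sketch. It fits ordinary least squares through the normal equations in pure Python. The function names, the two attributes (floor area and building age) and all numbers are assumptions made for illustration; they are not taken from the reviewed papers.

```python
# Minimal multiple regression (ordinary least squares) in pure Python.
# All data and names below are hypothetical and for illustration only.

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so that row operations act on both sides.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_mra(rows, values):
    """Return OLS coefficients [intercept, b1, b2, ...] via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * v for d, v in zip(design, values)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def predict(coefs, row):
    """Apply the fitted relationship to one property's attributes."""
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], row))

# Hypothetical sample: (floor area in m2, building age in years).
attributes = [(50, 10), (80, 5), (65, 20), (120, 2), (95, 15), (70, 8)]
# Prices are generated from value = 20 + 3 * area - 1.5 * age, so the fit is exact.
prices = [20 + 3 * area - 1.5 * age for area, age in attributes]

coefs = fit_mra(attributes, prices)      # approximately [20, 3, -1.5]
estimate = predict(coefs, (100, 12))     # value of an unsold property
```

Real sales data contain noise, so the coefficients would only approximate the underlying relationship, and the error-term assumptions mentioned above would need to be checked.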

Expert System and Decision Support System
The professional knowledge of real estate experts or appraisers is the long-term accumulation of experience. An expert system provides a potential solution that lies between the purely heuristic sales adjustment grid and the more computationally intensive hedonic model.
Residential real estate intrinsic values deviate due to various factors that require consideration in the valuation process. The challenge is to develop an expert system that is adaptable in real life for actual appraisal problems. The expert system will not "teach itself" but rather the appraiser will utilize the data, in a fuzzy-logic fashion, to develop adjustment factors. The experts' problem-solving practice is just the nature of valuation procedure. It will be more useful with limited data or poorly characterized probability functions. Researchers make useful attempts and practical applications to the relevant problem of an expert system [30,31] and decision support system [32][33][34][35].

Artificial Neural Networks (ANN)
Originally, the artificial neural network (ANN) was designed to replicate the human brain's learning processes. The neural network typically consists of an input layer, an output layer and at least one layer of non-linear processing elements, known as the hidden layer. It is made up of a complex network of artificial neurons, each of which performs three basic functions like a neuron in the human brain. First, it receives inputs from the other artificial neurons through weighted links; second, it sums and processes these inputs; finally, it outputs the results to other artificial neurons.
An important advantage of ANNs in system modeling is that there is no need to specify the form of the model in advance. By training on the sample input data, the ANN adapts itself to reproduce the output. One of the most popular ANN structures is the multi-layer perceptron (MLP) [8]. The ANN performs well for modeling non-linear relationships because of its characteristics of semi-parametric regression. Apart from the basic MRA, the ANN is still the most popular of the AI-based models, although researchers have to face the "black box" of its structure [36][37][38][39][40][41][42][43][44][45][46][47][48][49].
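The three functions of an artificial neuron and the layered structure of an MLP can be sketched as a single forward pass. This is a minimal illustration only: the sigmoid activation, the layer sizes and all weights are assumptions, and no training step is shown.

```python
import math

def neuron(inputs, weights, bias):
    """Receive weighted inputs, sum them, and output a sigmoid activation."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-total))

def mlp_forward(inputs, hidden_layer, output_weights, output_bias):
    """One hidden layer of sigmoid neurons followed by a linear output neuron."""
    hidden = [neuron(inputs, weights, bias) for weights, bias in hidden_layer]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias

# Hypothetical, untrained weights: two inputs, two hidden neurons, one output.
hidden_layer = [([0.5, -0.2], 0.0), ([-0.3, 0.8], 0.1)]
output_weights = [1.5, -0.7]
output_bias = 0.2

# Inputs are assumed to be scaled attributes (e.g. area and age in 0..1).
scaled_inputs = [0.6, 0.3]
value_index = mlp_forward(scaled_inputs, hidden_layer, output_weights, output_bias)
```

In practice the weights are learned from sample sales, which is where the "black box" concern arises: the fitted weights have no direct interpretation as attribute effects.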

Tree-Based Model
Tree-based models perform well in both classification and regression, with good accuracy, stability and interpretability. Typical models include the decision tree, random forest and boosted tree.
There are two common types of decision tree models: M5 and MARS (multivariate adaptive regression splines). M5 is a model trees algorithm that predicts continuous variables for regression. It is then optimized into M5P which combines the decision tree with linear regression at the nodes. Tree construction, pruning and smoothing are the three major steps when applying the M5P methodology. Another decision tree is the MARS which is a non-parametric regression. MARS models are divided into three steps: the forward process, the backward pruning process and the model selection process. Reyes-Bueno et al. (2018) describe the main difference between M5P and MARS. At the borders of the partitioned regions, M5P is discrete while MARS is continuous [50].
A random forest is a kind of ensemble learning that integrates many decision trees into a "forest". The model can run efficiently on a large dataset of properties and deal with input variables without deletion. Antipov and Pokryshevskaya (2012) are the first to apply the random forest model to mass appraisal and find that it performs best among the models they compare [51].
Compared with a random forest, a boosted tree model can achieve higher accuracy and faster running speed. These advantages are urgently needed for a mass appraisal with a large amount of data and a fixed appraisal date. McCluskey et al. (2014) apply the boosted regression tree to Malaysia's mass appraisal of residential property. They find that the boosted tree is better than the MRA model in terms of the coefficient of dispersion and mean absolute percentage error. However, the lack of transparency and the difficulty of transferring variable importance into quantifiable variable effects still exist [52].
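The two accuracy measures named above can be made concrete with a short sketch. The coefficient of dispersion (COD) is computed here in its usual ratio-study form, as the average absolute deviation of the appraisal-to-sale ratios from their median, expressed as a percentage of that median. The sale prices and appraised values are invented for illustration.

```python
from statistics import median

def coefficient_of_dispersion(appraised, sold):
    """Average absolute deviation of ratios from the median ratio, in percent."""
    ratios = [a / s for a, s in zip(appraised, sold)]
    med = median(ratios)
    avg_abs_dev = sum(abs(r - med) for r in ratios) / len(ratios)
    return 100.0 * avg_abs_dev / med

def mean_absolute_percentage_error(appraised, sold):
    """Mean of |appraised - sold| / sold, in percent."""
    errors = [abs(a - s) / s for a, s in zip(appraised, sold)]
    return 100.0 * sum(errors) / len(errors)

# Hypothetical sale prices and model estimates for four properties.
sold = [100.0, 200.0, 300.0, 400.0]
appraised = [90.0, 200.0, 330.0, 400.0]

cod = coefficient_of_dispersion(appraised, sold)        # about 5 percent
mape = mean_absolute_percentage_error(appraised, sold)  # about 5 percent
```

Lower values of both measures indicate more uniform and more accurate appraisals, which is the sense in which one model is reported as "better" than another.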

Hierarchical Model
The traditional econometric model, such as Ordinary Least Squares (OLS), does not consider the hierarchical structure of the data. The use of the hierarchical model can overcome this shortcoming. Researchers use this framework for property valuation applications, such as the hierarchical Bayesian approach [53] and the analytic hierarchy process [54]. The hierarchical model also calculates the percentage of error variance caused by each level. Two types of hierarchical models have been used in real estate evaluation: the hierarchical linear model (HLM) and the hierarchical trend model (HTM). Arribas et al. (2016) use the HLM to classify the variables into apartment and neighborhood levels. They find that the HLM parameters have lower estimated variance than OLS [55]. Meanwhile, the HTM can be seen as an extension of a dummy variable model with time-varying constants for the different clusters [56].

Cluster Analysis
The heterogeneity and homogeneity of property data play an important role in mass appraisal modeling. Cluster analysis is a process of classifying data into different classes or clusters, so that the targets in the same cluster are similar, while the targets in one cluster are different from those in other clusters. Based on the sample data, cluster analysis can automatically classify the whole database. This data-mining or data-preprocessing procedure can divide a heterogeneous real estate market into homogeneous submarkets. Cluster approaches can be classified into various types: hierarchical clustering, partitioning clustering, grid-based clustering, density-based clustering, fuzzy-based clustering and model-based clustering [57][58][59]. After the cluster analysis, it is necessary to explain the practical meaning of the different clusters. Clusters without practical meaning indicate that the settings of the cluster analysis need to be revised.
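As an illustration of partitioning clustering, the following sketch runs a basic k-means loop to split a small set of properties into two submarkets. The features (unit price and distance to the city centre), the starting centres and the data are all hypothetical; the reviewed papers use a variety of clustering methods and this sketch does not reproduce any one of them.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest_center(point, centers):
    """Index of the cluster centre closest to the point."""
    return min(range(len(centers)), key=lambda i: squared_distance(point, centers[i]))

def kmeans(points, centers, iterations=10):
    """Basic k-means: assign points to the nearest centre, then update centres."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest_center(p, centers)].append(p)
        # An empty cluster keeps its previous centre.
        centers = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    labels = [nearest_center(p, centers) for p in points]
    return centers, labels

# Hypothetical properties: (unit price in thousands, distance to centre in km).
# In practice the features should be standardized before clustering.
properties = [(9.8, 1.0), (10.2, 1.5), (10.0, 0.8),
              (4.1, 12.0), (3.9, 11.0), (4.0, 13.0)]
initial_centers = [(10.0, 0.0), (4.0, 10.0)]

centers, labels = kmeans(properties, initial_centers)
# labels marks the first three properties as one submarket and the rest as another.
```

Whether the resulting clusters correspond to meaningful submarkets still has to be judged by the appraiser, as noted above.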

Rough Set Theory and Fuzzy Set Theory
Uncertainty is an objective phenomenon in mass appraisal. The uncertainty of the mass appraisal affects the stability of the model and the accuracy of the results. Some researchers look at the way in which uncertainty can be incorporated into the explicit model of the three traditional approaches mentioned in Section 2.2 [60,61]. For imprecise data sets, which appear in emerging housing markets or in markets with poorly developed information systems, rough set theory and fuzzy set theory provide a workable approach to the mass appraisal of real estate.
Moreover, the application of rough set theory (RST) in the real estate field underlines its potentialities for mass appraisal modeling. RST creates a way to run the property appraisal model without considering the relevant indicators which affect the value of real estate [62,63].
Last but not least, with the introduction of fuzzy sets, the judgment and thinking process of humans can be directly expressed in a relatively simple mathematical form, which makes it possible to deal with complex systems in a practical and human way of thinking. Fuzzy set theory can be used to measure the proximity between different property samples to be assessed. It is used to effectively rectify the weight of data, even if the degree of proximity is low [58][64][65][66][67][68][69][70].
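A simple way to see how a human judgment becomes a mathematical form is the triangular membership function, a standard building block of fuzzy set theory. The sketch below assigns a degree between 0 and 1 to the statement "this building is medium-aged". The linguistic term and its breakpoints (10, 20 and 40 years) are assumptions chosen for illustration and do not come from the cited studies.

```python
def triangular_membership(x, left, peak, right):
    """Degree (0 to 1) to which x belongs to a triangular fuzzy set.

    Assumes left < peak < right.
    """
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# Hypothetical fuzzy set "medium-aged building": fully true at 20 years,
# fading to false at 10 years and at 40 years.
degree_15 = triangular_membership(15, 10, 20, 40)   # 0.5
degree_20 = triangular_membership(20, 10, 20, 40)   # 1.0
degree_30 = triangular_membership(30, 10, 20, 40)   # 0.5
```

Degrees of this kind can then be compared across properties, which is one way a fuzzy model can express how close a comparable is to the property being assessed.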

Reasoning-Based Model
The reasoning-based model is also a useful method for tackling uncertainty. The reasons to opt for reasoning in mass appraisal are as follows. First, reasoning is the most familiar form of legal justification and thus becomes an instrument for resolving value disputes in courts of law. Second, reasoning used in the evaluation procedure focuses on the comparison of a subject property with a principled consistency through a mathematical optimization model. Researchers have made some trials of analogical reasoning (a special case of inductive reasoning) [71] and case-based reasoning [72]. In practice, appraisers may reach different opinions even if they work from the same data set. Therefore, it should be noted that a unified standard is needed when using reasoning methods.

Other Models
Many other kinds of classical models have been applied in the mass appraisal field, namely the genetic algorithm, support vector machine, data envelopment analysis and conformal predictors. Although only a few scholars have tried these models, the results have good reference value.
Genetic algorithm (GA) is a computational model simulating the natural selection and genetic mechanism of Darwinian biological evolution, and it is a method to search the optimal solution by simulating natural evolution. Morano et al. (2018) couple evolutionary polynomial regression with genetic algorithms to search those models with maximization accuracy of data and parsimony of mathematical functions [73]. Ahn et al. (2012) use ridge regression combined with a genetic algorithm (GA-Ridge) to test performance in the Korean real estate market [74].
Support vector machine (SVM) is a kind of supervised learning and is a fast and reliable linear classifier. Given the training data, the SVM algorithm finds an optimal hyperplane to classify the training data. However, it performs well only with a limited amount of data, i.e., samples numbering in the thousands. The massive property data (hundreds of thousands of records or more) needed for mass appraisal may lead to a long running time for SVM [34,75].
Data envelopment analysis (DEA) is a research field of operational research, management science and mathematical economics. It can be used for the evaluation of the value range for real estate units. The uncertainty in the unit's value resulting from market transactions was considered by explicitly representing the economic agents involved in a transaction, namely the buyer and the seller, whose actions establish a set of accomplished transactions [76].
Conformal predictors (CP) is a classical algorithm of machine learning which can provide reliable predictions in the form of regions. For regression, a prediction interval will be typically formed by the regions. The regions are reliable under the user-defined confidence level [77].
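The idea of a prediction interval that holds at a user-defined confidence level can be sketched with split (inductive) conformal prediction, one common variant of the method. This sketch assumes that some underlying valuation model already exists and that a held-out calibration set of sold properties is available; the function name and all numbers are hypothetical.

```python
import math

def conformal_radius(calibration_actual, calibration_predicted, confidence):
    """Half-width of a split-conformal prediction interval.

    Uses absolute residuals on a calibration set as nonconformity scores and
    takes the ceil((n + 1) * confidence)-th smallest score.
    """
    scores = sorted(abs(a - p) for a, p in zip(calibration_actual, calibration_predicted))
    n = len(scores)
    rank = math.ceil((n + 1) * confidence)
    if rank > n:
        # Too few calibration points to support this confidence level.
        return math.inf
    return scores[rank - 1]

# Hypothetical calibration set: sale prices and the model's estimates for them.
actual = [101, 98, 103, 104, 95, 106, 93, 108, 109]
predicted = [100] * 9

radius = conformal_radius(actual, predicted, confidence=0.75)

# Interval for a new property whose point estimate is 120.
new_estimate = 120
interval = (new_estimate - radius, new_estimate + radius)
```

The interval covers the true value with at least the chosen confidence when the calibration properties and the new property are exchangeable, which is an assumption that should be checked for markets that change quickly.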

GIS-Based Model
GIS, known as geo-information system/science, focuses on spatial or geographical data. Each property has its own spatial attribute information. The spatial characteristics together with its non-spatial information contribute to the property value. Many scholars have paid attention to the GIS features of properties and considered their impacts on appraisal [44][78][79][80][81][82][83][84][85][86]. Meanwhile, some scholars such as Bourassa et al. (2007) [87] and McCluskey and Borst (2007) [88] have made improvements in the classification and compilation of GIS-based models.

Geographically Weighted Regression (GWR)
The geographically weighted regression (GWR) model is the most widely used GIS-based model. By embedding the local spatial structure into the linear regression model, GWR can detect variation in spatial relations across locations, known as spatial non-stationarity. Generally, compared with MRA, which is regarded as a global regression model, GWR presents a way to explore a local regression analysis at each location. The model is easy to use, and the estimation result has a clear analytical expression. Statistical tests can also be applied to the parameter estimates. Lockwood and Rossini (2011) compare ten different models for mass appraisal and find that the two GWR models achieve much higher levels of accuracy [89]. McCluskey and Borst (2011) use GWR to detect submarkets in order to highlight the necessity of real estate market segmentation for mass appraisal [90]. Dimopoulos and Moulas (2016) also show that the coefficient of determination of GWR is much higher than that of the ordinary least squares model based on a database of the Greek real estate market [91]. Bidanset et al. (2017) confirm the effectiveness of GWR and further evaluate the impact of two important factors (kernel and bandwidth) of the GWR model [92]. Based on the sale dates of properties, Borst (2012) devises a temporal kernel to weight properties and turns GWR into a space-time model [93].
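The role of the kernel and bandwidth can be illustrated with a minimal local regression at a single location. The sketch below assumes a Gaussian kernel and a single predictor (floor area), so that the weighted least squares fit has a closed form. The coordinates, areas and prices are invented: prices follow 2 x area in a western district and 4 x area in an eastern one.

```python
import math

def gaussian_weight(distance, bandwidth):
    """Gaussian kernel: nearby sales get weights near 1, distant ones near 0."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)

def local_fit(target, locations, x, y, bandwidth):
    """Weighted least squares of y on x, with weights centred on `target`.

    Returns (intercept, slope) of the local regression at that location.
    """
    w = [gaussian_weight(math.dist(target, loc), bandwidth) for loc in locations]
    total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical sales: three in the west (near x = 0), three in the east (near x = 100).
locations = [(0, 0), (1, 0), (0, 1), (100, 0), (101, 0), (100, 1)]
area = [50, 80, 100, 60, 90, 120]
price = [100, 160, 200, 240, 360, 480]

west_intercept, west_slope = local_fit((0, 0), locations, area, price, bandwidth=10)
east_intercept, east_slope = local_fit((100, 0), locations, area, price, bandwidth=10)
# The local slope is close to 2 in the west and close to 4 in the east,
# whereas a single global regression would blend the two.
```

A larger bandwidth makes the local fits approach the global regression, and a smaller one makes them more local but less stable, which is why the choice of kernel and bandwidth matters.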

Geographically Weighted Principal Component Analysis
Traditional methods such as cluster analysis and principal component analysis (PCA) do not consider spatial heterogeneity in submarket division. PCA ignores the spatial structure of the original data. By taking advantage of geographical weighting, Wu et al. (2018) integrate geographically weighted principal component analysis into a modified data-driven method in order to deal with housing submarket issues [94]. This model can address the problem of spatial heterogeneity and define submarkets by considering both spatial contiguity and attribute similarity. However, factors that are currently neglected, such as the property structure, transaction time and years of property rights, could be included when the model is extended.

Spatial Error Model and Spatial Lag Model
Spatial auto-regressive models (SAR) consider the spatial dependence of real estate and improve based on the MRA model [9]. The spatial error model (SEM) and the spatial lag model (SLM) are most widely used.
The SEM builds on the spatial dependence of the error terms. The error associated with a property is dependent on the errors of its surrounding ones. Zhang et al. (2015) improve the SEM with fuzzy set theory and a spatial weight matrix for the mass appraisal of commercial properties in China's real estate market [95]. Uberti et al. (2018) use this model for farmland mass appraisal in Brazil's real estate market [96].
The SLM comprises a spatially lagged dependent variable of the regression model [97]. The price of a property is dependent on the prices of its surrounding ones. Walacik et al. (2013) compare the performance of SLM, GWR and geostatistical models [98]. In terms of the geostatistical model, Palma et al. (2019) apply a geostatistical model to analyse the spatial-temporal evolution of the residential real estate market in Italy [99]. Bidanset and Lombard (2014) compare SLM with GWR. They find that GWR has a lower coefficient of dispersion than SLM [100]. Quintos (2013) uses SLM together with a spatial weight matrix to perform the role of a locational baseline value or location-adjustment factor for the purpose of mass appraisal and taxation [101].
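For reference, the two specifications described above are commonly written as follows. The notation is an assumption introduced here and is not taken from the reviewed papers: y is the vector of property prices, X the matrix of property attributes, W the spatial weight matrix, beta the attribute coefficients, rho and lambda the spatial dependence parameters, and epsilon an independent error term.

```latex
% Spatial lag model (SLM): price depends on neighbouring prices
y = \rho W y + X \beta + \varepsilon

% Spatial error model (SEM): the error depends on neighbouring errors
y = X \beta + u, \qquad u = \lambda W u + \varepsilon
```

When rho = 0 or lambda = 0, each model reduces to the ordinary (global) regression used in MRA.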

Location Value Response Surface
The location value response surface (LVRS) is a mass appraisal method. The application of LVRS allows the appraiser to analyze the effect of location using GIS.
There are three methods for LVRS modelling. The first method consists of calculating a location adjustment factor based on the spatial distribution of the selling prices. The second method is based on the measurement of the variance between actual prices and the prices predicted by an MRA model without a location variable. The third method builds an interpolation grid to reflect the influence on each property of the location ratio factors within its proximity [102,103]. Meanwhile, the foundation of this model is the spatial correlation between variables. It can also be associated with classical spatial interpolation methods, such as inverse distance weighting, kriging interpolation, natural neighbor interpolation, and others.
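Inverse distance weighting, one of the interpolation methods mentioned above, can be sketched in a few lines. Here it interpolates a location adjustment factor at an unsampled point from factors known at nearby sale locations. The power parameter of 2, the coordinates and the factor values are assumptions for illustration.

```python
import math

def inverse_distance_weighting(target, samples, power=2):
    """Interpolate a value at `target` from (location, value) samples.

    Each sample is weighted by 1 / distance ** power, so closer samples
    have more influence on the interpolated value.
    """
    numerator = 0.0
    denominator = 0.0
    for location, value in samples:
        distance = math.dist(target, location)
        if distance == 0:
            # The target coincides with a sample: return its value directly.
            return value
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator

# Hypothetical location factors observed at two sale locations.
samples = [((0.0, 0.0), 1.0), ((2.0, 0.0), 3.0)]

midpoint_factor = inverse_distance_weighting((1.0, 0.0), samples)   # 2.0
at_sample_factor = inverse_distance_weighting((0.0, 0.0), samples)  # 1.0
```

Evaluating the function over a regular grid of target points produces the kind of interpolation surface used by the third LVRS method.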

MIX-Based Model
In this part, the main purpose is to explain the emphasis and thinking behind the application of MIX-based models in mass appraisal. The first type is that the model itself adopts hybrid thinking. Glennon et al. (2018) consider five different forecast combination methods to construct a weighted average of component forecasts [104]. Guo et al. (2014) integrate some elements of the sales comparison approach and the income approach into the cost approach to improve the accuracy of real estate valuation [105]. The second type resembles the models in Sections 4.1 and 4.2. These models build on traditional or existing models, combine them with AI and GIS methods, and then emphasize application and analysis. Furthermore, some models can be combined with each other into a better one, such as fuzzy clustering [58], geostatistical models and clustering [59,106], multi-criteria analysis and genetic algorithms [73], ANN and GIS [36], support vector machine and decision support system [34] and so forth. The third type mixes in innovative ideas and unique perspectives. For example, one study applies not only traditional real estate data but also extra real-time market information from online crowdsourcing feedback, which brings the estimated result closer to the market [107]. Based on different application scenarios, researchers also try to carry out appraisals in the fields of industrial property (wind farm programming) [108] and commercial leisure property (golf courses) [109]. Metzner and Kindt (2018) pay attention to the parameters of the model. They extract 407 parameters from their former papers and classify them into five levels [110]. You et al. (2017) create an image-based appraisal model especially for online users [111]. Mou et al. (2018) build a real estate appraisal model mainly for recommending properties with a short time on the market to an estate agency [112]. Lorenz and Luetzkendorf (2011) integrate sustainability issues into the valuation process [113].
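The forecast-combination idea mentioned in this section, a weighted average of component forecasts, can be sketched as follows. Two simple weighting schemes are shown: equal weights and weights inversely proportional to each model's past mean squared error. These are common textbook schemes chosen for illustration and are not necessarily among the five methods studied in [104]; all numbers are hypothetical.

```python
def combine_forecasts(forecasts, weights):
    """Weighted average of component forecasts (weights are normalized)."""
    total = sum(weights)
    return sum(w * f for w, f in zip(weights, forecasts)) / total

def inverse_mse_weights(past_errors):
    """One weight per model, inversely proportional to its past mean squared error."""
    mses = [sum(e * e for e in errors) / len(errors) for errors in past_errors]
    return [1.0 / m for m in mses]

# Hypothetical value estimates for one property from two component models.
forecasts = [100.0, 110.0]
# Hypothetical past errors: the first model has been more accurate.
past_errors = [[1.0, -1.0], [2.0, -2.0]]

simple_average = combine_forecasts(forecasts, [1.0, 1.0])                     # 105.0
weighted_average = combine_forecasts(forecasts, inverse_mse_weights(past_errors))  # 102.0
```

The weighted result sits closer to the forecast of the historically more accurate model, which is the basic motivation for combining rather than selecting a single model.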

Conclusion
This paper systematically analyzes the literature on the models of mass appraisal. We have identified three main trends (AI-based model, GIS-based model and MIX-based model) under which 104 articles are reviewed. The models analyzed in the discussion part can be used as guidance for scholars to carry out further research in this field. With the further improvement and enrichment of real estate registration information in evolving markets such as China, more methods can be better tested and corrected. At the same time, with the development of artificial intelligence and geo-information systems, there will be more models that can be applied to the field of mass appraisal. Moreover, the trend of mixed models will continue. On the one hand, the emergence of new models will be applied to the field of mass appraisal. On the other hand, different methods can complement each other's advantages according to the characteristics of different data, thus forming a mixed model for mass appraisal.
There are some limitations of this review. First, the electronic database of Web of Science, although reflecting the trend of model application, may not contain all the articles related to mass appraisal. Further work could be done by searching other databases such as EBSCOhost, IEEE Xplore, Scopus, Springer Link, Science Direct, JSTOR and ProQuest Central. Second, the models described and classified under artificial intelligence and geo-information systems do not represent the complete scope of those fields. In fact, methods from artificial intelligence and geo-information systems that are not mentioned in the literature may also be applied in model selection for mass appraisal. Third, although the basic characteristics of the models have been recognized, such as the simplicity of MRA, the "black box" of ANN, and the spatial heterogeneity and homogeneity of GIS models, the limitations of existing models have not been tested. Fourth, applicability and constraints in different countries are practical issues for future analysis and for the application of mass appraisal models.
Finally, through analogy to the literature, it can be found that the process of creating, analyzing and testing a model is to simulate a complete mass appraisal process. The definition of mass appraisal mentioned in Section 2.1 is a standardized and generalized one. To summarize, we define mass appraisal based on model perspective as "mass appraisal 2.0": mass appraisal is the appraisal procedure of model establishment, analysis and test of group of properties as of a given date, combined with artificial intelligence, geo-information systems, and mixed methods, to better model the real estate data of non-spatial and spatial information.
Author Contributions: Conceptualization, D.W.; supervision, V.J.L. This paper is to be attributed in equal parts to D.W. and V.J.L.
Funding: This research received no external funding.

Conflicts of Interest: The authors declare no conflicts of interest.