2. Comparable Evidence and Methods
2.1. Database, Pre-Processing, Methods and Performance Metrics
The studied database was obtained from the Department of Lands and Surveys (DLS). The data were used for the purposes of the Cyprus new general valuation [24] and refer to transactions between 2008 and 2014, of which only transactions for apartments in Nicosia District were studied. Although the database does not contain important socioeconomic variables [25], professional valuers consider it highly useful, as it contains comparable evidence about certain property types. Hence, the level of information available to the valuer could be greatly enhanced; however, how to exploit the contained information reliably remains unclear. A significant effort was spent on preparing the database in a predictors-output format. At this point, the authors highlight that the data would be significantly enhanced if remote sensing were integrated to enrich the database, provided that this is complemented by on-site or drive-by observations.
In particular, 4261 observations of apartment/office sales in Nicosia existed; however, from column Unit_desc, only the values “APPARTMENT & 2-FLOOR APPRTMENT” were kept, resulting in 3786 remaining observations. Furthermore, only Municipalities that are regulated by the Nicosia Local Town Plan were selected and Quarters with fewer than 20 observations were deleted, so that, finally, 3561 sales records were used for the analysis and predictions. In order to enhance the prediction accuracy of the models, Urban Planning data were added for each Planning Zone, in particular the maximum building density, the number of stories, the height and coverage of the allowed building, the minimum sq.m. per resident and the expected sq.m. per resident. Due to multicollinearity among the urban planning variables, only the maximum building density was finally kept. The transaction dates were converted to reflect the date 30 September 2018, as floating numbers constituting a continuous variable, and the prices were adjusted to 1 January 2013 utilizing the Central Bank of Cyprus Index. This index uses property data gathered from valuations submitted to the contracted banks since 2006. The relevant information is provided by independent property surveyors who evaluate properties mainly for mortgage purposes such as housing loans, mortgage refinancing and mortgage collateral.
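The two conversions described above (dates to a continuous floating-point variable, and prices rescaled to a reference date through a price index) can be sketched as follows. This is a minimal Python illustration, not the authors' code: the function names, the multiplicative form of the adjustment and the index values are all assumptions, and the index numbers are not actual Central Bank of Cyprus figures.

```python
from datetime import date


def decimal_year(d: date) -> float:
    """Convert a calendar date to a floating-point year (1 January 2013 -> 2013.0)."""
    start = date(d.year, 1, 1)
    end = date(d.year + 1, 1, 1)
    return d.year + (d - start).days / (end - start).days


def adjust_price(price: float, index_at_sale: float, index_at_reference: float) -> float:
    """Rescale a transaction price to the reference date (assumed multiplicative adjustment)."""
    return price * index_at_reference / index_at_sale


# Hypothetical transaction and hypothetical index values, for illustration only.
sale_time = decimal_year(date(2010, 7, 2))
adjusted = adjust_price(100_000.0, index_at_sale=110.0, index_at_reference=88.0)
```

Treating the date as a continuous number lets a regression model capture a time trend with a single predictor instead of a set of period dummies.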
The utilized variables were as follows, with their abbreviations in parentheses, for each Unit (Apartment):
Unit enclosed extent, which is the Internal Area in m² (IntArea).
Unit covered extent, which is the Area of covered verandahs in m² (CovVer).
Unit uncovered extent, which is the Area of uncovered verandahs in m² (UnCovVer).
Parcel extent, that is, the Area of the parcel (or plot) in m² (ParcExt).
The Built Years, calculated as the difference between the date the transaction happened and the date the building was constructed, in years (BuiltYrs).
The Unit condition code (Cond), which denotes the condition of the building and takes values from 1 (best condition) to 4 (worst condition).
The Unit’s view code (View), which denotes the view of the unit, with values from 1 (best view) to 4 (worst view).
The Unit’s class code (Class), which denotes the class of the building and takes values from 1 (best class) to 4 (worst class).
Density (Dens), as the maximum allowed density (built m² over plot m²) of the specific district.
The dependent variable was the apartment’s price as accepted by the Cyprus Department of Lands and Surveys. This price was adjusted by utilizing the Central Bank of Cyprus Index and the dates were transferred to 30 September 2018. The abbreviation for the dependent variable is (Adj. Accepted Price).
2.2. Error Metrics
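Machine learning methods exhibit diverse performance on a studied dataset, with respect to the error metrics each time utilized. The Coefficient Of Dispersion (COD) was used (Equation (1)) as defined by Appraisal Ratio Studies [26], as a common metric utilized in Real Estate Appraisals. It is based on the Predicted Values (PV), the Dependent Variable (DV), and the number of observations N. COD is defined by

$$\mathrm{COD} = \frac{100}{N} \sum_{i=1}^{N} \frac{\left| \frac{PV_i}{DV_i} - \operatorname{median}\left( \frac{PV}{DV} \right) \right|}{\operatorname{median}\left( \frac{PV}{DV} \right)}. \qquad (1)$$

Furthermore, the utilized error metrics were the Root Mean Squared Error $\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (PV_i - DV_i)^2}$, the Mean Absolute Error $\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} |PV_i - DV_i|$, the Mean Absolute Percentage Error $\mathrm{MAPE} = \frac{100}{N} \sum_{i=1}^{N} \left| \frac{PV_i - DV_i}{DV_i} \right|$, the Maximum Absolute Percentage Error (MAXAPE), as well as the Pearson Correlation Coefficient $\rho$, the slope $\alpha$ of the Predicted versus Actual values, such that $PV \approx \alpha \, DV + \beta$, and the coefficient of determination $R^2$.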
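For readers who wish to reproduce the metrics named in this subsection, a minimal Python sketch is given below. It assumes the standard textbook definitions (COD as the average absolute deviation of the PV/DV ratios from their median, expressed as a percentage of that median); the function name `error_metrics` and the argument names are hypothetical and do not come from the authors' Julia code.

```python
import math
from statistics import median


def error_metrics(pv, dv):
    """Return COD, RMSE, MAE, MAPE and MAXAPE for predicted (pv) and actual (dv) values."""
    n = len(dv)
    ratios = [p / d for p, d in zip(pv, dv)]
    med = median(ratios)
    # COD: mean absolute deviation of the ratios from their median, in percent of the median.
    cod = 100.0 / n * sum(abs(r - med) for r in ratios) / med
    errors = [p - d for p, d in zip(pv, dv)]
    ape = [abs(e) / abs(d) for e, d in zip(errors, dv)]
    return {
        "COD": cod,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100.0 * sum(ape) / n,
        "MAXAPE": 100.0 * max(ape),
    }


# Three hypothetical sales, each with an actual price of 100 (arbitrary units).
metrics = error_metrics([110.0, 90.0, 100.0], [100.0, 100.0, 100.0])
```

COD, MAPE and MAXAPE are scale-free percentages, whereas RMSE and MAE are expressed in the currency units of the prices.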
2.3. Anomaly Detection
Although the observations in the studied database are official registrations in the DLS, some extremely unreasonable records occur. Examples include a property in Nicosia Municipality, Ag. Andreas Quarter, built in 1965, with 66 sq.m covered area and a price of 3.524€; Latsia/Ag. Georgios (1977), 68 sq.m, with a price of 17.781€; Nicosia/Ag. Omologites (1982), 44 sq.m, 15.724€; Nicosia/Ag. Antonios (1973), 35 sq.m, 22.562€; and Strovolos/Chryseleousa (1986), 76 sq.m, 17.283€. Accordingly, an iterative procedure was implemented in order to identify the outliers and eliminate, at each step, the observation which violates a specified threshold. The corresponding results were highly enhanced, as even for the Linear Regression (LR) (Figure 1) the R squared increased from 0.611 to 0.744, while the shape of the scattered observations is closer to a straight line after the removal of the outliers.
Algorithm 1 was selected in order to exclude observations with high prediction errors, as they represent apartments which were under- or over-priced by the DLS for some particular reason. The algorithm was selected amongst others because it presented better results in terms of percentage errors, which are more easily understood by property professionals.
|Algorithm 1:Anomaly Detection|
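The body of Algorithm 1 is not reproduced here, so the following Python sketch only illustrates the idea described in the text: fit a model, find the observation with the largest absolute percentage error, remove it if it violates a threshold, refit, and repeat. The single-predictor linear fit, the threshold value of 50% and all function names are assumptions made for this illustration and should not be read as the authors' implementation.

```python
def fit_line(x, y):
    """Ordinary least squares for y ~ a + b*x with a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def remove_anomalies(x, y, max_ape=0.5, min_obs=3):
    """Drop the worst observation while its absolute percentage error
    exceeds max_ape (hypothetical threshold); refit after every removal."""
    x, y = list(x), list(y)
    removed = []
    while len(x) > min_obs:
        a, b = fit_line(x, y)
        ape = [abs(a + b * xi - yi) / abs(yi) for xi, yi in zip(x, y)]
        worst = max(range(len(x)), key=ape.__getitem__)
        if ape[worst] <= max_ape:
            break  # no remaining observation violates the threshold
        removed.append((x.pop(worst), y.pop(worst)))
    return x, y, removed


# Hypothetical areas (sq.m) and prices, plus one record resembling the
# 66 sq.m apartment quoted above (its price read here as 3524 EUR).
areas = [50, 60, 70, 80, 90, 66]
prices = [50000, 60000, 70000, 80000, 90000, 3524]
kept_areas, kept_prices, removed = remove_anomalies(areas, prices)
```

Removing one observation per iteration and refitting prevents a single extreme record from distorting the fit that is used to judge the remaining observations.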
2.4. Machine Learning Methods
In order to evaluate more complex models, apart from Multiple Linear Regression (MLR), a Higher Order, Nonlinear Regression (NLR) was implemented. In particular, all combinations of the variables were created, up to third order, for all nine independent variables. Afterwards, a forward step-wise algorithm was implemented in order to sequentially add to the model the combined variable which corresponds to the model with the lowest prediction error. Algorithm 2 represents the applied procedure.
|Algorithm 2:Step-wise, Higher Order Regression|
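Since the body of Algorithm 2 is not reproduced here, the sketch below illustrates one plausible reading of the described procedure in Python: build all products of the predictors up to third order, then greedily add the term that gives the lowest error. The use of the in-sample mean squared error as the selection criterion, the stopping tolerance `tol` and every function name are assumptions; the authors' own criterion may differ.

```python
from itertools import combinations_with_replacement
from math import prod


def monomials(n_vars, max_order=3):
    """Index tuples (i, j, ...) describing the products x_i*x_j*... up to max_order."""
    terms = []
    for order in range(1, max_order + 1):
        terms.extend(combinations_with_replacement(range(n_vars), order))
    return terms


def solve(a, b):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    w = [0.0] * n
    for r in reversed(range(n)):
        w[r] = (m[r][n] - sum(m[r][k] * w[k] for k in range(r + 1, n))) / m[r][r]
    return w


def fit_mse(cols, y):
    """Least-squares fit of y on the given columns plus an intercept; returns the MSE."""
    design = [[1.0] * len(y)] + cols
    gram = [[sum(u * v for u, v in zip(ci, cj)) for cj in design] for ci in design]
    rhs = [sum(u * v for u, v in zip(ci, y)) for ci in design]
    try:
        w = solve(gram, rhs)
    except ZeroDivisionError:  # exactly collinear columns: reject this candidate
        return float("inf")
    pred = [sum(wk * col[i] for wk, col in zip(w, design)) for i in range(len(y))]
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)


def forward_stepwise(X, y, max_terms=3, max_order=3, tol=1e-12):
    """Greedily add the monomial giving the lowest MSE (assumed criterion)."""
    cand = {t: [prod(row[i] for i in t) for row in X]
            for t in monomials(len(X[0]), max_order)}
    chosen, best = [], float("inf")
    while len(chosen) < min(max_terms, len(cand)):
        trials = {t: fit_mse([cand[s] for s in chosen + [t]], y)
                  for t in cand if t not in chosen}
        t_best = min(trials, key=trials.get)
        if not (best - trials[t_best] > tol):
            break  # no meaningful improvement
        chosen.append(t_best)
        best = trials[t_best]
    return chosen, best


# Hypothetical data in which the response depends only on the product x0*x1.
X = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5), (2, 5), (5, 1)]
y = [2.0 + 3.0 * a * b for a, b in X]
terms, mse = forward_stepwise(X, y)
```

In this toy example the interaction term `(0, 1)`, that is x0·x1, is selected first because it alone explains the response. For real data the error should be measured on held-out observations, as an in-sample criterion always favours adding terms.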
Furthermore, we utilized Random Forests (RF) [27] as implemented in Reference [28], and Gradient Boosting (GB) [29]. All analyses were run in the Julia [30] programming language by utilizing the mentioned packages, as well as code written by the authors, as described in Algorithms 1 and 2.
Sensitivity analysis of the features’ importance to the dependent variable (Adj. Accepted Price) demonstrated similar patterns for all four methods used. However, certain differences were also observed, which highlights the need for such analyses on the trained machine learning models. The accurate modelling of a studied system is challenging and its predictive value is controversial [12], while the hopeful prospects that computers and refined models would accomplish high prediction accuracy were repeatedly defeated [1]. The utilization of a more accurate model instead of empirical rules exhibited enhanced prediction accuracy in property valuations. However, mathematical models without error estimation could jeopardize valuations; hence, we recommend obtaining an initial estimation +/− a prediction error, as well as comprehensively investigating the errors’ extrema and distributions. Machine learning algorithms can be used to validate professional valuations and not to replace human judgment, in order to avoid the impact of the highly improbable [35].
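One simple way to attach a prediction error to an estimate, as recommended above, is to derive a band from the absolute percentage errors observed on past (ideally held-out) valuations. The Python sketch below is an illustration under stated assumptions: the empirical-quantile rule, the symmetric band, the function names and the error values are all hypothetical and are not taken from the study.

```python
import math


def ape_quantile(ape, coverage=0.9):
    """Smallest observed absolute percentage error covering the requested share of cases."""
    ordered = sorted(ape)
    k = math.ceil(coverage * len(ordered)) - 1
    return ordered[max(k, 0)]


def value_range(estimate, ape, coverage=0.9):
    """Symmetric band estimate * (1 -/+ q), with q the empirical APE quantile."""
    q = ape_quantile(ape, coverage)
    return estimate * (1 - q), estimate * (1 + q)


# Hypothetical absolute percentage errors (as fractions) from past predictions.
past_ape = [0.02, 0.05, 0.08, 0.10, 0.12, 0.15, 0.30, 0.60]
low, high = value_range(100_000.0, past_ape, coverage=0.75)
worst_case = max(past_ape)  # the errors' extremum should be reported alongside the band
```

Reporting the band together with the largest observed error keeps rare but severe mispricings visible to the valuer instead of hiding them behind an average.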
The most important factors that the authors recommend examining are Time, Money, Quality, Accuracy, Bureaucracy, Responsibility, Regulations, Licenses, Initial cost, Neutrality and available data. Every single property valuation is a unique project with a clear starting and ending date. Manual valuations are usually resource-intensive in both time and money, and crucial revaluations are often delivered late or sometimes never (Quevara [36]). In a project, there is always a trade-off between Time, Money and Quality. Increasing one of the factors almost automatically decreases the remaining two. For example, a valuer who tries to complete more valuations within a given period must either decrease the quality of each valuation in order to be faster per valuation or hire more staff to deliver more valuations. AI does not have any of these constraints. It can work 24/7 and, with the correct data, can produce a theoretically infinite number of valuations. Practically, the number is limited by the available data as well as by the input of this data by a human source.
In the above paragraph, data has been mentioned as an important component. CAMA and AVM can only exhibit high computational efficiency if the database contains adequate data. Theoretically, one could state that if no data is available, AI cannot be used. On the other hand, without precise data, any human-based valuation would not be very precise either. It takes years of study and practical experience, as well as local market knowledge, for a valuer to be able to deliver accurate valuations and appraisals. This process of learning is time-consuming and rather expensive, which makes human valuers expensive. AI can learn within a short period of time and can improve its performance based on past observations. AI can offer a much less expensive rate for any valuation, since costs such as travel time and travel expenses to the property can be saved. However, AI has a higher initial cost, as it is expensive to set up a model. The maintenance of the database and feeding the AI model with more data are usually the highest running expenses. Any invention that may replace workers with machines in a particular field can have a positive effect on society by “reducing the price of goods, increasing real income” [37]. Research conducted in this context suggests that the methods currently used extensively have inherent errors in how they derive their value estimates [38]. Many scientists have stated that feelings and sympathy are what make us human. These are unarguably great assets of every human; however, in valuations, they can create inaccuracies due to the loss of neutrality. Humans can only control their actions up to a certain level. AI does not lose neutrality, and hence accuracy, due to sympathy; therefore, in this aspect it can create more accurate valuations.
Carrying out an official valuation requires, in almost every country, a license. These licenses are often provided by human-based associations. Political reasons often block technological progress, as some humans fear losing their jobs to AI. This political lobbying slows progress considerably and, by doing so, heavily favours the human valuer. Human valuers often raise the issue of responsibility and legal liability for AI. A valuation carried out by a human valuer can always be challenged, and one can sue the person who completed the valuation, but two questions remain to be answered: who do you sue when a CAMA valuation is in question, and who signs a CAMA valuation? Unfortunately, these two questions cannot be answered easily. Identifying the responsible party of a CAMA valuation is a tricky process, which is one of the major drawbacks of AI. However, if we feed the AI model with enough data and constantly maintain and update the database, the possible margin of error should be small enough to be negligible, and costly legal processes could be avoided or minimized. Besides that, we must understand in which situations we value properties and whether all valuations need to be legally appropriate in terms of responsibility and suitability. Nowadays, countless valuations are done daily, mostly for courts or for banks giving out mortgages or attempting to repossess distressed/mortgaged assets, but many more valuations are conducted for other reasons.
All the situations described in the above paragraph could be ideal for the use of AI, in order to provide cheaper and faster valuations. Having this kind of valuation completed by AI models would, of course, reduce the total number of valuations completed by human valuers. Indeed, it has to be stated that the effect of artificial intelligence on the level of human employment will be dramatic [39]. This, however, does not necessarily mean that any human valuer should lose their job. It could mean the opposite. Human valuers could focus more on each valuation, automatically increasing the quality of every valuation completed by a human valuer. Special reference must be made to complex valuations, where a valuer needs a lot of time to fully understand and adjust the influencing factors. Giving human valuers more time to focus on these complex valuations, and on valuations for bank lending or repossession purposes, increases the quality significantly. The improvement in quality will automatically lead to a higher achieved price per valuation, which could, in the end, create higher profits for any valuer.
Remote Sensing Integration in Mass Appraisals
Remote sensing is another important tool that can be used in Mass Appraisals and data collection. In remote sensing, information about a given category of property is acquired without necessarily visiting the property [40]. According to Nayak and Zlatanova [41], remote sensing experts establish GIS systems that are often utilized. Remote sensing makes it possible to determine the attributes of a property such as its location, lot size, and the type of structures that have been erected on the land. This is especially helpful because some properties may be located in areas where access is restricted, as mentioned by Xiao-sheng, Zhe and Ting-li [42]. Remote sensing makes the identification of property easier because property lines showing the exact location of the property can be drawn on the maps developed from remote sensing [43]. Remote sensing can also be used to provide measures for a number of dependent variables that are linked to human activity, especially with regard to the environmental impacts of various social, economic and demographic processes. For instance, remote sensing observations of land cover may depict the footprints of agricultural intensification, the expansion of urban areas, road development and many other factors that affect the value of properties. These may also entail observations of vegetation density that may be linked to the impacts of fertilization, irrigation and other agricultural practices. Other applications may cover observations of new building constructions that are related to mass appraisals. Therefore, models that combine remote observations with ground-based social data may be very important in understanding the market value of properties.
Machine learning models are highly non-transparent and it is difficult to completely understand what affects the value of a particular property the most. We address this issue through a detailed sensitivity analysis for each predictor, utilizing and comparing four machine learning models. Further studies in this sector need to be carried out in order to improve the overall transparency of any model used. However, machine learning models are characterized by a consistent error across all the given observations, which follows a known statistical distribution, while valuations completed by human valuers might contain biases of different types and magnitudes. The models would be even more precise if the database were enriched with more data related to the characteristics of the property. The easiest and cheapest way to get these data today is through satellite imagery. Data such as elevation, building height, age, construction type and distance from value influence centers such as schools, hospitals, public transportation and so forth, or even pollution or air quality in the area under study, can be collected from satellites. Lastly, with machine learning techniques, important constraints have been identified, such as the transparency of models and the repeatability of the results [14]. Especially in Cyprus, larger-scale tests still need to be completed repeatedly. Finally, machines have already taken over a lot of jobs that were previously carried out by humans, and every time we reached a point where humans could lose jobs, more jobs were created, thereby increasing prosperity and the quality of life for humans. Machines assist us and improve our lives. Coming back to the starting quote, machines, and especially AI as described above, are capable of increasing our quality of intelligence as humans.