Using Genetic Algorithms for Real Estate Appraisals

The main aim of this paper is to interpret the relationship between real estate rental prices and the geographical location of housing units in a central urban area of Naples (the Santa Lucia and Riviera of Chiaia neighborhoods). Genetic algorithms (GA) are used for this purpose. In addition, to verify the reliability of genetic algorithms for real estate appraisals and to show their forecasting potential in the analysis of housing markets, a multiple regression analysis (MRA) was applied to the same data and its results were compared with those of GA.


Introduction
The real estate sector requires accurate appraisals of property values because its stakeholders pursue different objectives, such as monitoring real estate values or updating the returns on their investments [1]. For these reasons, interpreting the housing market, together with its evolution and dynamics, becomes a critical issue, and techniques able to adequately analyze the characteristics of the real estate market are strongly needed [2].
Land and buildings are the two main components of the building production process. They remain distinguishable economic factors even after the urban and building transformation process has been completed. The scarcity of buildings in an urban area is therefore closely related to the land available for construction. This is a technical constraint that depends mainly on the geographical location of the land and on urban planning policies, neither of which can be overcome by factors intrinsic to the real estate market.
Urban land is a limited resource, but its duration over time is unlimited. Because of these features, urban land yields a specific and intrinsic income. Geographical location shapes real estate markets and makes them sensitive to economic and social variations. Therefore, each urban site is unique, and this uniqueness strongly conditions the value and usefulness of the land itself.
An analysis of the real estate market is undoubtedly more accurate when it is performed for each individual submarket. To identify homogeneous real estate submarkets in an urban area, geographical location is a key parameter.

Targets and Research Design
Genetic algorithms (GA) have wide potential in the real estate field and offer many practical advantages to real estate market actors: they can highlight how rental values vary across sub-areas belonging to the same market segment, even in small local real estate markets.
Moreover, GA make it possible to predict, quantify, and locate where and how rental values vary in a specific urban context, and to correlate these variations with other phenomena or economic effects (e.g., modeling of locational variables, delimitation of areas with homogeneous values).
For this reason, GA were used in this work to identify the effect of the geographical location of housing units on real estate rental values. In addition, to verify the reliability of GA for real estate appraisals and to show their forecasting potential in the analysis of housing markets, a multiple regression analysis (MRA) was applied, and the results of the GA and MRA applications were compared at the end of the work.
MRA was chosen for comparison because it is a long-established methodology that provides robust and reliable results in real estate appraisals.
The analysis is based on the rental prices of housing units located in a central urban area of Naples (the Santa Lucia and Riviera of Chiaia neighborhoods).
Given the opacity of Italian real estate markets, GA can help to better interpret the different segments of local real estate markets and to predict and interpret the phenomena that generate locational premiums (rewards of position), with particular reference to transformation and investment problems in urban areas affected by specific projects or plans, and to optimize choices about the use of goods and resources.

A Brief Literature Review
In recent years, interest in real estate market analysis has grown very quickly, as this knowledge is highly relevant for real estate forecasting, investment, and taxation.
The main difficulty in forecasting real estate prices has always been the subjectivity of the judgments that must be processed to obtain reliable values [18]. For this reason, multiple regression analysis (MRA) has long been considered the most flexible technique for providing reliable predictions of real estate values and information for market analysis. As a possible alternative, mostly in economic and financial fields, researchers have tested artificial neural networks (ANNs) for forecasting purposes [19][20][21][22][23][24][25].
In the literature, however, comparisons between MRA and ANNs are often controversial and do not clearly show the superiority of either method [26][27][28][29].
As an alternative to these approaches, genetic algorithms have been tested over time. GA are stochastic search techniques, based on Charles Darwin's evolutionary principle, that can explore wide and complex spaces [30][31][32][33]. Multi-parameter optimization problems, characterized by objective functions subject to constraints, are the main application area for GA. The GA search process develops in four stages: initialization, selection, crossover, and mutation [34].
The history of GA began with the first computer simulations of evolution, which started to spread in the 1950s. In the 1960s, Rechenberg [35] introduced the "evolution strategy", a topic further developed by Schwefel [36]. Fogel, Owens, and Walsh [37], instead, developed "evolutionary programming", a technique in which candidate solutions were finite state machines. During those years, other studies were carried out on machine learning and on algorithms inspired by evolution and genetics [38].
However, GA became widespread in many scientific fields only with Holland [31], and later Goldberg [32], mainly thanks to the growth in computing power, which made ever wider practical application of these methods possible.
Holland's original idea, as described by Mitchell [39], "was not to design algorithms to solve specific problems, but rather to formally study the phenomenon of adaptation as it occurs in nature and to develop ways in which the mechanisms of natural adaptation might be imported into computer systems".
Holland's GA, as presented in his book [31], is a technique for moving from an initial population of chromosomes to a new population that is fitter for the environment, using a natural selection mechanism and the genetic operators of crossover, mutation, and inversion.
Applications of GA in the real estate field are very limited. In the international literature, Lertwachara [40] presents a GA method for identifying attractive stocks, to measure the performance of a profitable investment strategy; Ahn et al. [41] use ridge regression with GA to improve real estate appraisal forecasting in the Korean market; Ma et al. [42] propose combining a hierarchical genetic algorithm with the least squares method to optimize a particular neural network, with the aim of predicting the real estate price index. In the Italian literature, only two works on the use of GA in real estate appraisal can be found: Manganelli et al. [43] test GA to interpret the relationship between real estate prices and property location, and De Mare et al. [44] use GA to assess the economic and territorial impacts of high-speed railways.
Nowadays GA are widely used, including by many large companies, to solve a broad range of problems such as scheduling, data fitting, trend spotting, and budgeting.

Characteristics of Genetic Algorithms
GA are adaptive techniques particularly suited to optimization problems, able to find the maximum or minimum point of a complex, predefined fitness function.
In such circumstances, traditional analytical techniques are often unusable because the space of possible solutions cannot be explored exhaustively.
More generally, GA are adaptive heuristic search techniques based on the ideas of natural selection and genetic evolution. As mentioned above, GA are grounded in the simulation of the processes that drive evolution in natural systems, first postulated by Charles Darwin in his theory of natural selection [39][40][41][42][43][44][45].
In other words, GA make clever use of random search within a predefined search space to solve a given problem.
GA have been applied in many ways, but an effective representation of candidate solutions and the choice of the fitness function are the key parameters in their applications.
The relevance of GA in applications is due to their ease of use combined with their reliability and adaptability: they can quickly find good solutions to complex multidimensional problems. In particular, three conditions, if verified, suggest the use of GA in operational applications:
• the space of solutions is complex or wide, or knowledge of it is very poor;
• knowledge about the domain is poor, or it is not possible to decode or restrict the space of solutions;
• traditional tools or techniques of mathematical analysis are unavailable.
By analogy with a biological system, the built environment can also be assumed to pose a given problem and to be constituted by a population of subjects that are candidate solvers of that problem.
The environment exerts a continuously evolving pressure on the individuals of the population; this pressure corresponds to a specific fitness function that measures, for each individual, the quality of the solution it proposes for the problem under investigation.
Crossover and mutation, derived from biology, are the main genetic operators considered in case studies. Following the concept of natural selection, the best individuals of a given generation, when coupled, produce better subsequent generations of new individuals, which together form a population of possible solutions for the problem of interest.
In this way, genetic characteristics improve continuously and the number of well-adapted individuals grows compared with previous generations. The best features are thus crossed and exchanged over future generations. By exploring the most promising regions of the search space in detail, the optimal solution is identified by the best individual generated over time.
Five domains in which GA are usefully applied can generally be distinguished, although their boundaries are not clearly defined [43]; among them are classification, modelling, and machine learning.
In the latter domain, GA are used to build models able to interpret an underlying phenomenon or to make forecasts [9].

Case Study
To implement GA, the first step is to build an initial population (random or derived from heuristic rules); an evolutionary cycle is then started, and at each iteration a new population is produced by applying genetic operators to the previous one.
GA aim to identify the maximum or minimum point of a predefined fitness function. The best solutions are therefore selected over subsequent iterations and recombined with each other, so that the population evolves continuously towards the optimal state. In particular, each individual generated from the previous population is assigned a specific value that depends on the quality of the solution it represents. The fittest individuals are then recombined to produce new generations until convergence to the value of the best individual.
The proposed model interprets rental prices as the result of a multicriteria choice process.
The selection criteria can be expressed by the characteristics of housing units, which are fundamental in the formation of real estate values. To determine the marginal price of geographic location, which contributes to real estate rental prices, the original sample (initial population) was built from rental prices recorded in the recent past (the last two months) for housing units of similar building type located in a central urban area of Naples (45 housing units in the Santa Lucia and Riviera of Chiaia neighborhoods).
Regarding sample size, Green [46] used statistical power analysis to compare the performance of several rules of thumb for the number of subjects required for MRA. Marks et al. [47] specify a minimum of 200 subjects for any regression analysis. Tabachnick and Fidell [48] suggested that, although 20 subjects per variable would be preferable, the minimum should be five subjects per variable. Harris [49] argued that the number of subjects should exceed 50 plus the number of predictor variables. Schmidt [50] determined that the minimum number of subjects per variable lies in the range of 15 to 20, while Harrell [51] suggested that 10 subjects per variable is the minimum sample size for linear regression models to ensure accurate prediction for new subjects. Finally, Del Giudice [52] argued that the minimum required sample size is 4-5 subjects per independent variable.
As reported above, and also considering the opacity of Italian real estate markets as in the present case, the small sample used in this work does not prevent the operational use of MRA or GA models.
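The evolutionary cycle described above (initialization, selection, crossover, mutation, repeated until convergence) can be sketched in a few lines of Python. This is a minimal, illustrative real-coded GA written for this explanation, not the MATLAB "Genetic algorithms" tool used in the case study: the operators (tournament selection, BLX-alpha crossover, Gaussian mutation, elitism) and all parameter values (`pop_size`, `generations`, `alpha`, `mutation_rate`, `elite`) are assumptions chosen for simplicity.

```python
import random


def genetic_minimize(fitness, bounds, pop_size=50, generations=200,
                     tournament=2, alpha=0.5, mutation_rate=0.1,
                     elite=2, seed=0):
    """Minimize `fitness` over a box with a basic real-coded GA (illustrative sketch)."""
    rng = random.Random(seed)

    def clip(ind):
        # Keep every gene inside its [lo, hi] bound.
        return [min(max(g, lo), hi) for g, (lo, hi) in zip(ind, bounds)]

    def select(scored):
        # Tournament selection: the fittest of `tournament` random individuals.
        return min(rng.sample(scored, tournament), key=lambda s: s[0])[1]

    # Stage 1, initialization: individuals drawn uniformly within the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]

    for _ in range(generations):
        # Evaluate each individual once per generation (lower is better).
        scored = sorted(((fitness(ind), ind) for ind in pop), key=lambda s: s[0])
        # Elitism: the best individuals survive unchanged.
        next_pop = [ind[:] for _, ind in scored[:elite]]
        while len(next_pop) < pop_size:
            # Stage 2, selection of two parents.
            p1, p2 = select(scored), select(scored)
            # Stage 3, BLX-alpha crossover: each gene is drawn from the
            # parents' interval widened by alpha on both sides.
            child = []
            for g1, g2 in zip(p1, p2):
                lo, hi = min(g1, g2), max(g1, g2)
                width = hi - lo
                child.append(rng.uniform(lo - alpha * width, hi + alpha * width))
            # Stage 4, mutation: occasional Gaussian perturbation of a gene.
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    child[i] += rng.gauss(0.0, 0.05 * (hi - lo))
            next_pop.append(clip(child))
        pop = next_pop

    best = min(pop, key=fitness)
    return best, fitness(best)


if __name__ == "__main__":
    # Toy fitness: the sphere function, whose minimum is 0 at the origin.
    sphere = lambda x: sum(v * v for v in x)
    best, value = genetic_minimize(sphere, [(-5.0, 5.0)] * 3)
    print(best, value)
```

Here the toy `sphere` function only stands in for a real objective; in the model of this paper the fitness to be minimized is the residual sum of squares of the rental price function. Elitism guarantees that the best value found never worsens from one generation to the next.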
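The rules of thumb above can be compared directly for a given number of predictors. The sketch below encodes each rule as stated in the text; taking the lower end of the Schmidt and Del Giudice ranges, and counting AREA, MAIN, FLOOR and GEO as k = 4 variables, are assumptions (coding GEO with one coefficient per sub-area would raise k).

```python
def min_sample_sizes(k):
    """Minimum number of observations for k predictors under the rules of thumb cited above."""
    return {
        "Marks et al. [47]": 200,
        "Tabachnick and Fidell [48], minimum": 5 * k,
        "Tabachnick and Fidell [48], preferable": 20 * k,
        "Harris [49]": 50 + k + 1,   # n must exceed 50 + k
        "Schmidt [50]": 15 * k,      # lower end of the 15-20 range
        "Harrell [51]": 10 * k,
        "Del Giudice [52]": 4 * k,   # lower end of the 4-5 range
    }


n, k = 45, 4  # assumption: AREA, MAIN, FLOOR and GEO counted as four variables
for rule, n_min in min_sample_sizes(k).items():
    print(f"{rule}: needs {n_min}, satisfied: {n >= n_min}")
```

Under these assumptions the 45-unit sample meets the per-variable minima of Tabachnick and Fidell, Harrell, and Del Giudice, but not the stricter absolute thresholds.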
Besides geographic location, only the real estate characteristics that are not homogeneous across the sample were recorded for each sampled unit, namely AREA, MAIN and FLOOR (see Table 1). Other real estate characteristics (such as panoramic views and noise), which occurred in the same way in all sampled units, were excluded from the analysis.
The geographical position (GEO) was expressed by assigning a predefined code to each housing unit, classifying it as falling within one of the areas identified in a previous study of the city of Naples [4]. That study used a geoadditive model based on penalized spline functions to segment the housing market, and it makes it possible to subdivide the urban area considered into five sub-areas with homogeneous values (A, B, C, D, E), as Figure 1 shows. More precisely, these sub-areas were defined as follows (in terms of monthly rental price):
Residential units falling in the five sub-areas (A, B, C, D, E) belong to a homogeneous urban area in terms of services and infrastructure, so the differences in value between sub-areas are small. Sub-areas with the highest values are close to historical and tourist amenities, while those with the lowest values are almost exclusively residential.
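One way to represent the GEO variable in a numeric model is a set of dummy (0/1) indicators, one per sub-area. The sketch below assumes this encoding and a hypothetical row layout (AREA, MAIN, FLOOR, then the five GEO indicators); the paper itself only states that each unit was assigned a predefined code, and the names `encode_geo` and `feature_row` are illustrative.

```python
GEO_AREAS = ("A", "B", "C", "D", "E")  # sub-areas from the market segmentation


def encode_geo(label):
    """Dummy (one-hot) encoding of a housing unit's sub-area label."""
    if label not in GEO_AREAS:
        raise ValueError(f"unknown sub-area: {label!r}")
    return [1 if label == area else 0 for area in GEO_AREAS]


def feature_row(area_sqm, main_score, floor_level, geo):
    """Hypothetical model row: AREA, MAIN, FLOOR followed by the five GEO dummies."""
    return [area_sqm, main_score, floor_level] + encode_geo(geo)


print(feature_row(85, 3, 2, "C"))  # [85, 3, 2, 0, 0, 1, 0, 0]
```

With an intercept plus all five indicators the columns are linearly dependent; this is harmless for a search that simply minimizes RSS, whereas an ordinary least-squares fit would usually drop one reference sub-area.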
The residual sum of squares (RSS), i.e., the sum of the squared differences between the sampled rental prices and the fitted values provided by the model, is the fitness function adopted for the proposed model [43]:

$$\mathrm{RSS} = \sum_{k=1}^{n} \left( y_k - \hat{y}_k \right)^2, \qquad \hat{y}_k = b_0 + \sum_{i=1}^{m} b_i \, x_{ik} \qquad (1)$$

where $n$ is the number of sampled housing units, $m$ the number of real estate characteristics, $y_k$ the observed rental prices, $\hat{y}_k$ the fitted values, $x_{ik}$ the measure of the i-th selected real estate characteristic for the k-th housing unit, $b_0$ the constant term (referred to here as the statistical error), and $b_i$ the coefficient of the i-th real estate characteristic (the marginal price with which it contributes to the rent). The "Genetic algorithms" tool in MATLAB was used to process the proposed model.
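The RSS fitness function described above can be written as a plain Python function. This is a sketch assuming a linear fitted value $b_0 + \sum_i b_i x_i$, as in the definitions of $b_0$, $b_i$ and $x_i$ given in the text; the names `fitted_value` and `rss` are illustrative and the data are toy values, not the case-study sample.

```python
def fitted_value(b0, coeffs, x):
    """Fitted rent for one housing unit: b0 + sum of b_i * x_i."""
    return b0 + sum(b * xi for b, xi in zip(coeffs, x))


def rss(b0, coeffs, X, y):
    """Residual sum of squares between observed rents `y` and fitted values."""
    return sum((yk - fitted_value(b0, coeffs, xk)) ** 2 for xk, yk in zip(X, y))


# Toy data (not from the case study): one characteristic, three units.
X = [[1.0], [2.0], [3.0]]
y = [3.0, 5.0, 7.0]
print(rss(1.0, [2.0], X, y))  # 0.0 (perfect fit)
print(rss(0.0, [2.0], X, y))  # 3.0
```

A GA would score each candidate coefficient vector `v` through a wrapper such as `lambda v: rss(v[0], v[1:], X, y)`, so that lower RSS means a fitter individual.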
In MATLAB, numerous runs were performed, changing constraints and key model parameters one at a time over several iterative cycles, with the aim of minimizing the sum of squared residuals. Data processing made it possible to determine the GA configuration that generates the optimal solution:
• Solver: GA (Genetic Algorithm);
Table 2 shows the values of the $b_i$ coefficients. The vector of coefficients directly expresses the marginal rental price of each real estate characteristic. The rental price function is additive, as follows ($K_i = b_i x_i$ are the marginal contributions of the real estate characteristics):

$$\hat{y} = b_0 + \sum_{i=1}^{m} K_i \qquad (2)$$

The additive linear form with intercept was chosen because the aim of this work is to identify the marginal contribution of the geographical location of housing units to real estate rental values. Without the intercept, the analysis would provide average prices rather than the marginal prices of the real estate characteristics (which, as noted, differ from each other).
Fitted values are then calculated with Equation (2), together with the error between fitted values and observed rental prices.
Figure 2 and Tables 2 and 3 show model results.
Buildings 2017, 7, 31
The optimal run carried out 95 iterations (out of a prefixed maximum of 5,000), and the best objective function value was 1.956 × 10^6 (see Figure 2).
For the variable "AREA", the marginal price is 13.21 €/sqm/month; for "MAIN", it is 10.67 €/month for each additional score point; for "FLOOR", it is 0.90 €/month for each additional floor level. For geographical location (GEO), the marginal prices differ across the five sub-areas: 47.45 €/month in area "A", 23.53 €/month in area "B", 136.69 €/month in area "C", −34.47 €/month in area "D", and −37.30 €/month in area "E".
From these results, some considerations can be drawn:
1. The coefficients of "AREA", "MAIN" and "FLOOR" are all positive, showing that rental prices increase as the amount of the respective real estate characteristic increases; this is in line with the normal dynamics of the housing market;
2. According to the previous study on the segmentation of the real estate market in the urban area considered (carried out, it should be remembered, with a geoadditive model), housing units in zones A and B have lower average prices; this is indirectly confirmed by the GA processing, in which zones A and B have lower marginal prices than zone C;
3. Housing units in zone C should have intermediate prices among the five areas considered, but the GA implementation does not detect this; on the contrary, zone C has the highest marginal price;
4. Housing units in zones D and E should have the highest prices among the five areas considered, but the GA implementation instead assigns negative marginal prices to zones D and E;
5. The floor level (FLOOR) has an almost negligible impact on rental prices (marginal price of 0.90 €/month for each additional floor level); on average, its contribution amounts to about 0.13% of the average rental price.
From a purely numerical point of view, some of the above considerations (see points 3 and 4) may seem controversial, but they can be explained by considering that, even within a narrow geographic area, zones D and E contain a great castle (the so-called "Castel dell'Ovo"), many luxury hotels, and a pedestrian area. With the same environmental characteristics, the housing supply in areas D and E is much scarcer and, consequently, rental prices tend to be higher, although housing units there are less convenient to use because of parking difficulties, vehicle traffic due to the large number of tourists, and greater distance from urban services, which are mainly located in zone C. These conditions of the surrounding environment tend to explain why the GA implementation provides negative marginal prices for areas D and E. It therefore seems that, compared with geoadditive models, GA have a greater ability to interpret the current and future behavior of potential users of residential units.
To show the forecasting potential of GA techniques in the analysis of housing markets, the results of the GA implementation were also compared with those obtained by applying MRA (with intercept) to the same real estate sample. From a general and theoretical point of view, GA and MRA have the same potential and fields of application, and they can be used under the same assumptions and basic principles (which are well known and established for MRA, and this facilitates the experimentation of GA); for these reasons, the two techniques can be compared both in terms of residual analysis and by comparing the marginal prices of the real estate characteristics.
First, the comparison between GA and MRA shows that the marginal prices of the individual real estate characteristics differ between the two methods. The MRA output shows a coefficient of determination (R²) of 0.953, an adjusted coefficient of determination (R²_C) of 0.950, a standard error of 214.53, and an F-test significant at the 95% level.
Second, although GA and MRA give substantially similar results, GA interpret the rental market better: the mean absolute percentage error is 10.62% for GA versus 11.50% for MRA, i.e., a relative improvement of about 7.65% in favor of GA. This is also reflected in the smaller absolute value of the constant term ($b_0$ = −20.74 for GA, versus an MRA intercept of −36.89).
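The MRA statistics above follow standard definitions. The sketch below computes R², adjusted R² and the standard error of the regression from observed and fitted values, assuming a linear model with k predictors plus an intercept; the function name is illustrative and the numbers are toy values, not the case-study data.

```python
import math


def regression_metrics(y, y_hat, k):
    """R^2, adjusted R^2 and standard error for a linear model with k predictors plus an intercept."""
    n = len(y)
    mean_y = sum(y) / n
    rss = sum((obs - fit) ** 2 for obs, fit in zip(y, y_hat))  # residual sum of squares
    tss = sum((obs - mean_y) ** 2 for obs in y)               # total sum of squares
    r2 = 1 - rss / tss
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)             # penalizes extra predictors
    std_err = math.sqrt(rss / (n - k - 1))                    # uses residual degrees of freedom
    return r2, r2_adj, std_err


# Toy values, not the case-study data.
print(regression_metrics([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8], k=1))
```

The adjusted value is always at most R² and drops further as predictors are added without a matching reduction in RSS, which is why both are usually reported together.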
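The error comparison can be reproduced with two small helpers: the mean absolute percentage error, and the relative reduction of one error with respect to another. Applying the latter to the reported values (10.62% for GA and 11.50% for MRA) recovers the stated improvement of about 7.65%. The function names are illustrative.

```python
def mape(y, y_hat):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs((obs - fit) / obs) for obs, fit in zip(y, y_hat)) / len(y)


def relative_improvement(err_model, err_baseline):
    """Percentage by which err_model is lower than err_baseline."""
    return 100 * (err_baseline - err_model) / err_baseline


# Reported MAPE values: 10.62% (GA) and 11.50% (MRA).
print(round(relative_improvement(10.62, 11.50), 2))  # 7.65
```

Note that the 7.65% figure is a relative reduction of the error, not a difference in percentage points (which would be 0.88).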
Lastly, the hedonic function obtained with the GA implementation, which can be used for direct appraisal of rental prices in the Santa Lucia and Riviera of Chiaia neighborhoods, can be summarized as follows (see Table 2 for the coefficient values):

$$\widehat{\mathrm{RENT}} = b_0 + b_{\mathrm{AREA}} \cdot \mathrm{AREA} + b_{\mathrm{MAIN}} \cdot \mathrm{MAIN} + b_{\mathrm{FLOOR}} \cdot \mathrm{FLOOR} + b_{(A,B,C,D,E)}$$

where $b_{(A,B,C,D,E)}$ represents the marginal price of geographical location (GEO) for the sub-area (A to E) in which the housing unit falls.
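As an illustration, the hedonic function can be evaluated with the GA coefficients reported in the text ($b_0$ = −20.74 and the marginal prices of AREA, MAIN, FLOOR and GEO). The housing unit used below (100 sqm, MAIN score 3, floor level 2, sub-area C) is hypothetical, and the scale of the MAIN score is an assumption.

```python
# GA coefficients as reported in the text (see Table 2).
B0 = -20.74
B_AREA, B_MAIN, B_FLOOR = 13.21, 10.67, 0.90
B_GEO = {"A": 47.45, "B": 23.53, "C": 136.69, "D": -34.47, "E": -37.30}


def estimated_rent(area_sqm, main_score, floor_level, geo):
    """Monthly rent (EUR) from the additive hedonic function with the GA coefficients."""
    return (B0 + B_AREA * area_sqm + B_MAIN * main_score
            + B_FLOOR * floor_level + B_GEO[geo])


# Hypothetical unit: 100 sqm, MAIN score 3, floor level 2, sub-area C.
print(round(estimated_rent(100, 3, 2, "C"), 2))  # 1470.76
```

Because the function is additive, moving the same hypothetical unit from sub-area C to sub-area D changes the estimate only by the difference of the two GEO coefficients (136.69 − (−34.47) = 171.16 €/month).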
Currently, only a few works use GA for real estate appraisal [40][41][42][43][44]. Future research on GA in the real estate field may focus on their integration with other techniques, such as artificial neural networks. ANNs and GA show powerful problem-solving ability because, although based on quite simple principles, they exploit their mathematical nature: non-linear iteration. Integrations of GA and ANNs have a wide field of real-world applications (e.g., automotive design, engineering design, robotics, evolvable hardware, biomimetic invention [66]), but they also suffer from several operational drawbacks (the choice of basic parameters such as network topology, learning rate, and initial population often determines the success of data processing), and future work will need to address these limitations.
Genetic algorithms were used in this case study to determine the marginal price with which geographic location contributes to the formation of real estate rental prices. The problem was addressed using a preliminary segmentation of a central urban area of Naples, developed with a geoadditive model on the same real estate data. With respect to this segmentation, genetic algorithms show some differences, in that they are better able to interpret the current and future behavior of potential users of housing units. Although the results obtained with genetic algorithms are excellent (mean absolute percentage error of 10.62%), they do not significantly improve on the performance obtainable with traditional parametric approaches (a relative improvement of about 7.65% over MRA).
Further improvements obtainable with genetic algorithms would mostly come from the use of non-linear functions. In this sense, combining the computing capabilities of genetic algorithms with more complex prediction or fitness functions for appraisal purposes (to better account for the non-linearity typical of real estate phenomena) might provide even better results in the future.

Figure 1. Spatial distribution of real estate rental values [4] compared with an aerial view of the urban area derived from Google Maps.

Figure 2. Fitness function value and stopping criteria generated by MATLAB software.

Table 1. Statistical description of real estate data.

Table 2. Results of GA application.

Table 3. Comparison between GA and MRA.